[Linux] Big Brother

---------


From: Davut Topcan (topcan@karadeniz.org)
Date: Mon 09 Sep 2002 - 10:15:56 EEST



 See You Later
(:------------:)
  Davut Topcan
Tel : 212 295 4737
Fax : 212 295 4743
Karadeniz Holding A.Ş.

----------------------------------------------------------------------------

Big Brother: A Web-based Unix Network Monitoring and Notification System

This document introduces Big Brother, a solution to the problem of Unix systems monitoring. Commercial solutions are available, but they are very expensive, costing many thousands of dollars in additional hardware and software. More distressing, though, is the obvious complexity of these solutions: most require several hundred man-hours of consulting time just to set up. Finally, they don't use the Web as their interface, which not only makes truly remote monitoring impossible but also makes data sharing extremely difficult.

Big Brother is a simple, effective solution to the Systems Monitoring problem, and is presented here for your comments and suggestions.

----------------------------------------------------------------------------

What is Big Brother?

Big Brother is a loosely coupled, distributed set of tools for monitoring and displaying the current status of an entire Unix network and notifying the admin should the need arise. It came about as the result of automating the day-to-day tasks encountered while actively administering Unix systems.

It consists of five major parts:

a.. Central monitoring station (Display Server) This station accepts incoming reports and prepares them for display. Big Brother uses the Web as its user interface, so it can be accessed by anyone with clearance to access the Big Brother site. Furthermore, additional views or displays can be created quickly and easily by writing simple Bourne Shell scripts.

b.. Network monitor (bb-network.sh). The network monitor runs on any Unix machine and periodically contacts every machine, router, and firewall in your bb-hosts file via ping, and via http if appropriate. Results are then sent to the system designated as the Display Server. Connections to the Internet may likewise be monitored.

c.. Local System Monitor (bb-local.sh). Each server in the network runs a local system monitor which periodically samples disk space, CPU usage, and number of users, and ensures that required processes are on-line and active. It reports the results to the Display Server, and also has the ability to directly page the administrator in the case of an emergency.

d.. Pager Programs (bb-page.sh). A client (bb) sends single lines of information to the designated server (bbd), and a script (page) then forwards this information using Kermit via modem to the designated pager.

e.. Inter-machine communications programs (bb, bbd, nettest). A client (tac-bb) sends single lines of information to the designated Display and Pager Servers, which then take appropriate action. These programs communicate using port 1984; what else would Big Brother use?
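
To make the reporting path above a little more concrete, here is a minimal sketch in Python of a client that sends one short line to a server on port 1984. The text does not document the wire format or the transport, so the `status host.area color detail` layout, the use of TCP, and the names `format_report` and `send_report` are all assumptions made purely for illustration; the real bb client is a small C program.

```python
import socket

BB_PORT = 1984  # the port named in the text


def format_report(host: str, area: str, color: str, detail: str) -> str:
    """Build a one-line report.

    The field layout is hypothetical; it is not Big Brother's actual format.
    """
    return f"status {host}.{area} {color} {detail}"


def send_report(server: str, line: str, port: int = BB_PORT,
                timeout: float = 5.0) -> None:
    """Open a TCP connection (assumed transport), send one line, and close."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(line.encode("ascii") + b"\n")
```

A call might look like `send_report("bbdisplay.example.com", format_report("www1", "disk", "yellow", "/var at 92%"))`, where the host name is a placeholder.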

----------------------------------------------------------------------------

The Big Brother Display:

Big Brother was designed to provide instant information about the health of a Unix Network to anyone, anywhere, with Web access to the site.

Network information is now instantly available to those who need it most: managers, systems administrators, and people on the help desk can actively and simply monitor the health of the network.

If any condition is severe, the administrator will have been paged, can use Big Brother to get additional information, and can proceed to fix the problem. Problem verification, data sharing and correction should improve immediately, since everyone has implicit access to the same information.

Finally, since warnings are displayed, corrective action can be taken even before users notice that there is a problem.

The display matrix shows a status of green (ok), yellow (warning), red (severe), and blue (no contact) for each system/area combination. Furthermore, the entire screen changes color to reflect the most serious condition on the network. In order of increasing severity these conditions are: green, yellow, blue, red.

Therefore, a single warning anywhere on the network turns the entire display yellow, which is highly visible, even from far away.
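
The rule for the whole-screen color can be sketched in a few lines of Python. The severity order (green, yellow, blue, red) comes from the text; the function name `overall_color` and the choice to show green when there are no reports at all are assumptions for the example.

```python
# Severity order as given in the text: green < yellow < blue < red.
SEVERITY = {"green": 0, "yellow": 1, "blue": 2, "red": 3}


def overall_color(statuses):
    """Return the most severe color among the per-cell statuses.

    An empty matrix is shown as green here; that default is an assumption.
    """
    return max(statuses, key=SEVERITY.__getitem__, default="green")
```

With this ordering, `overall_color(["green", "yellow", "green"])` gives `"yellow"`, matching the single-warning case described above.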

Each of the elements in the display matrix can then be clicked on to provide additional information, including the code, time, and specific information about the area being monitored.

Additionally, Big Brother now makes detailed information about every server in the network instantly accessible, just by clicking on the server name.

----------------------------------------------------------------------------

Big Brother Warning Conditions:

Every machine, firewall and router is accessed via ping every 5 minutes. Any loss of contact results in a code red, and the administrator being paged.

Every registered Web server is accessed every 15 minutes using a bbnet. Loss of contact with a Web server is a severe condition. Inability to access a page due to a "Server Error" results in a warning condition.
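
As a rough illustration of the Web server rule, the Python sketch below maps the outcome of an HTTP request to a color. It reads "Server Error" as any 5xx status code, which is an assumption, and the names `probe` and `classify_http` and the 10-second timeout are invented for the example. Big Brother itself uses its own tools for this test, not Python.

```python
import urllib.error
import urllib.request
from typing import Optional


def probe(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status code, or None if the server cannot be reached."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (4xx or 5xx).
        return err.code
    except (urllib.error.URLError, OSError):
        # No contact at all: DNS failure, refused connection, timeout.
        return None


def classify_http(status_code: Optional[int]) -> str:
    """No contact is severe (red); a 5xx answer is a warning (yellow)."""
    if status_code is None:
        return "red"
    if 500 <= status_code <= 599:
        return "yellow"
    return "green"
```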

All systems are monitored for disk usage. Any disk over 90% full is considered a warning condition. Disks over 95% full are marked as a severe condition, since this situation can quickly result in a system crash or hang.

All systems are monitored for CPU usage. A load average over 1.50 is a warning condition; one over 3.00 merits a severe condition.
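
The disk and CPU rules share the same two-threshold shape, which the following Python sketch tries to capture. The threshold values are the defaults quoted in the text; treating each limit as strictly "over" the value, using the 1-minute load average, and the function names are assumptions.

```python
import os
import shutil


def classify(value: float, warn: float, severe: float) -> str:
    """Map a measurement to a color using two thresholds ("over" is strict)."""
    if value > severe:
        return "red"
    if value > warn:
        return "yellow"
    return "green"


def disk_status(percent_full: float) -> str:
    # Defaults from the text: over 90% is a warning, over 95% is severe.
    return classify(percent_full, 90.0, 95.0)


def load_status(load_average: float) -> str:
    # Defaults from the text: over 1.50 is a warning, over 3.00 is severe.
    return classify(load_average, 1.50, 3.00)


def disk_percent_full(path: str = "/") -> float:
    """Percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def current_load() -> float:
    """1-minute load average (Unix only); which average to use is assumed."""
    return os.getloadavg()[0]
```

On a Unix host, `disk_status(disk_percent_full("/"))` and `load_status(current_load())` would give the two colors for the local machine.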

Processes are monitored on each system as well. The choice of what is to be monitored depends on what each system actually does. A warning condition results if any of these important processes should die.

System messages are monitored. Big Brother watches /var/adm/messages for NOTICE and WARNING conditions. NOTICE conditions result in the admin being paged immediately. WARNING conditions cause a yellow dot to appear. Clicking on the corresponding dot will report the message that caused the display.
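
A simple way to picture the message check is a scan that sorts log lines into two piles, one that triggers a page and one that produces a yellow dot. The Python sketch below does only that; matching on the bare substrings NOTICE and WARNING, and the name `scan_messages`, are assumptions, and the log lines in the usage note are made up.

```python
from typing import Iterable, List, Tuple


def scan_messages(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split log lines into (lines to page on, lines to warn on).

    NOTICE lines go to the first list and WARNING lines to the second,
    following the text. Plain substring matching is an assumption.
    """
    to_page: List[str] = []
    to_warn: List[str] = []
    for line in lines:
        text = line.rstrip("\n")
        if "NOTICE" in text:
            to_page.append(text)
        elif "WARNING" in text:
            to_warn.append(text)
    return to_page, to_warn
```

For example, `with open("/var/adm/messages") as f: pages, warnings = scan_messages(f)` would collect both sets from the file named in the text.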

And finally, the messaging system itself is monitored by the Central Monitoring station. Any report over 30 minutes old results in that report, and the entire screen, being marked in blue, indicating a possible loss of contact within the Big Brother system itself.
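
The staleness check amounts to comparing each report's timestamp with the current time. A minimal Python sketch follows, assuming timestamps are kept as seconds since the epoch; the name `effective_color` is hypothetical.

```python
STALE_AFTER_SECONDS = 30 * 60  # 30 minutes, the default given in the text


def effective_color(reported_color: str, report_time: float, now: float) -> str:
    """Return blue for a report older than the limit, else its own color."""
    if now - report_time > STALE_AFTER_SECONDS:
        return "blue"
    return reported_color
```

In practice `now` would come from `time.time()`, and the resulting blue would then also drive the whole-screen color.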

Note that all of the above are configurable parameters.

----------------------------------------------------------------------------

Design Considerations:

Some of the guidelines involved in the design of Big Brother are the following:

a.. Usefulness. Display only useful information in a timely and effective manner. Too much information can be as bad as too little, especially if it's displayed in an obscure manner.

b.. Intelligence. Use of heuristics to determine health, in a manner similar to what an actual administrator or user would do, instead of very low-level testing.

c.. Highly scalable. Easily made more redundant through replication of the individual components. All that has to be decided on is the level of redundancy required and where to install the scripts. No additional cost and little additional complexity.

d.. Modular. Big Brother is designed in a totally modular fashion, using simple Bourne shell scripts to test each area. Adding more monitoring areas only requires the creation of a script to monitor that area. New monitoring areas can be added and distributed in a matter of minutes.

e.. Easily Customized. Error levels and actions can be designated on a system-by-system basis, just by editing the appropriate script.

f.. Replication. Separation of local system reporting and network reporting means that there is built-in replication, at the system level and the network level. Further redundancy is built-in at the display level through checking timestamps of incoming reports; loss of reports indicates a problem which causes the screen to turn blue. Finally, each client has the ability to send messages to the pager daemon; this further reduces the likelihood of non-notification in the event of catastrophic systems failure.

g.. Simplicity. The design is extremely simple: the client/server programs are less than 8K bytes in size, and messages are less than 100 bytes. It is extremely lightweight and very efficient.

h.. Open and Standard. Using the Web as the display mechanism allows extremely rapid prototyping of displays, and provides a standard and elegant method of distributing this essential information, worldwide. The C programs involved are small, simple and standard. Shell scripts are obviously portable, as are programs like lynx and kermit, which are used for testing. Therefore, all the elements of Big Brother can be easily ported to any Unix System.

----------------------------------------------------------------------------

What Big Brother isn't:

Big Brother is not a replacement for a qualified and experienced Systems Administrator. On the contrary, it is a big brother to the Sys Admin. It does not shut down machines or terminate processes, although it could be programmed to do so. It just identifies and notifies.

Big Brother does not explicitly monitor individual hardware components. However, failure of a hardware component is very likely to cause a severe condition through loss of service.

Big Brother does not monitor performance of the network, servers, databases or any individual application. It will, however, provide information about CPU loads and implicit information about response time; for example, telnet connections have 15 seconds to answer.

Big Brother isn't complicated. Once the methodology and underlying tools are understood, changes and enhancements are very simple to make.

Big Brother isn't expensive. In fact, it's free.

Big Brother isn't finished. But it keeps growing every day, doing more and more watching. Big Brother is watching...

----------------------------------------------------------------------------

Other things Big Brother Should Do:

Errors in /var/adm/messages should be handled better.

Big Brother should support alphanumeric paging and enhanced messages.

Big Brother should be enhanced to work on black and white screens.

Big Brother should log critical and warning situations.

Big Brother should learn something about Oracle databases.

Big Brother should automatically determine what processes to monitor.

Big Brother should probably try to monitor security.

Your Comments and Suggestions are needed!

Send them via e-mail to sean@iti.qc.ca

----------------------------------------------------------------------------

About the Creator:

Big Brother was created by Sean MacGuire, while a consultant with the Tactik Infrastructure Group of Bell Sygma in Montreal. After fifteen years of Unix Systems Administration, Systems Design, and User Interface Design, this system was created to make his life easier.

Sean MacGuire currently has two patents pending, and is the publisher of It's a Bunny, a literary Web Magazine which I-Way magazine ranked as one of the top 500 sites on the Internet, and the 10th best magazine.

The photo which adorns the Big Brother Display is Sean MacGuire. He is Big Brother... it is meant to be reminiscent of George Orwell's book 1984. That's why it's not a pretty picture.



---------

This archive was generated by hypermail 2b29.