Clumon Performance Monitor

Preface & Origins
The Clumon performance monitoring system was developed for monitoring Linux-based clusters at the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign. The system is currently based on Performance Co-Pilot by SGI and the PBS scheduler. It also uses MySQL as its database, and Apache with PHP as its bundled viewer.

The system was created to give an at-a-glance overview of the current state of the cluster, which requires information from both the scheduler and the hosts themselves. Before Clumon, no tool combined and cross-referenced data from these two areas. The system was also needed to provide a common structure and a single point of access, so that the many users and administrators would not overload the cluster with redundant requests for the same information.

The primary requirements for the system were as follows: scalability, low bandwidth, use of freely-available components, versatility, and controllability. Scalability was first and foremost: the system had to handle potentially thousands of hosts and, at the same time, not place too large a load on the hosts from which data was being collected or on the network layer. This led directly to the requirement of controllability. Since there are many different potential cluster configurations, the system needed to be flexible enough to be tuned for a broad range of clusters. Monitoring performance data is a trade-off between detail, scale, and frequency on the one hand, and load and bandwidth on the other. The performance monitor's administrator should therefore be able to adjust the frequency and amount of data collected, since the scale and bandwidth of a cluster are usually fixed. This allows an administrator to balance the resources used for performance monitoring against those required for computation. Versatility is a requirement in any system such as this one, because it is used by different categories of users. 
These generally include the users doing computation on the cluster, the system administrators keeping an eye on things, and the developers and engineers doing testing and debugging, to name a few. All of these groups tend to care about different aspects of the cluster, and none of them knows exactly what it wants from the beginning. The administrator must therefore be able to add and, of course, remove items from the set of collected data in a very simple manner. A final requirement was that the system be developed with freely-available, widely used software packages so that others may easily develop components to interact with it.

Architecture
Before the structure can be explained, a look at the components is necessary. The central component is the clumond data collection daemon.

Benefits of Architecture
There are numerous benefits that the architecture itself grants the administrator, the users, and the overall performance of the cluster. From a cluster performance and integrity standpoint, the system acts as a gateway for performance data. This allows the administrator to throttle the bandwidth and cycles the cluster spends on monitoring. It also allows others to develop their own applications that use the same data, and provides them with a standard set of tools for accessing it: database connectivity and custom formatting via PHP. From an integrity standpoint, there is only a single entity collecting data on the cluster, and routing and filtering can be set up so that this traffic is allowed while other types are explicitly forbidden. In essence, the system provides a known quantity such that any other traffic can be disregarded, which is a very positive thing in the realm of security. And, at the same time, virtually any performance metric can be provided to a user.
Administrative and uptime issues are addressed by keeping all configuration in the database and placing only a passive set of software on each of the monitored hosts. Changes take effect without having to communicate with the hosts, and even without restarting any service on the monitoring machine.
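The configuration-in-the-database design above can be sketched as follows. This is a minimal illustration, not Clumon's actual schema: the table layout and setting names (`UpdateInterval`, `SchedHost`) are hypothetical, and Python's built-in sqlite3 stands in for MySQL so the example is self-contained.

```python
import sqlite3

# Stand-in for the MySQL configuration table (hypothetical schema).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE config (name TEXT PRIMARY KEY, value TEXT)")
db.executemany("INSERT INTO config VALUES (?, ?)",
               [("UpdateInterval", "60"),
                ("SchedHost", "sched.cluster"),
                ("GetSchedHosts", "1")])
db.commit()

def load_config(conn):
    """Re-read every setting at the start of an iteration, so edits to the
    table take effect without restarting the daemon."""
    return dict(conn.execute("SELECT name, value FROM config"))

cfg = load_config(db)            # first iteration sees UpdateInterval = "60"

# An administrator changes a setting; the next iteration simply picks it up,
# with no restart and no communication with the monitored hosts.
db.execute("UPDATE config SET value = '30' WHERE name = 'UpdateInterval'")
db.commit()
cfg = load_config(db)
```

Because the daemon pulls its settings fresh each cycle, the database row is the single source of truth and no signal or restart mechanism is needed.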

Limitations of Architecture
There are a few limitations of this system, most of which are tied to either hardware or the underlying software. The number of parallel collections is limited by the memory and, to a lesser extent, the processor speed of the machine on which clumond executes. Performance may also be limited by the database handling its various connections; the database may need to be configured to allow more concurrent connections to support a large number of parallel collection processes. Clumond was designed to run on the same machine as the database due to the high volume of communication between the two, and the same is true of the PHP web interface. The web interface can easily be moved to another web server, but performance will degrade as larger sets of machines are displayed.
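As a hypothetical example of the connection tuning mentioned above, a MySQL server can be given more headroom in its my.cnf. The value shown is an assumption; the right number depends on how many parallel collection processes and viewer connections the cluster actually generates.

```ini
# my.cnf sketch (hypothetical value, size to the cluster)
[mysqld]
# Roughly one connection per parallel collection process,
# plus connections for the PHP viewer and administrators.
max_connections = 512
```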


Algorithm
This section describes, in general terms, the algorithm that the clumond executable runs through on each iteration. Greater detail can be found in the documentation within the source code itself; this section is meant to give a general understanding of the process and to aid those digesting the source.

  1. Parameter Gathering Phase
    The first step is retrieving configuration data from the database; this is why changes can be made without restarting the service. Data such as the update interval and the scheduler hostname, along with a flag indicating whether to collect scheduler data, is generally retrieved during this phase.
  2. Initial Fork
    The clumond process forks into two at this point. One process is the data-gathering process, while the other is the surviving process that force-terminates the data collection if the timeout is exceeded, performs database cleanup, etc. (see the Post Host Data Collection Phase).
  3. Scheduler Data Collection Phase
    If the database indicates that scheduler information should be gathered, the provided host is queried. Information flows in by asking for the list of queues and their attributes, then the list of jobs within each queue and those jobs' attributes.
  4. Pre Host Data Collection Phase
    At this point, clumond adds the defined list of hosts to those to be queried if the global variable GetDefHosts is set to 1. If GetSchedHosts is set to 1 (the same variable used to determine whether to collect scheduler info), the hosts listed by the scheduler will have already been added to the list during the Scheduler Data Collection Phase. During this phase, clumond also retrieves from the database the list of metrics it will use to query the hosts. After the list of hosts from which to collect data is assembled, clumond determines the number of hosts each process will need to collect from and forks the appropriate number of processes for the actual data collection.
  5. Host Data Collection Phase
    Running in parallel, the collection processes gather the metrics from their assigned ranges of machines in the list. When this is complete, the processes clean up and die.
  6. Post Host Data Collection Phase
    The surviving process that forked the initial gathering process waits either until all hosts have had their data harvested or until the timeout condition specified by the MaxHostTries global variable is reached. In either case, the surviving process locks the tables; flushes the data; moves the data from the temporary tables, into which it was deposited during collection, to the regular tables; and then flushes the temporary tables.
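The fork-and-collect pattern of phases 2 through 6 can be sketched roughly as follows. This is a simplified illustration, not the real clumond source: the host names, process count, and the body of collect() are hypothetical, and the MaxHostTries timeout and the temporary-to-regular table shuffle are reduced to comments.

```python
import os

HOSTS = [f"node{i:02d}" for i in range(8)]   # hypothetical host list
N_PROCS = 4                                  # parallelism knob (hypothetical)

def collect(hosts):
    # Phase 5: a child gathers metrics from its slice of hosts, depositing
    # rows into temporary tables, then cleans up and dies.
    for host in hosts:
        pass  # query the monitoring agent on `host` for the configured metrics
    os._exit(0)

def iteration():
    # Phase 4: split the assembled host list into roughly equal slices and
    # fork one collection process per slice.
    chunk = (len(HOSTS) + N_PROCS - 1) // N_PROCS
    pids = []
    for i in range(N_PROCS):
        hosts = HOSTS[i * chunk:(i + 1) * chunk]
        pid = os.fork()
        if pid == 0:
            collect(hosts)   # child never returns; it exits in collect()
        pids.append(pid)
    # Phase 6: the surviving parent waits for the children. The real daemon
    # also enforces a MaxHostTries timeout rather than waiting indefinitely.
    for pid in pids:
        os.waitpid(pid, 0)
    # ...then locks tables, moves rows from the temporary tables to the
    # regular ones, and flushes the temporary tables.
    return len(pids)

forked = iteration()
```

Forking per slice keeps the slow, network-bound host queries parallel, while the single surviving parent serializes the table swap so readers only ever see a complete data set.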