DoIT Monitoring Services (Overview)
Overview of the monitoring capabilities DoIT provides on the servers for which it has operational responsibility.
Event Management and Monitoring
DoIT monitors the servers for which it has operational responsibility. Several applications perform this monitoring and send alarms and notifications that SNCC systems operators can monitor 24x7. The systems operator has relevant contact information for system administrators and technicians and is also in contact with the DoIT Help Desk. When an alarm or notification is observed, the action the systems operator takes depends on the severity of the alert, the customers affected, any instructions given with the alert, and similar factors. Alarms and the actions taken are customizable for every application or server. The alarms generally break down into two categories:
Several applications are used to monitor remotely accessed servers. These applications use configurable probes that check whether a service is functioning by attempting to access it remotely. A probe can be configured either to check that something is available or to look for a specific response.
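The two probe modes described above (an availability check and a check for a specific response) can be sketched as follows. This is a minimal illustration, not the tooling DoIT uses: the names ProbeResult, evaluate_response, and run_probe are hypothetical, and the sketch assumes the monitored service is reachable over HTTP or HTTPS.

```python
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProbeResult:
    """Outcome of one probe run (hypothetical structure for this sketch)."""
    ok: bool
    detail: str


def evaluate_response(status: int, body: str,
                      expected_text: Optional[str] = None) -> ProbeResult:
    """Decide whether a response counts as healthy.

    With expected_text=None this is a plain availability check;
    otherwise the body must also contain the expected text.
    """
    if not 200 <= status < 400:
        return ProbeResult(False, f"unexpected status {status}")
    if expected_text is not None and expected_text not in body:
        return ProbeResult(False, f"expected text {expected_text!r} not found")
    return ProbeResult(True, "ok")


def run_probe(url: str, expected_text: Optional[str] = None,
              timeout: float = 5.0) -> ProbeResult:
    """Access the service remotely and evaluate what comes back."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return evaluate_response(resp.status, body, expected_text)
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (4xx/5xx).
        return ProbeResult(False, f"unexpected status {exc.code}")
    except (urllib.error.URLError, OSError) as exc:
        # No usable answer at all: DNS failure, refused connection, timeout.
        return ProbeResult(False, f"unreachable: {exc}")
```

Calling run_probe with only a URL performs the availability check; passing expected_text as well makes the probe look for a specific response. A failed result is what would raise an alarm in a setup like this.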
Monitoring is also done with applications that run locally on the monitored server. These applications can check several things:
- Hardware monitoring (on/off, CPU usage, drive capacity, etc.)
- Process monitoring (checks if certain applications are running)
- Logfile monitoring (scan specified files, like syslog, for certain conditions)
- SSL certificate expiration monitoring (checks expiration date)
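As a rough illustration of the local checks listed above, the sketch below covers drive capacity, logfile scanning, and certificate expiration using only the Python standard library. The function names are hypothetical, and the sketch assumes the certificate's expiration date has already been read into a datetime; it does not represent the monitoring applications DoIT actually runs.

```python
import re
import shutil
from datetime import datetime, timezone
from typing import Iterable, List


def disk_usage_percent(path: str = "/") -> float:
    """Return the percentage of the drive holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def scan_log_lines(lines: Iterable[str], pattern: str) -> List[str]:
    """Return the log lines that match a condition given as a regex."""
    regex = re.compile(pattern)
    return [line.rstrip("\n") for line in lines if regex.search(line)]


def days_until_expiry(not_after: datetime, now: datetime) -> int:
    """Return the whole days left before a certificate expires.

    `not_after` is the certificate's expiration date (assumed to be
    parsed elsewhere); a negative result means it has already expired.
    """
    return (not_after - now).days
```

A caller would compare each value with a configured threshold, for example raising an alarm when disk usage exceeds a chosen percentage or when a certificate has fewer than a chosen number of days left. scan_log_lines accepts any iterable of lines, so an open file such as syslog can be passed directly.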
In addition to monitoring, these locally running applications can schedule processes to run on the server and can respond to alarms (e.g., by restarting an application that has ended).
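The alarm response mentioned above, restarting an application that has ended, could look something like the following sketch. It assumes the monitor itself started the application and therefore holds its process handle; ensure_running is a hypothetical helper, not part of any DoIT tool.

```python
import subprocess
import sys
from typing import List, Tuple


def ensure_running(proc: subprocess.Popen,
                   command: List[str]) -> Tuple[subprocess.Popen, bool]:
    """Restart the application if it has ended.

    Returns the live process handle and a flag telling the caller
    whether a restart took place (hypothetical helper for this sketch).
    """
    if proc.poll() is None:
        # poll() returns None while the process is still running.
        return proc, False
    # The process has exited, so start it again with the same command.
    return subprocess.Popen(command), True
```

A scheduler would call ensure_running at a regular interval and record or report each restart, so that the systems operator still sees that the application went down.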