FIDO: process watchdog
On the campus FIDO instance, the non-blocking call that fido_snmp makes to Net::SNMP::snmp_dispatcher() has been observed on multiple occasions never to return. In 2013, I introduced the non-standard module AnyEvent::SNMP, which replaces Net::SNMP's event loop. Many months went by, but the problem eventually recurred.
On the campus FIDO instance, in the root user's crontab, you will find:
*/5 * * * * /usr/local/fido/bin/fido_watchdog.pl >> /home/net/fido/logs/fido_watchdog.log 2>&1
The fido_watchdog opens the latest FIDO status file and looks for tests that have gone STALE and that have restart instructions. If those conditions are met, the watchdog attempts to restart the stalled processes.
As of 2017/08, restart instructions are in place for several tests