Test for 'hung' services (was Re: Upgrade works with one issue)
    Mark Duling 
    mark.duling at biola.edu
       
    Wed Jun 25 17:22:20 EDT 2008
    
    
  
After setting up Argus, I tested it by having it monitor a service that was down.  It would alert for 1.5 days and then stop alerting until it was restarted.  Multiple tests gave the same result, so I never put it into production.
On 6/24/08 2:43 PM, "yary" <not.com at gmail.com> wrote:
> ... I have used Argus for 3 years now, and never touched it. It just works,
> and my perceived uptime has been 100%.
The keyword is "perceived." I installed Argus 3.5 about a month and a half
ago, and discovered it wasn't warning me about some network downtimes.
Turning on graphing showed some gaps in the services. I found that when the
graph showed a gap, the service test wasn't running: I could bring a network
interface down, leave it down, and argus wouldn't complain. This is with a
simple ping test, no dependencies, retries: 0. When I restarted
argus, it would resume testing and start alerting.
So I set up "argusctl hup" in a cron job to restart argus, which restarts
all the frozen test services. Important services now test without gaps.
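
The gaps that showed up on the graphs could also be looked for mechanically. Below is a minimal sketch in Python, assuming you can get a list of timestamps at which a test actually ran. The function name `find_gaps`, the `tolerance` factor, and the sample timestamps are all hypothetical and are not part of Argus or its data format.

```python
from datetime import datetime, timedelta

def find_gaps(check_times, expected_interval, tolerance=2.0):
    """Return (last_seen, next_seen) pairs where two consecutive checks
    are further apart than tolerance * expected_interval."""
    limit = expected_interval * tolerance
    ordered = sorted(check_times)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > limit]

# Hypothetical sample: a test that should run every minute, but stalls
# between minute 3 and minute 30.
start = datetime(2008, 6, 24, 12, 0)
times = [start + timedelta(minutes=m) for m in (0, 1, 2, 3, 30, 31)]
gaps = find_gaps(times, timedelta(minutes=1))

for last_seen, next_seen in gaps:
    print("no checks between", last_seen, "and", next_seen)
```

A check like this only reports that a test stopped running. It does not explain why, and it does not replace the periodic restart described above.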
    
    