Test for 'hung' services (was Re: Upgrade works with one issue)

Mark Duling mark.duling at biola.edu
Wed Jun 25 17:22:20 EDT 2008


After setting up argus, I tested it by having it monitor a service that was down. It alerted for 1.5 days, then stopped alerting me until it was restarted. Multiple tests gave the same result, so I never put it into production.


On 6/24/08 2:43 PM, "yary" <not.com at gmail.com> wrote:

> ... I have used Argus for 3 years now, and never touched it. It just works,
> and my perceived uptime has been 100%.


The keyword is "perceived." I installed Argus 3.5 about a month and a half
ago, and discovered it wasn't warning me about some network downtimes.
Turning on graphing showed some gaps in the services. I found that when the
graph showed a gap, the test service wasn't running: I could bring a network
interface down, leave it down, and argus wouldn't complain. This is with a
simple ping test, no dependencies, retries: 0. When I restarted
argus, it would resume testing and start alerting.

So I set up "argusctl hup" in a cron job to restart argus, which restarts
all the frozen test services. Important services now test without gaps.
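
For illustration, a crontab entry for this workaround might look like the
sketch below. The hourly schedule and the /usr/local/sbin path are
assumptions, not details from my setup as described above; only the
"argusctl hup" command itself is what I actually run.

    # Hypothetical crontab entry: schedule and argusctl path are assumed.
    # Restarts argus once an hour so that frozen tests resume.
    0 * * * * /usr/local/sbin/argusctl hup

A shorter interval narrows the window in which a frozen test can miss an
outage, at the cost of restarting argus more often.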

