Test for 'hung' services (was Re: Upgrade works with one issue)
Mark Duling
mark.duling at biola.edu
Wed Jun 25 17:22:20 EDT 2008
After setting up Argus, I tested it by having it monitor a service that was down. It alerted for 1.5 days, then stopped alerting until Argus was restarted. Multiple tests gave the same result, so I never put it into production.
On 6/24/08 2:43 PM, "yary" <not.com at gmail.com> wrote:
> ... I have used Argus for 3 years now, and never touched it. It just works,
> and my perceived uptime has been 100%.
The keyword is "perceived." I installed Argus 3.5 about a month and a half
ago and discovered it wasn't warning me about some network outages.
Turning on graphing showed gaps in some of the service graphs. I found that
whenever a graph showed a gap, the test for that service wasn't running: I
could bring a network interface down, leave it down, and Argus wouldn't
complain. This is with a simple ping test, no dependencies, retries: 0.
When I restarted Argus, it resumed testing and started alerting.
So I set up a cron job that runs "argusctl hup" to restart Argus, which
restarts all the frozen test services. Important services now test without gaps.
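Here is a complementary check for the gaps described above, sketched in Python as a hypothetical illustration (it is not part of Argus). It treats a test as hung when the time since its last run exceeds some multiple of its expected frequency, and it lists the gaps between recorded runs. It assumes you can obtain the completion times of each test run, for example from the data behind the graphs. The function names `find_gaps` and `is_hung` and the default tolerance factor of 2 are made up for this example.

```python
from datetime import datetime, timedelta


def find_gaps(run_times, frequency, tolerance=2.0):
    """Return (previous_run, next_run) pairs whose spacing exceeds
    tolerance * frequency. run_times holds datetimes of completed runs
    (hypothetical input: how you collect them is up to you)."""
    times = sorted(run_times)
    limit = frequency * tolerance  # timedelta * float is supported
    gaps = []
    for prev, cur in zip(times, times[1:]):
        if cur - prev > limit:
            gaps.append((prev, cur))
    return gaps


def is_hung(last_run, frequency, now, tolerance=2.0):
    """True if the test has not completed within tolerance * frequency of now."""
    return now - last_run > frequency * tolerance


if __name__ == "__main__":
    start = datetime(2008, 6, 24, 12, 0)
    freq = timedelta(minutes=1)
    # Five runs one minute apart, then nothing until 12:30.
    runs = [start + i * freq for i in range(5)] + [start + timedelta(minutes=30)]
    print(find_gaps(runs, freq))
    print(is_hung(runs[-1], freq, now=start + timedelta(minutes=35)))
```

A tolerance above 1 keeps ordinary scheduling jitter from raising false alarms, while a frozen test like the ones described here shows up as a gap that keeps growing.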