argus "hangs"?

yary not.com at gmail.com
Mon Jun 2 13:06:52 EDT 2008


I noticed that a machine lost connectivity this weekend, but argus
didn't complain. As I looked through the graphs on the argus services,
I noticed they all had gaps at the same time. Here's one:

http://i186.photobucket.com/albums/x312/fecundfec/mc/argus_percent_idle.png

No argus idle times are graphed in the above from around 1-2am May 30
until around 10pm May 31.

My first thought was that Argus wasn't running during that time
(though I did remember editing the config file), so I checked
/var/argus/log:

[2008/05/28 9:33:01] [19696] config file '/var/argus/config' changed - restarting
[2008/05/28 9:33:01] [19696] child caught signal SIGHUP - exiting
[2008/05/28 9:33:03] [6622] successful restart - Argus running
[2008/05/31 20:39:00] [6622] restart requested - HUPing
[2008/05/31 20:39:00] [6622] child caught signal SIGHUP - exiting
[2008/05/31 20:39:04] [29651] successful restart - Argus running

So on Wednesday morning argus process 19696 noticed I changed the
config file and restarted itself as process 6622. That instance of
argus ran, monitoring services for over a day, and then failed to run
any services until I ran "argusctl hup" on Saturday night, 5/31.
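
As a rough cross-check of that timeline, the log excerpt above can be
parsed to see how long each process id appears in the log. This is
only a sketch: the names LINE_RE, parse_log and pid_spans are made up
for illustration, and the line format is assumed from the six lines
quoted above (with the mail-wrapped first line rejoined).

```python
import re
from datetime import datetime

# The log lines quoted above, with the wrapped first line rejoined.
LOG = """\
[2008/05/28 9:33:01] [19696] config file '/var/argus/config' changed - restarting
[2008/05/28 9:33:01] [19696] child caught signal SIGHUP - exiting
[2008/05/28 9:33:03] [6622] successful restart - Argus running
[2008/05/31 20:39:00] [6622] restart requested - HUPing
[2008/05/31 20:39:00] [6622] child caught signal SIGHUP - exiting
[2008/05/31 20:39:04] [29651] successful restart - Argus running
"""

# Assumed line shape: [date time] [pid] message
LINE_RE = re.compile(
    r"\[(\d{4}/\d{2}/\d{2}) +(\d{1,2}:\d{2}:\d{2})\] \[(\d+)\] (.*)"
)


def parse_log(text):
    """Return a list of (timestamp, pid, message) tuples."""
    events = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # skip anything that does not look like a log line
        day, clock, pid, message = m.groups()
        when = datetime.strptime(f"{day} {clock}", "%Y/%m/%d %H:%M:%S")
        events.append((when, int(pid), message))
    return events


def pid_spans(events):
    """Map each pid to the (first, last) timestamps it logged."""
    spans = {}
    for when, pid, _ in events:
        first, last = spans.get(pid, (when, when))
        spans[pid] = (min(first, when), max(last, when))
    return spans


spans = pid_spans(parse_log(LOG))
for pid, (first, last) in spans.items():
    print(pid, first, last, last - first)
```

Process 6622 spans the whole period from the Wednesday restart to the
Saturday hup, so the process itself was alive across the gap in the
graphs; the log alone does not show when it stopped running services.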

My next thought was that perhaps something happened at 1-2am May 30
that froze the system, but not all services stopped testing at that
time. This one is a command service that gets the motherboard
temperature; it stopped running around 2am on Thurs 5/29:

http://i186.photobucket.com/albums/x312/fecundfec/mc/argus_cmd_graph.png

defined thus:
Group "Hardware" {
    graph: yes
    Service Prog {
        command: sysctl -n hw.sensors
        uname: Temperature
        label: Temperature
        title: Pinky CPU Temperature
        ylabel: degrees F
        pluck: Temp2[^/]*/ (\d*\.\d*) degF
    }
}
(I intend to add voltage services to that group sometime)
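
For reference, the pluck pattern can be tried outside of argus. The
sketch below assumes that pluck takes the value from the first
parenthesized group, and the sample sensor line is hypothetical (it is
not captured from the machine above, only shaped to fit the pattern).

```python
import re

# The pluck pattern from the Service definition above.
PLUCK = re.compile(r"Temp2[^/]*/ (\d*\.\d*) degF")

# Hypothetical line in the style of "sysctl -n hw.sensors" output;
# the real output on this machine may differ.
SAMPLE = "lm0, Temp2, 45.00 degC / 113.00 degF"


def pluck_value(text):
    """Return the first captured group as a float, or None if no match."""
    m = PLUCK.search(text)
    return float(m.group(1)) if m else None


value = pluck_value(SAMPLE)
print(value)
```

If the command output ever stops matching the pattern, pluck_value
returns None here, which is one way to check the pattern against
captured output by hand.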

On Friday evening I posted a question about some Ping services not
running. Now it seems that on my system, after a day, services just
stop running altogether. The argus service itself was still running;
at least it responded to hup and wrote as much in its logfile. What
could be causing this? Has anyone else seen "mystery" gaps in their
graphs, or is it just my installation? This is with argus 3.5 off the
website, no modifications. I see there's a newer beta but I haven't
downloaded it.

I just now added "spike_supress: no" to my config file, in case there
was a problem with transients. I doubt that's it, but it's all I can
think of at the moment.

