One service hangs every day, argusctl -hup fixes it
yary
not.com at gmail.com
Mon Jun 16 16:29:47 EDT 2008
This is a continuation of the issue posted here:
http://www.tcp4me.com/pipermail/arguslist/2008-June/000888.html
I've noticed that some of my argus services hang: the tests don't run.
If there's a failure during the "hang" then argus won't catch it.
Running "argusctl hup" restarts all the services, so I now have cron
running it every day at 16:10.
One of my "prog" services seems to hang reliably. Here's its daily
graph, with the interruptions clearly visible:
http://i186.photobucket.com/albums/x312/fecundfec/mc/temp_2008_06_16.png
Here's the sample-level graph, showing its last sample around 11pm, then
restarting at 16:10:
http://i186.photobucket.com/albums/x312/fecundfec/mc/temp_sample_2008_06_16.png
For comparison, here's the daily graph of another "prog" service that
does not hang, over the same time period:
http://i186.photobucket.com/albums/x312/fecundfec/mc/traf_2008_06_16.png
So argus is still running and the computer is still running; it's just
one service that hangs. If I don't run "argusctl hup" then others
eventually hang as well. Even "Self/Services" has gaps in its graph.
Here's the definition of the working prog service:
Group "Network Traffic" {
    graph: yes
    frequency: 7min
    Group "Pinky" {
        service: Prog {
            uname: Rec
            label: Rec
            command: netstat -ssp ip
            pluck: (\d+) total packets received
            expect: \d+
            calc: rate
        }
        service: Prog {
            uname: Send
            label: Send
            command: netstat -ssp ip
            pluck: (\d+) packets sent from this host
            expect: \d+
            calc: rate
        }
    }
    ... (another group)...
}
and the failing one:
Group "Hardware" {
    graph: yes
    Service Prog {
        command: sysctl -n hw.sensors.25
        uname: Temperature
        label: Temperature
        title: Pinky CPU Temperature
        ylabel: degrees F
        pluck: Temp2[^/]*/ (\d*\.\d*) degF
    }
}
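To rule out the pluck pattern itself, here is a minimal Python sketch that applies the same regular expression to the raw sysctl output recorded as prog::rbuffer in the debug info further down. Python's "re" module stands in for the regex engine argus uses, which is an assumption on my part, and "pluck" is just a hypothetical helper name, not an argus function.

```python
import re

# Pattern copied from the "pluck:" line of the Temperature service.
PLUCK = r"Temp2[^/]*/ (\d*\.\d*) degF"

# Raw command output as recorded in prog::rbuffer (the ~x0A is a newline).
RAW = "lm0, Temp2, temp, 46.50 degC / 115.70 degF\n"


def pluck(pattern, text):
    """Return the first capture group of pattern in text, or None."""
    match = re.search(pattern, text)
    return match.group(1) if match else None


value = pluck(PLUCK, RAW)
print(value)  # 115.70
```

The extracted value agrees with srvc::result and test::currvalue in the dump, so the pattern matches the last output the service actually read.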
The only difference of note is that the failing/hanging group uses the
default frequency, whereas the one that's not hanging has a frequency
defined. On the other hand, before I put in the "argusctl hup" cron job,
services with "frequency" set would hang as well. There's no
dependency or cron. Here's some debug info from the hanging service:
bios::addtfs 336
bios::inits 335
bios::reads 670
bios::settos 335
bios::shuts 335
bios::timefs 335
bios::timeouts 50
cfdepth 2
opentime Sun 15 Jun 22:33:31 2008
overridable 1
ovstatus up
ovstatussummary unprintable data structure
ovstatussummary::severity clear
ovstatussummary::total 1
ovstatussummary::up 1
prog::command sysctl -n hw.sensors.25
prog::exit 0
prog::pid 0
prog::rbuffer lm0, Temp2, temp, 46.50 degC / 115.70 degF~x0A
severity critical
siren 1
sirentime Tue 20 May 16:30:14 2008
slaves_keep_state *
slaves_send_notifies *
sort 1
srvc unprintable data structure
srvc::dones 335
srvc::elapsed 59.7983829975128
srvc::finished 1
srvc::frequency 60
srvc::lasttesttime Sun 15 Jun 22:33:31 2008
srvc::nexttesttime Sun 15 Jun 22:35:31 2008
srvc::phi 31
srvc::result 115.70
srvc::retries 2
srvc::showreason 0
srvc::starts 335
srvc::state done
srvc::status up
srvc::timeout 60
srvc::tries 0
stats::lasttime Mon 16 Jun 13:00:00 2008
stats::log unprintable data structure
stats::monthly unprintable data structure
stats::status up
stats::yearly unprintable data structure
status up
test unprintable data structure
test::alpha 1
test::currvalue 115.70
test::pluck Temp2[^/]*/ (\d*\.\d*) degF
test::rawvalue lm0, Temp2, temp, 46.50 degC / 115.70 degF~x0A
test::spike_supress no
test::testedp 1
transtime Tue 20 May 16:30:14 2008
type Service
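One way to read the dump above: srvc::nexttesttime is long past, yet srvc::state is still "done". Below is a rough Python sketch that parses a few of those lines and reports how overdue the next test is. The helper names (parse_debug, overdue_seconds) are hypothetical, the timestamp format is inferred from the dump, and the posting time of this message is used as "now" on the assumption that both are in the same local time zone.

```python
from datetime import datetime

# Timestamp layout inferred from the dump, e.g. "Sun 15 Jun 22:35:31 2008".
STAMP_FORMAT = "%a %d %b %H:%M:%S %Y"


def parse_debug(text):
    """Turn 'key value' debug lines into a dict; a missing value becomes ''."""
    fields = {}
    for line in text.splitlines():
        parts = line.strip().split(None, 1)
        if parts:
            fields[parts[0]] = parts[1] if len(parts) > 1 else ""
    return fields


def overdue_seconds(fields, now):
    """Seconds elapsed since the scheduled next test (negative if not yet due)."""
    next_test = datetime.strptime(fields["srvc::nexttesttime"], STAMP_FORMAT)
    return (now - next_test).total_seconds()


# A few lines copied from the debug info above.
DUMP = """\
srvc::frequency 60
srvc::lasttesttime Sun 15 Jun 22:33:31 2008
srvc::nexttesttime Sun 15 Jun 22:35:31 2008
srvc::state done
"""

fields = parse_debug(DUMP)
now = datetime(2008, 6, 16, 16, 29, 47)  # assumption: time of this post
late = overdue_seconds(fields, now)
print(late)  # 64456.0, i.e. almost 18 hours overdue
```

A check like this could run from cron and flag any service whose next test time is far in the past, rather than restarting everything blindly.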
What should I be looking at? How should I debug this?
thanks
-y