Why would a service stop running?

yary not.com at gmail.com
Fri May 30 20:44:52 EDT 2008


I have one group ("DSL Route" below), with a ping service that stopped
testing, and I'm not sure why. This group is "gravity up", and it's
not alerting when both routers appear to be down, because it
apparently is only testing one of them. There's no "depends":

Group "World Reachability" {
    retries: 0
    countstop: yes
    frequency: 2min
    sendnotify: yes
    graph: yes
    # do not send a notification if only some are down
    # only if they are all down
    gravity: up

    Group "Local Servers" {
        sendnotify: no
        gravity: up
        service: Ping {
            label: Stanford
            hostname: stanford.edu
        }
        service: Ping {
            label: PaloAlto
            hostname: gatekeeper.city.palo-alto.ca.us
        }
    }

    Group "DSL Route" {
        sendnotify: no
        gravity: up
        Service Ping {
            label: Speakeasy Gateway
            hostname: 216.27.178.1
        }
        Service Ping {
            label: Speakeasy Upstream
            hostname: 69.17.83.177
        }
    }

}

Here are some graphs. The first shows the two services combined, with
one of them ceasing to be tested this morning:

http://i186.photobucket.com/albums/x312/fecundfec/mc/missing_test.png

It's not easy to see on the combined graph which one is running, when
it might be obscured by the other graph, so here they are separated:

gateway ping: the last datapoint is around 10am, though it is now
5:30pm and Argus is still running:
http://i186.photobucket.com/albums/x312/fecundfec/mc/ping_gateway.png

upstream ping, last datapoint current:
http://i186.photobucket.com/albums/x312/fecundfec/mc/ping_upstream.png

gateway ping, showing gaps earlier in history:
http://i186.photobucket.com/albums/x312/fecundfec/mc/ping_gateway_hours.png

upstream ping, without gaps:
http://i186.photobucket.com/albums/x312/fecundfec/mc/ping_upstream_hours.png

What could make the gateway ping service have those gaps while argus
is still running & the other ping service doesn't have gaps? How can I
ensure that it always runs or fails?

The debug page is at the end of this message. Thanks in advance.
-y


Debugging Dump of Top:World_Reachability:DSL_Route:Ping_216.27.178.1
acl_about	root view_all
acl_annotate	root staff
acl_checknow	root view_all
acl_flush	root view_all
acl_getconf	root view_all
acl_logfile	root staff
acl_mode	simple
acl_ntfyack	root staff
acl_ntfyackall	root view_all
acl_ntfydetail	root staff
acl_ntfylist	root staff
acl_override	root staff
acl_page	root staff user view_all
aclcache	unprintable data structure
alarm	0
autogenerated	0
bios	unprintable data structure
bios::addtfs	752
bios::inits	751
bios::reads	1502
bios::settos	751
bios::shuts	751
bios::timefs	751
cfdepth	3
children	unprintable data structure
confck	unprintable data structure
config	unprintable data structure
countstop	0
currseverity	clear
darp	unprintable data structure
definedattime	Wed 28 May 09:33:01 2008
definedinfile	/var/argus/config
definedonline	191
depend	unprintable data structure
graph	1
graphd	unprintable data structure
graphd::gr_nmax_days	1024
graphd::gr_nmax_hours	1024
graphd::gr_nmax_samples	2048
gravity	down
image	unprintable data structure
image::barstyle	minmax
image::drawborder	1
image::gr_line_thickness	1
image::gr_show_days	1
image::gr_show_hours	1
image::gr_show_samples	1
image::gr_what	result
image::labelstyle	box
image::title	Speakeasy Gateway
image::transparent	1
label	Speakeasy Gateway
label_left	Speakeasy Gateway
label_right	Speakeasy Gateway
logsize	200
name	Ping
nostats	0
nostatus	0
notify	unprintable data structure
notify::ackonup	0
notify::autoack	1
notify::list	unprintable data structure
notify::mail_from	Argus
notify::message_fmt	%i %m - %t
notify::messagedn	Top:World_Reachability:DSL_Route:Ping_216.27.178.1 is DOWN
notify::messageup	Top:World_Reachability:DSL_Route:Ping_216.27.178.1 is UP
notify::nolotsmsgs	0
notify::notify	mail:n.otcomm at gmail.com
notify::renotify	300
notify::sendnotify	0
notify::shortmessages	0
opentime	Thu 29 May 10:33:36 2008
overridable	1
ovstatus	up
ovstatussummary	unprintable data structure
ovstatussummary::severity	clear
ovstatussummary::total	1
ovstatussummary::up	1
parents	unprintable data structure
passive	0
ping	unprintable data structure
ping::addr	~xD8~x1B~xB2~x01
ping::data	216.27.178.1 is alive (150 ms)
ping::hostname	216.27.178.1
ping::ipver	4
ping::pid	32465
ping::rbuffer	
ping::rtt	150
prevovstatus	down
prevstatus	down
severity	critical
siren	1
sirentime	Thu 29 May 04:51:36 2008
slaves_keep_state	*
slaves_send_notifies	*
sort	1
srvc	unprintable data structure
srvc::dones	751
srvc::elapsed	0.190173149108887
srvc::finished	1
srvc::frequency	120
srvc::lasttesttime	Thu 29 May 10:33:36 2008
srvc::nexttesttime	Thu 29 May 10:35:36 2008
srvc::phi	96
srvc::result	150
srvc::retries	0
srvc::showreason	0
srvc::starts	751
srvc::state	done
srvc::status	up
srvc::timeout	60
srvc::tries	0
stats	unprintable data structure
stats::daily	unprintable data structure
stats::lasttime	Fri 30 May 17:00:00 2008
stats::log	unprintable data structure
stats::monthly	unprintable data structure
stats::status	up
stats::yearly	unprintable data structure
status	up
test	unprintable data structure
test::alpha	1
test::spike_supress	1
timeout	Thu 29 May 10:34:41 2008
transtime	Thu 29 May 05:03:37 2008
type	Service
uname	Ping_216.27.178.1
unique	Top:World_Reachability:DSL_Route:Ping_216.27.178.1
vxml_long_name	Top:World_Reachability:DSL_Route:Ping_216.27.178.1
vxml_short_name	Ping_216.27.178.1
wantread	0
wantwrit	0
web	unprintable data structure
web::bkgimage	/img/argus.logo.gif
web::bldtime	Fri 30 May 17:06:28 2008
web::cachestale	120
web::footer_argus	<P><FONT SIZE="-1"><A HREF="http://argus.tcp4me.com">Argus</A>: 3.5</FONT></P>
web::icon	/img/smile.gif
web::icon_down	/img/sad.gif
web::javascript	/argus.js
web::nospkr_icon	/img/nospkr.gif
web::refresh	60
web::shownotiflist	1
web::showstats	1
web::sirensong	/sound/whoopwhoop.wav
web::style_sheet	/argus.css
web::transtime	Fri 30 May 17:00:00 2008
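
To put a number on the gap, here is a rough Python sketch that reads a few fields from a dump like the one above and reports how far past srvc::nexttesttime the page was when it was built. The function names (parse_dump, overdue_by) are made up for illustration and are not part of Argus. It assumes each dump line is "key, whitespace, value", that the timestamps use the format shown above, and that web::bldtime roughly reflects when the debug page was generated.

```python
from datetime import datetime

# Timestamp layout as it appears in the dump, e.g. "Thu 29 May 10:33:36 2008".
# Assumes English day/month names (C locale).
ARGUS_TIME_FORMAT = "%a %d %b %H:%M:%S %Y"

# A few lines copied from the dump above.
SAMPLE = (
    "srvc::frequency\t120\n"
    "srvc::lasttesttime\tThu 29 May 10:33:36 2008\n"
    "srvc::nexttesttime\tThu 29 May 10:35:36 2008\n"
    "srvc::state\tdone\n"
    "web::bldtime\tFri 30 May 17:06:28 2008\n"
)


def parse_dump(text):
    """Turn 'key<whitespace>value' lines into a dict (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if not parts:
            continue  # skip blank lines
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        fields[key] = value
    return fields


def overdue_by(fields, now_key="web::bldtime"):
    """Return how long past the scheduled next test the reference time is."""
    next_test = datetime.strptime(fields["srvc::nexttesttime"], ARGUS_TIME_FORMAT)
    now = datetime.strptime(fields[now_key], ARGUS_TIME_FORMAT)
    return now - next_test


fields = parse_dump(SAMPLE)
overdue = overdue_by(fields)
# Number of whole test intervals that have passed without a test.
missed = int(overdue.total_seconds() // int(fields["srvc::frequency"]))
print("overdue by", overdue, "- roughly", missed, "missed test intervals")
```

A positive result much larger than srvc::frequency would suggest the service was not rescheduled after its last test, which is what the gateway graph seems to show.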

