[nut-Patches][303751] Checking UPS Temperature

nut-patches at alioth.debian.org
Thu Jan 4 16:35:22 CET 2007


Patches item #303751, was opened at 2006-08-12 00:04
>Status: Closed
Priority: 3
Submitted By: Eric Wilde (ewilde-guest)
Assigned to: Nobody (None)
Summary: Checking UPS Temperature 
>Resolution: Rejected
Group: None
Category: None


Initial Comment:
Last week, one of my UPSes burned up its batteries (plates buckled, cases bulging, several of the sealed vent caps opened, plastic welded together).  The batteries eventually appear to have shorted and the UPS shut down without warning, despite being on line power (luckily, the equipment it was powering had a sense of humor).  From reading the log file posthumously, I see that the internal temperature of the UPS reached 81 degrees Celsius, which is pretty hot.

Normal operating temperatures for this UPS are in the 40-50 degree Celsius range.  The temperature rose into the 75-80 degree range 36 hours before the batteries shorted out, so it appears that increased temperature is an excellent predictor of battery failure.

This being the case, I added the following code to upsmon to monitor temperature (changes based on nut-2.0.0).

                                 Eric Wilde

 
--- upsmon.h.orig	2004-03-08 07:09:28.000000000 -0500
+++ upsmon.h	2006-08-11 13:38:03.000000000 -0400
@@ -29,4 +29,5 @@
 /* was ST_FIRST 0x080 */
 #define ST_CONNECTED	0x100	/* upscli_connect returned OK		*/
+#define ST_OVERTEMP	0x200	/* UPS is running overtemp		*/ //EW
 
 /* required contents of flag file */
@@ -72,4 +73,5 @@
 #define NOTIFY_NOCOMM	8	/* UPS hasn't been contacted in awhile	*/
 #define NOTIFY_NOPARENT	9	/* privileged parent process died	*/
+#define NOTIFY_OVERTEMP	10	/* UPS went to overtemp			*/ //EW
 
 /* notify flag values */
@@ -101,4 +103,5 @@
 	{ NOTIFY_NOCOMM,   "NOCOMM",   NULL, "UPS %s is unavailable", 0 },
 	{ NOTIFY_NOPARENT, "NOPARENT", NULL, "upsmon parent process died - shutdown impossible", 0 },
+	{ NOTIFY_OVERTEMP, "OVERTEMP", NULL, "UPS %s is running at an excessive temperature", 0 }, //EW
 	{ 0, NULL, NULL, NULL, 0 }
 };


--- upsmon.c.orig	2004-01-31 16:00:02.000000000 -0500
+++ upsmon.c	2006-08-11 16:11:15.000000000 -0400
@@ -50,4 +50,7 @@
 static	int	rbwarntime = 43200;
 
+	/* default UPS overtemp value (degrees Celsius - 0.0 means ignore) */ //EW
+static	double	upsovertemp = 0.0; //EW
+
 	/* default "all communications down" warning interval (seconds) */
 static	int	nocommwarntime = 300;
@@ -546,4 +549,13 @@
 	}
 
+//EW >>>>>>
+	if (!strcmp(var, "temp")) {
+		query[0] = "VAR";
+		query[1] = ups->upsname;
+		query[2] = "ups.temperature";
+		numq = 3;
+	}
+//EW <<<<<<
+
 	if (numq == 0) {
 		upslogx(LOG_ERR, "get_var: programming error: var=%s", var);
@@ -770,4 +782,21 @@
 }
 
+//EW >>>>>>
+static void ups_overtemp(utype *ups)
+{
+	if (flag_isset(ups->status, ST_OVERTEMP)) {		/* no change */
+		debug("ups_overtemp(%s) (no change)\n", ups->sys);
+		return;
+	}
+
+	debug("ups_overtemp(%s) (first time)\n", ups->sys);
+
+	/* must have changed from !OVERTEMP to OVERTEMP, so notify */
+
+	do_notify(ups, NOTIFY_OVERTEMP);
+	setflag(&ups->status, ST_OVERTEMP);
+}
+//EW <<<<<<
+
 /* cleanly close the connection to a given UPS */
 static void drop_connection(utype *ups)
@@ -1163,4 +1192,12 @@
 	}
 
+//EW >>>>>>
+	/* UPSOVERTEMP <num> */
+	if (!strcmp(arg[0], "UPSOVERTEMP")) {
+		upsovertemp = atof(arg[1]);
+		return 1;
+	}
+//EW <<<<<<
+
 	/* NOCOMMWARNTIME <num> */
 	if (!strcmp(arg[0], "NOCOMMWARNTIME")) {
@@ -1563,4 +1600,31 @@
 }
 
+//EW >>>>>>
+/* deal with the ups.temperature for this ups */
+static void parse_temperature(utype *ups, char *temperature)
+{
+	double temp;
+
+	debug("     temperature: [%s]\n", temperature);
+
+	/* empty response is ignored -- not all ups return temperatures */
+	if (!strcmp(temperature, "")) {
+		clearflag(&ups->status, ST_OVERTEMP);
+		return;
+	}
+
+	/* get the temperature as a double */
+	temp = atof(temperature);
+
+	/* check the temperature against the overtemp value */
+	if (temp > upsovertemp)
+		ups_overtemp(ups);
+	else
+		clearflag(&ups->status, ST_OVERTEMP);
+
+	debug("\n");
+}
+//EW <<<<<<
+
 /* see what the status of the UPS is and handle any changes */
 static void pollups(utype *ups)
@@ -1578,4 +1642,19 @@
 		debug("polling ups: %s\n", ups->sys);
 
+//EW >>>>>>
+	/* if the user wants us to check for overtemp */
+	if (upsovertemp > 0.0) {
+		set_alarm();
+
+		if (get_var(ups, "temp", status, sizeof(status)) == 0) {
+			clear_alarm();
+			parse_temperature(ups, status);
+		} else {
+			/* no reading: the status poll below handles lost comms */
+			clear_alarm();
+		}
+	}
+//EW <<<<<<
+
 	set_alarm();
 


+++ upsmon.config (changes somewhere in the config file)

# --------------------------------------------------------------------------
# UPSOVERTEMP - Temperature (in Celsius) which is too high for operation
#
# upsmon will check all UPS that return temperature information against this
# value.  If the UPS temperature exceeds this value, an OVERTEMP notification
# will be generated.
#
# Note that certain UPS are renowned for cooking and even burning up batteries
# (some reports of spectacular battery fires have been received).  From actual
# observed log data, it appears that prior to burning up the batteries, the
# UPS internal temperature rises significantly.  Hence, monitoring the UPS
# temperature can be a valuable tool for detecting battery cooking before
# the UPS burns the place down (the UPS is supposed to solve problems, not
# cause them, isn't it?).
#
# Once again, typical observed internal temperatures are in the 40 to 50 degree
# Celsius range.  Observed temperatures of 80 degrees Celsius prior to an
# actual battery failure are indicative of pending failure.  Thus, to be safe,
# the UPSOVERTEMP value should be set in the 60-70 degree range.

UPSOVERTEMP 60.0

# OVERTEMP : The UPSOVERTEMP value has been exceeded (for UPS that return temp)

NOTIFYFLAG OVERTEMP SYSLOG+EXEC


----------------------------------------------------------------------

>Comment By: Arjen de Korte (adkorte-guest)
Date: 2007-01-04 16:35

Message:
You can handle this without any changes to upsmon, through a
script that monitors the UPS temperature by running

	upsc myups@somewhere ups.temperature

and parsing the result. If the temperature is too high, the
script could send a message to the operator, or send an
instcmd to the UPS to shut it down and keep it off.

I'm not in favor of doing this in upsmon, since only the
'ups.status' is guaranteed to be available for each driver.
If we start adding variables that *might* be supported,
there is no end to the number of possible variables. Where
would we stop?

Furthermore, polling for the temperature doesn't need to be
done as frequently as the line voltage, since it won't
change that quickly (unless the UPS *is* on fire already).
You don't need the near instantaneous reaction like we have
for input/battery state changes.

Adding 'TEMP' to the 'ups.status' might be a good idea, but
requires changes to the driver. It would be a much better
option than changing upsmon in the way proposed here though.
It should be the driver that decides something is not right,
with upsmon then acting on that notice. I'm against reversing
the order of events, since if upsmon is somehow not able to
talk to the driver, nothing is done to resolve the situation.

----------------------------------------------------------------------

You can respond by visiting: 
http://alioth.debian.org/tracker/?func=detail&atid=411544&aid=303751&group_id=30602


