Arnt Gulbrandsen

MRTG use at Trolltech

This page is written for system administrators who wish to use MRTG or similar tools to measure more than just the bandwidth usage of Cisco routers. We do that at Trolltech, and with this page we want to help other MRTG users learn from our experience.

Bandwidth measurement

We measure the bandwidth utilization for each port of each switch. With Cisco switches this hasn't been a problem, but Intel and D-Link switches haven't been as cooperative.
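As a rough sketch, a per-port stanza for MRTG might look like the one below. The names switchname and community are placeholders, the port number 1 is just an example, and the MaxBytes value assumes a 100 Mbit/s port; none of these come from our actual configuration.

```
# Hypothetical example: port 1 of a switch called "switchname".
# MaxBytes is in bytes per second; 12500000 assumes a 100 Mbit/s port.
Target[switchname-port1]: 1:community@switchname
MaxBytes[switchname-port1]: 12500000
Title[switchname-port1]: Switchname: Port 1
PageTop[switchname-port1]: <h1>Switchname: Port 1</h1>
Options[switchname-port1]: growright, bits
```

One such stanza is needed for each port, with the interface number before the colon changed to match.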

We use MRTG with Intel 460T switches. We had to upgrade the firmware to get them to work. The firmware we use at the moment is version 4.something and the file is called 460_45run.tfp. With this upgrade, we're happy with the switches. We do not use any of the features in Intel's switch-specific MIB file.

We have also tried a D-Link switch, but D-Link's SNMP support is unusable in practice. D-Link writes:

This problem is not a bug, just only a definition issue.

In the MIB-II (RFC-1213 standard), the definition for IF is not very strict. At D-Link switch, we define IF as the CPU/Controller. That is, the switch interface is the controller, not the port. The IFCounter will be counted on if the packet go through the controller. In the console program, the port statistic counts the number of packets which go through port. For this reason, the information from SNMP will differ from the information from telnet.

If that sounds stupid, that's because it is stupid. Their "definition" makes the switch count only about 0.003% of the traffic passing through it.

D-Link has twice told us that they will not fix this bug (oops — change this definition), so we will return that switch. (The console program, which counts correctly, can't really be used with MRTG. Its output is too hard to parse.)

UPSes

We use MRTG with several Merlin-Gerin (MGE) UPSes, measuring load, battery lifetime, number of power failures, mains power voltage and mains power frequency.
  • Output load: This is the output power as a percentage of the UPS's maximum output power. The OID is 1.3.6.1.2.1.33.1.4.4.1.5.1, and here's an example:
    Target[upsname-load]: 1.3.6.1.2.1.33.1.4.4.1.5.1&1.3.6.1.2.1.33.1.4.4.1.5.1:community@upsname
    MaxBytes[upsname-load]: 100
    Title[upsname-load]: Upsname: Load
    PageTop[upsname-load]: <h1>Upsname: Load</h1>
    Options[upsname-load]: growright, gauge, nopercent
    YLegend[upsname-load]: percent
    ShortLegend[upsname-load]: %
  • Battery lifetime: This reports the number of minutes for which the UPS has battery capacity, at its current load. The OID is 1.3.6.1.2.1.33.1.2.3.0, so we use a line like this:
    Target[upsname-time]: 1.3.6.1.2.1.33.1.2.3.0&1.3.6.1.2.1.33.1.2.3.0:community@upsname ...
  • Power in and out: This is the number of watts the UPS pulls from the mains and delivers to its equipment. The input number is typically quite a bit higher than the output number, as the UPS itself needs power. The input OID is 1.3.6.1.2.1.33.1.3.3.1.5.1 and the output OID is 1.3.6.1.2.1.33.1.4.4.1.4.1. An old MGE we have does not support the input measurement; the new ones do.
  • Battery uses: This OID is supposed to measure the number of times the UPS has decided to stop drawing current from its power source and use the battery instead. The OID is 1.3.6.1.2.1.33.1.3.1.0.
  • Mains voltage and frequency: We haven't found any use for our frequency graph, but our voltage graph showed us that one particular "power failure" was not complete: the voltage dropped from 220V to 120V. The OID for the mains frequency is 1.3.6.1.2.1.33.1.3.3.1.2.1, and the unit is dHz (i.e. the value will be 500 for 50Hz AC). The OID for the mains voltage is 1.3.6.1.2.1.33.1.3.3.1.3.1.
All of these OIDs are from the UPS-MIB. MGE also has a vendor MIB, but we don't use it. The OIDs might be different for big UPSes that may have several power sources.
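
The power and voltage graphs can probably be set up along the same lines as the load example above. What follows is a sketch rather than our actual configuration: the MaxBytes values (3000 watts, 300 volts) are assumptions and should be set above whatever your UPS and mains can plausibly report, since MRTG treats higher readings as invalid. The upsname and community placeholders are as before.

```
# Power in and out, in watts: input OID first, output OID second.
# MaxBytes 3000 is an assumed ceiling; adjust it to the UPS's rating.
Target[upsname-power]: 1.3.6.1.2.1.33.1.3.3.1.5.1&1.3.6.1.2.1.33.1.4.4.1.4.1:community@upsname
MaxBytes[upsname-power]: 3000
Title[upsname-power]: Upsname: Power in and out
PageTop[upsname-power]: <h1>Upsname: Power in and out</h1>
Options[upsname-power]: growright, gauge, nopercent
YLegend[upsname-power]: watts
ShortLegend[upsname-power]: W

# Mains voltage, in volts: same OID on both sides, as in the load example.
# MaxBytes 300 is an assumed ceiling for a 220V supply.
Target[upsname-voltage]: 1.3.6.1.2.1.33.1.3.3.1.3.1&1.3.6.1.2.1.33.1.3.3.1.3.1:community@upsname
MaxBytes[upsname-voltage]: 300
Title[upsname-voltage]: Upsname: Mains voltage
PageTop[upsname-voltage]: <h1>Upsname: Mains voltage</h1>
Options[upsname-voltage]: growright, gauge, nopercent
YLegend[upsname-voltage]: volts
ShortLegend[upsname-voltage]: V
```

The gauge option matters for both graphs: these values are instantaneous readings, not counters, so MRTG must plot them as they are instead of computing a rate.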