This page is written for system administrators who wish to use MRTG or similar tools to measure more than just the bandwidth usage of Cisco routers. We do that at Trolltech, and with this page we want to help other MRTG users learn from our experiences.
Bandwidth measurement
We measure the bandwidth utilization for each port of each switch. With Cisco switches this hasn't been a problem, but Intel and D-Link switches haven't been as cooperative.
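
As a rough illustration of what a per-port bandwidth figure involves, the sketch below derives utilization from two readings of an interface octet counter (such as MIB-II's ifInOctets), allowing for one wrap of the 32-bit counter. The function name, the sample readings and the 100 Mbit/s link speed are assumptions made for this example. MRTG does its own version of this bookkeeping internally, so this is not its actual code.

```python
COUNTER32_MODULUS = 2 ** 32  # MIB-II octet counters are 32-bit and wrap to zero


def utilization_percent(prev_octets, curr_octets, interval_s, link_bps):
    """Return link utilization in percent between two counter readings.

    Assumes the counter wrapped at most once during the interval.
    """
    if interval_s <= 0 or link_bps <= 0:
        raise ValueError("interval_s and link_bps must be positive")
    # The modulo makes the difference correct even if the counter wrapped.
    delta_octets = (curr_octets - prev_octets) % COUNTER32_MODULUS
    bits_per_second = delta_octets * 8 / interval_s
    return 100.0 * bits_per_second / link_bps


# Hypothetical readings taken 300 seconds apart on a 100 Mbit/s port.
print(utilization_percent(1_000_000, 376_000_000, 300, 100_000_000))  # prints 10.0
```

A switch that only counts packets passing through its controller, as described further down for D-Link, would feed this calculation counters that barely move, which is why the choice of what the counter measures matters more than the arithmetic.
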
We use MRTG with Intel switches. We had to upgrade the firmware to get them to work. The firmware we use at the moment is version 4.something, and the file is called 460_45run.tfp. With this upgrade, we're happy with the switches. We do not use any of the features in Intel's switch-specific MIB file.
We have also tried a D-Link switch, but D-Link's SNMP support is unusable in practice. D-Link writes:
"In the MIB-II (RFC-1213 standard), the definition for IF is not very strict. At D-Link switch, we define IF as the CPU/Controller. That is, the switch interface is the controller, not the port. The IFCounter will be counted on if the packet go through the controller. In the console program, the port statistic counts the number of packets which go through port. For this reason, the information from SNMP will differ from the information from telnet. This problem is not a bug, just only a definition issue."
If that sounds stupid, that's because it is stupid. Their "definition" makes the switch count only about 0.003% of the traffic passing through it.
D-Link has twice told us that they will not fix this bug (oops, change this definition), so we will return that switch. (The console program, which counts correctly, can't really be used with MRTG. Its output is too hard to parse.)
We use MRTG with several Merlin-Gerin (MGE) UPSes, measuring load, battery lifetime, number of power failures, mains power voltage and mains power frequency.
- Output load: This is the output power as a percentage of the UPS's
maximum output power. The OID is 184.108.40.206.220.127.116.11.18.104.22.168.1, and
here's an example:
Title[upsname-load]: Upsname: Load
PageTop[upsname-load]: <h1>Upsname: Load</h1>
Options[upsname-load]: growright, gauge, nopercent
- Battery lifetime: This reports the number of minutes for which
the UPS has battery capacity, at its current load. The OID is
22.214.171.124.126.96.36.199.2.3.0, so we use a line like this:
- Power in and out: This is the number of watts the UPS pulls from
the mains and delivers to its equipment. The input number is
typically quite a bit higher than the output number, as the UPS itself
needs power. The input OID is 188.8.131.52.184.108.40.206.220.127.116.11.1 and the
output OID is 18.104.22.168.22.214.171.124.126.96.36.199.1. An old MGE we have does not
support the input measurement; the new ones do.
- Battery uses: This OID is supposed to measure the number of times
the UPS has decided to stop drawing current from its power source and
use the battery instead. The OID is 188.8.131.52.184.108.40.206.3.1.0.
- Mains voltage and frequency: We haven't had any use for our
frequency graph, but our voltage graph showed us that one particular
"power failure" was not complete: the voltage dropped from 220V to
120V. The OID for the mains frequency is 220.127.116.11.18.104.22.168.22.214.171.124.1,
and the unit is dHz (i.e. the value will be 500 for 50Hz AC). The OID
for the mains voltage is 126.96.36.199.188.8.131.52.184.108.40.206.1.
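
The readings in the list above need little arithmetic, but a few conversions help when reading the graphs: the mains frequency arrives in dHz, the input and output wattages together show roughly how much power the UPS itself consumes, and a voltage drop is easier to judge as a percentage. A minimal sketch follows. The function names and the sample wattages are made up for illustration; only the 500 dHz = 50 Hz relation and the 220V to 120V drop come from the text above.

```python
def dhz_to_hz(raw_value):
    """Convert a mains-frequency reading in decihertz to hertz."""
    return raw_value / 10.0


def ups_overhead_watts(input_watts, output_watts):
    """Return the power the UPS uses itself: what it draws minus what it delivers."""
    return input_watts - output_watts


def voltage_drop_percent(nominal_volts, measured_volts):
    """Return how far the measured voltage is below nominal, in percent."""
    if nominal_volts <= 0:
        raise ValueError("nominal_volts must be positive")
    return 100.0 * (nominal_volts - measured_volts) / nominal_volts


print(dhz_to_hz(500))                              # prints 50.0
print(ups_overhead_watts(1200, 1000))              # hypothetical wattages
print(round(voltage_drop_percent(220, 120), 1))    # the partial failure noted above
```

A drop of this size is a partial failure rather than an outage, which is the kind of event the voltage graph caught and a simple power-failure count would not describe.
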