In the previous article, Agent-less Monitoring With Xymon Using xymon-rclient, a basic monitoring setup for a QNap NAS appliance was shown. As promised, here is the follow-up on the BusyBox-specific adjustments needed to get more metrics “right”. As a bonus, an issue with RAID monitoring on QNap is addressed to prevent false positives in Xymon. The resulting client script also works on other BusyBox-based systems and is in use with UniFi WLAN Access Points from Ubiquiti Networks.

A BusyBox userland/environment/root-filesystem can be found on many appliances [1]. This article focuses on QNap NAS systems, which run BusyBox v1.01. Unfortunately QNap ships multiple builds with this version number that behave differently: the two builds covered here have the compile dates 2011.02.08 and 2015.05.21 [2]. To check the version and build date, just type busybox once logged in to the NAS appliance.
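
To compare builds across several appliances, the banner can be reduced to version and build date. This is only a minimal sketch: it assumes the first banner line looks like `BusyBox v1.01 (2011.02.08-17:23+0000) multi-call binary`, and the helper name `busybox_build` is made up for illustration.

```shell
# Print "<version> <build date>" from a BusyBox banner read on stdin.
# Assumed banner layout (not verified against every QNap firmware):
#   BusyBox v1.01 (2011.02.08-17:23+0000) multi-call binary
busybox_build() {
    sed -n '1s/^BusyBox \(v[0-9.]*\) (\([0-9.]*\).*/\1 \2/p'
}

# On the NAS, where busybox prints its banner when called without arguments:
#   busybox 2>&1 | busybox_build
```

If the banner on your firmware is laid out differently, the function prints nothing and the sed expression needs adjusting.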

BusyBox is aimed at embedded systems and thus is optimised for size [3]. For that reason it provides a rather minimal userland in a single multi-call binary, and some commands

  • are simply not present or
  • have limited features.

Unfixable differences

Some things in BusyBox cannot be changed without modifying the system, and they lead to limited or missing information on the Xymon server:

  • top-output: This is very limited compared to the procps version present in full-blown distros.

  • ps-output: Very basic output and no further formatting features, e.g. no tree formatting.

  • no vmstat-tool at all: For that reason there is no IO-wait metric. Comment out that section, as otherwise an error is logged during each run.

      ## busybox on QNap has no `vmstat` and no `nohup`:
      # vmstat
      #nohup sh -c "vmstat 300 2 1>$XYMONTMP/xymon_vmstat.$MACHINEDOTS.$$ 2>&1; mv $XYMONTMP/xymon_vmstat.$MACHINEDOTS.$$ $XYMONTMP/xymon_vmstat.$MACHINEDOTS" </dev/null >/dev/null 2>&1 &
      #sleep 5
      #if test -f $XYMONTMP/xymon_vmstat.$MACHINEDOTS; then echo "[vmstat]"; cat $XYMONTMP/xymon_vmstat.$MACHINEDOTS; rm -f $XYMONTMP/xymon_vmstat.$MACHINEDOTS; fi
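
Before editing the client script it can help to see which of the tools it calls are actually missing on a given appliance. The following is a rough sketch, assuming the appliance shell provides the `type` builtin; the helper names `have_cmd` and `missing_tools` are hypothetical, and the tool list is only an example.

```shell
# Succeed if the named command can be found by this shell.
have_cmd() {
    type "$1" >/dev/null 2>&1
}

# Print the name of every listed tool that is not available.
missing_tools() {
    for tool in "$@"; do
        have_cmd "$tool" || echo "$tool"
    done
}

# Example list; extend it with whatever your client script calls.
missing_tools vmstat nohup top ps free df
```

Every name printed marks a section of the client script that needs to be commented out or rewritten.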
    

Fixable differences

For some commands with different output we can do some tweaks though. The changes compared to the stock-version client/bin/xymonclient-linux.sh in detail:

  • free-output differs: Remove the leading spaces in the output to make the Xymon parser happy:

      echo "[free]"
      ## busybox has leading spaces in output:
      #free
      free | sed -e 's/^[[:space:]]*Mem/Mem/' -e 's/^[[:space:]]*Swap/Swap/'
    
  • df-options missing: We ignore the EXCLUDES-fiddling and call df directly. The older (2011.02.08) BusyBox build does not yet have the -c option (which corrects a formatting error), so it must be removed from the snippet shown below. It’s not clear when QNap changed this; test your setup for -c support by running df -kc interactively on the NAS: if it runs without errors the option is supported, otherwise just use df -k below:

      echo "[df]"
      EXCLUDES=`cat /proc/filesystems | grep nodev | grep -v rootfs | awk '{print $2}' | xargs echo | sed -e 's! ! -x !g'`
      ## busybox does not support many options:
      #df -Pl -x iso9660 -x $EXCLUDES | sed -e '/^[^  ][^     ]*$/{
      #N
      #s/[    ]*\n[   ]*/ /
      #}'
      df -kc
      ## no inode-support in busybox:
      #echo "[inode]"
      #df -Pil -x iso9660 -x $EXCLUDES | sed -e '/^[^         ][^     ]*$/{
      #N
      #s/[    ]*\n[   ]*/ /
      #}'
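
The two tweaks above can also be tried outside the client script. The sketch below wraps them in two hypothetical helpers, `fix_free` and `pick_df`. The sed call is written as two plain substitutions, chosen here for portability rather than copied from the listing, and the -c probe simply automates the interactive test described for df.

```shell
# Strip the indentation in front of the Mem and Swap rows of `free` output.
fix_free() {
    sed -e 's/^[[:space:]]*Mem/Mem/' -e 's/^[[:space:]]*Swap/Swap/'
}

# Echo the df invocation this system accepts: "df -kc" when the -c option
# runs without error, plain "df -k" otherwise.
pick_df() {
    if df -kc >/dev/null 2>&1; then
        echo "df -kc"
    else
        echo "df -k"
    fi
}

# Possible use inside the client script:
#   echo "[free]"; free | fix_free
#   echo "[df]";   $(pick_df)
```

Probing at run time would let one script serve both the 2011 and the 2015 builds, at the cost of one extra df call per run.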
    

QNap speciality: RAID monitoring

The RAID setup reported in /proc/mdstat on QNap is special and may lead to a false red alert: it uses an unusually high number of devices in RAID-1. To protect the OS and configuration, this information is written to every disk. We rewrite the lines of these multi-device raid1 arrays to “hide” them from the Xymon parser by prefixing them with an (arbitrary) string.

First an example from a 16-bay QNap system with 2 RAID-6 volumes defined:

[~] # cat /proc/mdstat
    Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath]
md2 : active raid6 sdp3[7] sdo3[6] sdn3[5] sdm3[4] sdl3[3] sdk3[2] sdj3[1] sdi3[0]
                 23382381696 blocks super 1.0 level 6, 64k chunk, algorithm 2 [8/8] [UUUUUUUU]
                 bitmap: 2/30 pages [8KB], 65536KB chunk

md1 : active raid6 sdh3[7] sdg3[6] sdf3[5] sde3[4] sda3[3] sdb3[2] sdd3[1] sdc3[0]
                 23382381696 blocks super 1.0 level 6, 64k chunk, algorithm 2 [8/8] [UUUUUUUU]
                 bitmap: 3/30 pages [12KB], 65536KB chunk

md256 : active raid1 sdp2[15](S) sdo2[14](S) sdn2[13](S) sdm2[12](S) sdl2[11](S) sdk2[10](S) sdj2[9](S) sdi2[8](S) sdh2[7](S) sdg2[6](S) sdf2[5](S) sde2[4](S) sda2[3](S) sdb2[2](S) sdd2[1] sdc2[0]
                 530112 blocks super 1.0 [2/2] [UU]
                 bitmap: 0/1 pages [0KB], 65536KB chunk

md13 : active raid1 sdc4[0] sdp4[15] sdo4[14] sdn4[13] sdm4[12] sdl4[11] sdk4[10] sdj4[9] sdi4[8] sdh4[7] sdg4[6] sdf4[5] sde4[4] sda4[3] sdb4[2] sdd4[1]
                 458880 blocks super 1.0 [24/16] [UUUUUUUUUUUUUUUU________]
                 bitmap: 1/1 pages [4KB], 65536KB chunk

md9 : active raid1 sdc1[0] sdp1[15] sdo1[14] sdn1[13] sdm1[12] sdl1[11] sdk1[10] sdj1[9] sdi1[8] sdh1[7] sdg1[6] sdf1[5] sde1[4] sda1[3] sdb1[2] sdd1[1]
                 530048 blocks super 1.0 [24/16] [UUUUUUUUUUUUUUUU________]
                 bitmap: 1/1 pages [4KB], 65536KB chunk

unused devices: <none>

The “trouble”-devices are the RAID-1 devices used by the QNap-OS and for extensions (md9 and md13 in the example above). These have <bay|disk count> devices. At least for the 16-bay system they are configured for a maximum of 24 disks: [UUUUUUUUUUUUUUUU________] means 16 used, 8 missing. It’s exactly the 8 missing disks that make the Xymon raid-status go red by default. A 16-disk RAID-1 provides plenty of redundancy, and in almost all setups the physical disks are also used in other RAID sets, where failing disks are detected.
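
To find out which md devices on your own NAS show this pattern, the `[configured/active]` counters can be compared per array. The sketch below assumes an mdstat layout like the example above; the function name `md_slot_mismatch` is made up for illustration.

```shell
# Print every md device whose "[configured/active]" counters differ.
md_slot_mismatch() {
    awk '
        /^md/ { md = $1; lvl = $4 }          # remember device name and RAID level
        {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^\[[0-9]+\/[0-9]+\]$/) {
                    # strip the brackets, then split "24/16" into two numbers
                    split(substr($i, 2, length($i) - 2), n, "/")
                    if (n[1] != n[2])
                        print md, lvl, "configured=" n[1], "active=" n[2]
                }
            }
        }
    '
}

if test -r /proc/mdstat; then md_slot_mismatch < /proc/mdstat; fi
```

A raid1 device listed here while all physical disks are healthy is a candidate for the filter. A raid5 or raid6 device listed here is more likely to be genuinely degraded and should not be hidden.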

We can rewrite the mdstat output so that all data is still sent to Xymon, but the “trouble lines” are prefixed with a string that keeps Xymon’s server-side code from parsing them. NOTE: adjust md9 and md13 to fit your configuration.

# original:
if test -r /proc/mdstat; then echo "[mdstat]"; cat /proc/mdstat; fi

# new:
if test -r /proc/mdstat; then echo "[mdstat]";
        ## QNap specific:
        ## filter md13 and md9 as these are raid1 w/ 16 devices (_very_
        ## redundant) but for some reason the md-device is configured to have 24
        ## devices, i.e. 8 devs are missing which causes Xymon to raise an
        ## alert:
        cat /proc/mdstat | awk '/^m/ { md=$1; mdlvl=$4; } /^$/ { md=""; mdlvl="" } { if ( ( (md == "md9") || (md == "md13") ) && (mdlvl == "raid1") ) { print "HIDE FROM XYMON: " $0 } else { print } }';
fi
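
Since the device names differ between models, the filter can also take the names as a parameter, which makes it easy to test against a saved copy of /proc/mdstat. This is a sketch of the same idea, not the script used above; the function name `filter_mdstat` and passing the names as a space-separated list are assumptions.

```shell
# Prefix all lines of the named raid1 devices so the Xymon parser skips them.
# $1: space-separated list of md device names, e.g. "md9 md13"
filter_mdstat() {
    awk -v hide="$1" '
        BEGIN { n = split(hide, names, " "); for (i = 1; i <= n; i++) skip[names[i]] = 1 }
        /^md/ { md = $1; lvl = $4 }          # start of a device block
        /^$/  { md = ""; lvl = "" }          # a blank line ends the block
        { if ((md in skip) && lvl == "raid1") print "HIDE FROM XYMON: " $0; else print }
    '
}

if test -r /proc/mdstat; then
    echo "[mdstat]"
    filter_mdstat "md9 md13" < /proc/mdstat
fi
```

Because the RAID level is still checked, a name in the list only takes effect while that device really is a raid1.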

Save the modified version as xymonclient-linux-busybox.sh in client/bin/, as the rclient-extension searches for the scripts there. Next we will use this modified client script with the xymon-rclient-extension (discussed in the last article, Agent-less Monitoring With Xymon Using xymon-rclient).

Using with xymon-rclient

Assuming the modified client script from above is saved as client/bin/xymonclient-linux-busybox.sh, we can configure rclient to use it for the QNap systems.

Picking up the example from that post we add the scriptos()-config to the RCLIENT-tag in hosts.cfg:

1.2.3.4 qnap.local  # https://qnap.local/ "RCLIENT:cmd(ssh -T admin@%{H}),ostype(linux),scriptos(linux-busybox)"
1.2.3.5 panq.local  # "RCLIENT:cmd(ssh -T admin@%{H}),ostype(linux),scriptos(linux-busybox)"
1.2.3.6 foo.local   # "RCLIENT:cmd(ssh -T admin@%{H}),ostype(linux),scriptos(linux-busybox)"
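
A typo in scriptos() is easy to miss, so a quick consistency check between hosts.cfg and client/bin/ may be useful. The sketch below infers the naming scheme xymonclient-<scriptos>.sh from the file name used in this article. The helper names and the example paths are assumptions.

```shell
# List the distinct scriptos() values found in hosts.cfg-style input on stdin.
list_scriptos() {
    sed -n 's/.*scriptos(\([^)]*\)).*/\1/p' | sort -u
}

# Report every scriptos() value that has no matching client script.
# $1: path to hosts.cfg   $2: directory holding the client scripts
check_scripts() {
    list_scriptos < "$1" | while read -r os; do
        test -f "$2/xymonclient-$os.sh" || echo "missing: $2/xymonclient-$os.sh"
    done
}

# Example call (both paths depend on your installation):
#   check_scripts server/etc/hosts.cfg client/bin
```

No output means every scriptos() value has a matching script.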

And that’s it. On the next run the memory and disk checks should report correct data and the raid check should be green (assuming your RAID sets are in optimal state). If the mdstat modification was necessary for your NAS, the rewritten lines still show up in the raid column but no longer trigger a red status.

General considerations

The rclient approach has the obvious advantage that it requires no modification to the local OS (except for adding an ssh key to authorized_keys). This makes it possible to monitor systems for which no native Xymon client is available.

The drawbacks are increased overhead and load on the Xymon server itself. There is also a security aspect as the monitoring user on the Xymon server can log in to all the “rclient clients” with a potentially privileged account.

Conclusions

Using xymon-rclient and the modified Xymon client script xymonclient-linux-busybox.sh we can monitor QNap systems for the most essential metrics (cpu, disk, memory, ports, processes, uptime, network-interfaces and routing). Due to the missing vmstat we lack the IO-wait metric, which would be very handy for a storage device. The BusyBox-modified Xymon client script can be used with other appliances as well; it runs successfully on the UniFi WLAN Access Points “UAP-LR” and “UAP-AC” from Ubiquiti Networks.


  1. Wikipedia even states it being the “de facto standard core user space toolset for embedded Linux devices”.

  2. I seemed to remember a licensing change to GPL when development was handed over about 10 years ago, and suspected this to be the reason for sticking with this old version from the year 2005 (current is v1.24). But I could not find evidence that the license was ever anything other than some version of the GPL (checked the BusyBox license page). Maybe QNap just made heavy modifications to v1.01 and keeps backporting from newer versions.

  3. The BusyBox about page states: “BusyBox combines tiny versions of many common UNIX utilities into a single small executable.”