RAID monitoring without if-elif-elif deserts and regex nightmares

Monitoring RAID subsystems is essential, but writing a parser for the output of each RAID card’s CLI¹ tool can be time-consuming (not to mention error-prone and boring), especially since new firmware and/or CLI versions often produce (slightly) different output, even for the same hardware.

raid-monitor for Xymon takes a different approach: a known-good state of the RAID subsystem is stored as the reference. On every run, the current state is compared to that reference, and every difference is BAD: it is reported as red, along with the diff, so it is easy to see what has changed compared to the known-good reference.
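The comparison step can be illustrated with a short Python sketch. This is not raid-monitor’s actual code: the function name `compare_to_reference` is hypothetical, the known-good and current states are assumed to be plain text, and the result is expressed as a Xymon-style color plus message. Identical text yields green; any difference yields red together with a unified diff from the standard library’s `difflib`.

```python
import difflib
from typing import Tuple


def compare_to_reference(reference: str, current: str) -> Tuple[str, str]:
    """Compare the current RAID state with the stored known-good reference.

    Returns a Xymon-style (color, message) pair: "green" when the texts are
    identical, otherwise "red" plus a unified diff of what changed.
    """
    if current == reference:
        return "green", "RAID state matches the known-good reference\n"
    diff = difflib.unified_diff(
        reference.splitlines(keepends=True),
        current.splitlines(keepends=True),
        fromfile="known-good",
        tofile="current",
    )
    return "red", "RAID state differs from the known-good reference:\n" + "".join(diff)


if __name__ == "__main__":
    # Made-up sample lines, loosely modelled on /proc/mdstat (hypothetical data).
    known_good = "md0 : active raid1 sdb1[1] sda1[0]\n      [2/2] [UU]\n"
    current = "md0 : active raid1 sdb1[1](F) sda1[0]\n      [2/1] [U_]\n"
    color, message = compare_to_reference(known_good, current)
    print(color)
    print(message, end="")
```

Comparing the whole text instead of parsing individual fields is what keeps the modules free of per-version parsing logic. The trade-off is that a purely cosmetic change in the CLI output (for example after a firmware update) would also show up as red until a new known-good reference is stored.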

Out of the box, the following common RAID systems are supported:

  • Adaptec aacraid with the arcconf CLI
  • LSI MegaRAID with megaraid
  • 3ware with twcli
  • Areca with cli|cli64
  • Linux software RAID (mdraid) via the /proc filesystem

Adding support for new cards is fairly easy if the RAID status is accessible from the shell (e.g. via a CLI tool or a /proc interface). Either the included sample module or one of the existing modules can serve as a starting point: essentially, a module just runs the CLI commands that report the status. Output that varies between runs even when nothing is wrong (such as temperature readings) has to be filtered out to avoid false positives.
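A module’s job, as described above, boils down to running the status command and dropping volatile lines before the comparison. The following is a minimal Python sketch under stated assumptions: the helper names `filter_volatile` and `collect_status` are hypothetical, regular expressions are assumed as the filtering mechanism (the real modules may filter differently), and the sample controller output is invented for illustration rather than copied from any vendor CLI.

```python
import re
import subprocess
from typing import Iterable, List


def filter_volatile(text: str, volatile_patterns: Iterable[str]) -> str:
    """Drop every line that matches any of the given regular expressions.

    Such lines (e.g. temperature readings) change between runs even when
    the RAID is healthy and would otherwise trigger false alarms.
    """
    compiled = [re.compile(p) for p in volatile_patterns]
    kept: List[str] = [
        line
        for line in text.splitlines(keepends=True)
        if not any(rx.search(line) for rx in compiled)
    ]
    return "".join(kept)


def collect_status(command: List[str], volatile_patterns: Iterable[str]) -> str:
    """Run a status command and return its output with volatile lines removed.

    Raises subprocess.CalledProcessError if the command exits non-zero.
    """
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return filter_volatile(result.stdout, volatile_patterns)


if __name__ == "__main__":
    # Invented sample output; real CLI output differs per vendor and version.
    sample = (
        "Controller Status : Optimal\n"
        "Temperature       : 42 C\n"
        "Logical Device 0  : Optimal\n"
    )
    # The temperature line is removed; the two status lines are kept.
    print(filter_volatile(sample, [r"(?i)^\s*temperature"]), end="")
```

For Linux software RAID, something like `collect_status(["cat", "/proc/mdstat"], [])` would be one plausible way to gather the state; for hardware controllers, the command and the filter patterns depend on the vendor CLI and its version.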

One could argue that temperature monitoring, graphing of media errors, … are required. These could of course be added, but after using raid-monitor for years in setups of various sizes and complexities, I’m still very happy with its simplicity, ease of setup, and low overhead.

For more details, check out the project page. raid-monitor for Xymon is free to use.

  1. Command Line Interface