The big flashing DevOps thing

Greg Cope mailed us a few weeks ago with a pointer to this project, which has been monitoring DevOps at the Financial Times (FT) here in the UK.

SAWS-assembly

It’s hard for everyone in the group to simultaneously maintain an overview of the health of the stack under normal circumstances. They use Nagios, a great piece of kit with one fatal flaw: Nagios emails everybody on the team every time a check changes state. Checks change state all the time, and the resulting flood of email has left the FT team in a state where absolutely none of them reads emails from Nagios, because they clog up their inboxes.

Silvano Dossan, who works on the team, says:

Our Nagios servers have been configured to check every important parameter, from basic disk and CPU checks to HTTP, application, database and jconsole via Jolokia. All we need is some way to communicate clearly when a check fails.

nagios emails

The team rejected shared office displays in the form of monitors (too much text, too hard to read from a distance). They also rejected a particularly horrible idea whereby a single team member would be allotted the task of staying alert and monitoring all Nagios’ mails for the week, feeding back news of any disasters to the rest of the group. Sounds horrific.

Silvano sat back to think about exactly what they needed and didn’t need from alerts.

None of the above satisfied our needs. Something is missing. When something fails I want an alarm bell, a siren, or a flashing light that is so bright my eyes explode. A warning system that is in everyone’s face. No escape. There should be no excuse for anyone to not know when something in the stack has broken. “What do you mean you didn’t know the site was down, there is a mongoose running around the office!”

Introducing SAWS! “Silvano’s Awesome Warning System”.

Well, I did spend my evenings and weekends making this, so forgive me for naming it after myself.

JAWS poster with SAWS title

Rejecting the mongoose idea, Silvano bought a strip of something called Blinkytape (having looked at their website I’m off to buy one myself when I’m done writing this): a flexible strip of 60 RGB LEDs, with a microcontroller already embedded in the strip. Using a Raspberry Pi and a lot of glue and sticky tape, he produced a perfectly simple, unmissable display to demonstrate the health of the stack.

lit led strip

Silvano says:

A good monitor system should display the health status of the stack to as many people as possible in as simple a format as possible. The more people that know the health state of the stack, the better the chance of someone picking it up and resolving the issue quickly.

SAWS simply shows, by grouping LEDs, whether each Nagios server has an error. Green, orange, yellow, red and flashing red LEDs represent OK, Unknown, Warning, Critical and Critical for over 30 minutes respectively. Blue LEDs swoosh back and forth like a Cylon to indicate that the Python script is running and the data is up to date.
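For anyone wondering how that colour mapping might look in code, here is a minimal Python sketch. It is not Silvano’s actual script (that is on GitHub): the `ServerStatus` class, the function names, the RGB values and the choice to give each server an equal share of the 60 LEDs are all assumptions made for illustration. It leaves out the blue Cylon sweep and the code that actually talks to the strip, and only works out which colour each LED should be for one frame.

```python
from dataclasses import dataclass

LED_COUNT = 60  # the strip described in the post has 60 RGB LEDs

# RGB values below are illustrative guesses, not taken from the SAWS source.
GREEN = (0, 255, 0)
ORANGE = (255, 100, 0)
YELLOW = (255, 255, 0)
RED = (255, 0, 0)
OFF = (0, 0, 0)

# A Critical state older than this is shown as flashing red.
ESCALATE_AFTER_SECONDS = 30 * 60


@dataclass
class ServerStatus:
    """Hypothetical summary of one Nagios server's worst current state."""
    name: str
    state: str               # "OK", "UNKNOWN", "WARNING" or "CRITICAL"
    seconds_in_state: float  # how long the server has been in that state


def colour_for(status, blink_on):
    """Return the RGB colour for one server in the current frame.

    blink_on alternates between True and False on successive frames,
    which is what makes a long-running Critical state flash.
    """
    if status.state == "OK":
        return GREEN
    if status.state == "UNKNOWN":
        return ORANGE
    if status.state == "WARNING":
        return YELLOW
    if status.state == "CRITICAL":
        escalated = status.seconds_in_state > ESCALATE_AFTER_SECONDS
        if escalated and not blink_on:
            return OFF  # dark half of the flash cycle
        return RED
    raise ValueError(f"unexpected state: {status.state!r}")


def render_frame(statuses, blink_on, led_count=LED_COUNT):
    """Return a list of led_count RGB tuples, one group of LEDs per server."""
    if not statuses:
        return [OFF] * led_count
    if len(statuses) > led_count:
        raise ValueError("more servers than LEDs")
    group_size = led_count // len(statuses)
    frame = []
    for status in statuses:
        frame.extend([colour_for(status, blink_on)] * group_size)
    # Any LEDs left over after equal division stay dark.
    frame.extend([OFF] * (led_count - len(frame)))
    return frame
```

Calling `render_frame` twice a second or so, flipping `blink_on` each time, and pushing the result to the strip would give the steady and flashing colours described above.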

It’s an ingenious solution, and it works. There can’t be a cleaner stack in the country now that SAWS is in place, and the team have been incredibly enthusiastic about the change. You can read more about it over at Engine Room, the FT’s tech blog, and Silvano has made all the code available at GitHub.

14 comments

pgnunes avatar

Hi. Not for this specific case, but for simple monitoring, give Meerkat-Monitor a try. It has a nice dashboard that lets everyone see the status of services immediately. http://meerkat-monitor.org

simon avatar

The simplest is a “whinge-o-meter”. It takes all monitor inputs (including calculations of predictions/premonitions), does some magic weighted averaging and displays how whingy the system is feeling.

This can be an analog needle, which provides better ergonomics (our eyes are hardwired to see angle), or geeky with a sliding row of lights (like on a graphic equaliser).

Andrew Oakley avatar

When the revolution comes, us colour-blind people will rise up (and we’re one in ten males, so that’s a lot), we will take control of the ports where LEDs are imported, and we will cast all green LEDs (and red-green LEDs) onto a large toxic bonfire. We will march into a bold future of only blue, yellow, red and white LEDs. ONWARD TO VICTORY!

Liz Upton avatar

We’re actually feeling a bit sheepish about you and your colour-blind brethren: there are a number of stylesheet changes we’re implementing on this site in the coming weeks to make navigation a bit easier for you. Bugger all I can do about green LEDs, though, I’m afraid.

Andrew Oakley avatar

I can’t see anything wrong with the RPi site (although that could be the problem – I may be missing something). Generally if you can print it out in black & white / greyscale and it still makes sense, then it makes sense for colour blind folks. It’s a very approximate method but it does work.

Liz Upton avatar

Links don’t work as well as I’d like for people with colour-blindness. (And you may well miss them as they are at the moment; they’re a red and then a darker red, and they aren’t underlined.) We’ll be fixing that, and some other niggles, in an update soon.

Nic avatar

Ooh, goody for the update. At the moment I just wave my mouse around the article like a magic wand, hoping that some text will turn into a link.

simon avatar

The worst thing is when product designers have a double LED in the one hole. Allegedly it changes color between happy and not, but can colorblind people see it?

Having a pair of holes is far more ergonomic: all you have to be able to do is to read the legend and see if any light (of any kind) is coming out of that hole.

Sadly, the Disability Discrimination Act doesn’t seem to cover this. Nor websites, printed matter, ….

AndrewS avatar

Presumably because a single bi-colour LED is cheaper than two mono-colour LEDs?

simon avatar

It’s also cheaper not to widen doors, or have ramps or parking.

tiktak avatar

I am not even sure of that, because there are way more single color LEDs produced than bi-color ones (or the cool ones with the three primary colors you can set independently).

Color blindness is a serious issue. Even if you don’t have this problem, I think having one LED with different meanings is a bad design. I don’t want to open the manual to know the difference between green, orange, blue, red, static or blinking every 1 sec.

Also, people get older these days and have more vision issues than in the past. A button to increase the font size and an intelligent window layout adapting to a small resolution may be a good idea if you are designing the next killer app, not targeted to teenagers…

I love system architecture, coding and hacking, but if a lot of people will use your product, you have to hire people to work on ergonomics and make real world experiments with different kinds of people at some point.
There are a lot of problems a programmer can’t even think of.
Listen to your users!

So if one says a bi-color LED makes him prefer a similar product from a competitor, your company loses a customer and some market share because of a bad design, even though your product is technically better and cheaper.

Andrew Oakley avatar

Also, resistors: we hate you.

Alexander Brown avatar

And Grass hate grass and trees. ART TEACHER: Aren’t you going to put shading on that picture of a tree-lined field? STUDENT: WHAT SHADING! I PAINTED A CANVAS USING A PAINT LABELLED THIS MYSTICAL “GREEN” AND ADDED SOME BROWN SPLODGES. WHAT MORE DO YOU WANT! TEACHER: But how about some different colour leaves? STUDENT: WHAT DIFFERENT COLOUR LEAVES! Sorry, just had to put that out there. (Don’t you love ART lessons?) Fortunately I got some tinted glasses, but I am still not convinced about this “Dark Green”…

silviu avatar

Nagstamon or the Nagios Checker Firefox extension on each involved member’s PC.
