Postman’s Knock 2

Last May, I wrote a post about the Horizon Scandal: the prosecution by the (UK) Post Office of more than 700 “sub-postmasters” on charges of theft and/or false accounting, convictions which were being challenged because the fault was found to lie with a new point-of-sale IT system called Horizon. Earlier this week, the Court of Appeal overturned 39 convictions, clearing the names of those accused. The Criminal Cases Review Commission has now invited others caught up in this to make claims. The implication is that the Post Office may be put in serious financial jeopardy if the levels of compensation are as substantial as many think they may be. The scandal goes to the top of the Post Office, with decisions made by senior executives over their reaction to an increasing level of financial shortfalls across their network, and over their relationship with their primary contractor, Fujitsu, and its apparent (but ultimately false) infallibility. Government Ministers are also implicated, given that the Post Office is an “emanation of the state”, wholly owned by the government.

(Readers should not confuse the Post Office, which operates the network of post office outlets, with Royal Mail, which actually collects and delivers mail. The two organisations sometimes operate from the same premises, their businesses are intertwined and their branding is complementary, but they are separate.)

At the same time, I’ve been reading a number of blogs in the software testing community about how manual testing is dead and how testers are not needed because you can program a computer to test itself and validate its own code.

This is arrant rubbish. Yes, you can run IT applications that will submit a piece of code to a set of tests, comparing the outputs from an application with expected results, and these will demonstrate that the code is working as expected. But that’s not an adequate test for any application that will rely on human beings to interact with it.

Let me give you an example. A few years ago, I had an encounter with a public-facing application put out by a major UK supermarket chain. It was designed to allow customers in-store to access their loyalty scheme accounts via touch-screen terminals. I was reasonably OK with this application, until I came to this screen:

(The graphic is my own reconstruction from memory.)

When you landed on this screen, the cursor was in the first box (top left) where it asked for the first line of my address. So I entered it. Then it asked for my postcode. As I said, it was a touchscreen terminal, so I tapped the second box. Nothing happened. I kept tapping. Still nothing. So I tried the on-screen keyboard, using all the usual ways of advancing workflow – the Return key, the Tab key, even the space bar. Nothing worked. Eventually, out of desperation, I tried the big green arrow at the bottom of the page that said “Next”.

You’ve guessed it. The arrow that looked as though it was intended to take you to the next page was actually there to move you to the next step in the workflow, even between page elements that did not look as though they should be part of that workflow. Yet this was accepted by the supermarket as good design for a workable application. Why? I can only think that the client was assured that the app passed all its tests. And this would have passed an automated test. I have no idea what the code said, but the logic of the test would have gone something like this:

  • Input an address (the test suite would draw this from a list of test input data)
  • Advance the workflow (the test software would activate the Next button code)
  • Input a postcode (again, the test suite would draw this from another list of test input data)
  • The application under test would compare the two inputs, looking for a match.
    • If the application finds a match between the input address first line and the input postcode, it will then activate the Next button code again to move the user to the next page.
    • If the application does not find a match, it will display an error message on-screen inviting the user to correct the address they entered in the first box.
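
To make that concrete, here is a rough sketch in Python of the logic above. Everything in it is my own invention for illustration: the AddressScreen class, its methods and the test data are hypothetical and are not taken from the supermarket's application. It models a screen where the Next button is the only thing that moves the cursor, then runs a scripted test alongside a user who taps the second box.

```python
# Hypothetical toy model; none of these names come from the real application.
VALID_ADDRESSES = {("1 High Street", "AB1 2CD")}  # made-up test data


class AddressScreen:
    """A screen where the Next button is the only way to move between fields."""

    def __init__(self):
        self.fields = ["", ""]  # [first line of address, postcode]
        self.focus = 0          # index of the field that holds the cursor
        self.page = 1
        self.error = None

    def type_text(self, text):
        # Typing always goes into whichever field has the cursor.
        self.fields[self.focus] = text

    def tap_field(self, index):
        # What a human tries first on a touchscreen. Here, nothing happens.
        pass

    def press_next(self):
        if self.focus == 0:
            self.focus = 1  # Next moves the cursor to the postcode box
        elif tuple(self.fields) in VALID_ADDRESSES:
            self.page = 2   # match found: move on to the next page
        else:
            self.error = "Address and postcode do not match"
            self.focus = 0  # send the user back to the first box


def scripted_test():
    """The automated test already 'knows' that Next advances the workflow."""
    screen = AddressScreen()
    screen.type_text("1 High Street")
    screen.press_next()
    screen.type_text("AB1 2CD")
    screen.press_next()
    return screen.page == 2


def human_attempt():
    """A user who taps the second box and then starts typing the postcode."""
    screen = AddressScreen()
    screen.type_text("1 High Street")
    screen.tap_field(1)          # ignored by the screen
    screen.type_text("AB1 2CD")  # overwrites the address; the cursor never moved
    screen.press_next()
    screen.press_next()
    return screen.page == 2


print(scripted_test())  # True: the test passes
print(human_attempt())  # False: the user is stuck on page 1 with an error
```

The scripted test is green because it encodes the designer's idea of the workflow. It has nothing to say about whether a person would ever discover that workflow.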

Nothing wrong with that. On paper. But it doesn’t match real life. Faced with the screen I’ve shown above, how many people would think that to move from one input box to another, you should tap a button at the bottom of the screen that is designed to look as though it takes you to the next page, and nothing else? I’ve been working with IT applications since the early 1990s, and it took me numerous attempts before I tried that button in desperation. And there was another problem.

I couldn’t get the application to accept my address as matching my postcode. My address doesn’t follow the usual format of house number/street name that many UK addresses have. The format I use is the one on my tenancy agreement. But there are two widely-used databases for UK postcodes, and the other one had my address in a different format. The first line of my address on my tenancy agreement is ‘Flat X, Heatherlea’; but the other database records this as ‘X, Heather Lea Flats’. Now, normally, applications asking the user to validate an address ask for the postcode first, and then present a list of addresses covered by that postcode for the user to select the right one. But this application was doing this the other way around, and would not allow the user to select a variant of their address which a human being would recognise but a machine wouldn’t. And again, this would pass an automated test. The test would have a defined set of test inputs, and a defined set of test matches. Even if the person programming the test inserted a case that was meant to fail, to make sure the error message was properly displayed, it would not cover the possibility that the user had input the correct address in a variant form, or had made a simple error.
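
As an illustration of the gap, here is a minimal Python sketch comparing a strict string comparison with a slightly more forgiving one. The function names and the NOISE_WORDS list are my own assumptions, and the normalisation is deliberately naive; real address matching is far harder than this, which is one reason the postcode-first lookup is the usual approach.

```python
import re

# Assumption: words that carry no identifying information in this toy example.
NOISE_WORDS = {"flat", "flats"}


def exact_match(entered, on_file):
    """The strict comparison a fixed list of test data would happily pass."""
    return entered == on_file


def normalise(address_line):
    """Lower-case, drop punctuation and noise words, then join what is left."""
    tokens = re.findall(r"[a-z0-9]+", address_line.lower())
    return "".join(t for t in tokens if t not in NOISE_WORDS)


def tolerant_match(entered, on_file):
    """Compare the normalised forms rather than the raw strings."""
    return normalise(entered) == normalise(on_file)


entered = "Flat X, Heatherlea"
on_file = "X, Heather Lea Flats"

print(exact_match(entered, on_file))     # False
print(tolerant_match(entered, on_file))  # True
```

A test suite whose inputs are copied from the same database the application uses will only ever exercise the first function's happy path. It takes a person with an awkward address to find the second case.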

I’m writing this blog whilst I wait to join an online meeting. The organisers of that meeting have had a bit of a panic in the past twelve hours, as the person who was due to circulate the joining instructions had an e-mail outage last night and couldn’t circulate the meeting link. A colleague who recirculated it (as a belt-and-braces measure) made what he described as a “schoolboy error” of not sending the message as a blind copy, meaning that all recipients could see the addresses of all other recipients, which is not only bad data protection, but also means that any casual reply to that message (“Thanks for the link!”) would be circulated to everyone. This has led me to think of a new test heuristic, a rule of thumb that you should apply when testing. It is the Heuristic of the Elemental Error, that under stress, users may do something wrong that they otherwise would never do or never even think of doing.

And that brings me back to the Horizon Scandal. One of the noted bugs in the system was, apparently, that if you registered a transaction but that transaction didn’t appear on the screen immediately, any further keystrokes from the terminal would be registered as new transactions before the old transaction was completed. Sub-postmasters, not necessarily used to IT installations (especially ones that have gone unresponsive), saw nothing happening on their screens and so kept hitting the Return key in the belief that the system had frozen. Meanwhile, the system was silently logging transactions which did not match the actual money in the till. Of course, when Post Office managers looked at the outputs from Horizon and tried to reconcile them with the cash returns from sub-post offices, they found major discrepancies. That they were then assured that these discrepancies were not down to an IT error but were the result of possible criminal activity is a matter which future inquiries will have to address. Hopefully, they will drill down to the level of the IT managers who put their faith in automated tests, and will challenge them as to why they did not try the system out with human testers as well.
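
To show how the Heuristic of the Elemental Error might be turned into a test, here is a toy model in Python. It is emphatically not based on Horizon's actual code, which I have never seen; ToyTerminal, its guard_pending switch and the impatient_user function are all hypothetical names of my own. It simply models a terminal that either does or does not ignore keystrokes while a transaction is still pending.

```python
class ToyTerminal:
    """Toy model only; not based on any real system's code."""

    def __init__(self, guard_pending):
        self.guard_pending = guard_pending  # ignore input while busy?
        self.pending = False
        self.ledger = []  # amounts in pence

    def press_return(self, amount):
        if self.pending and self.guard_pending:
            return  # a transaction is in flight: drop the keystroke
        self.pending = True
        self.ledger.append(amount)

    def complete(self):
        self.pending = False


def impatient_user(terminal, amount, presses):
    """Hit Return several times before the screen responds, then total up."""
    for _ in range(presses):
        terminal.press_return(amount)
    terminal.complete()
    return sum(terminal.ledger)


# One calm keypress: both terminals look correct.
print(impatient_user(ToyTerminal(guard_pending=False), 100, 1))  # 100
print(impatient_user(ToyTerminal(guard_pending=True), 100, 1))   # 100

# Four anxious keypresses: only the guarded terminal matches the till.
print(impatient_user(ToyTerminal(guard_pending=False), 100, 4))  # 400
print(impatient_user(ToyTerminal(guard_pending=True), 100, 4))   # 100
```

A script that presses Return exactly once will never tell the two terminals apart. Somebody has to think of behaving like a worried person in front of a frozen screen, and that is a human tester's job.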

I remain firmly of the view that if your system is going to be used by human beings, it has to be tested with human beings. Human beings are imperfect; they make mistakes. Any IT system has to be able to deal with those mistakes. IT managers have a responsibility to make sure that this happens, and it’s time they accepted that responsibility instead of adopting fashionable industry views that human testers are obsolete. Because it’s one thing for a user not to be able to get at their supermarket discount coupons. It’s quite another if an IT system error leads to someone being sent to jail.
