Postman’s Knock 3

[TL;DR – a thorough account of how a trusted organisation prosecuted 900 of its own people for crimes they did not commit, extracted large amounts of money from them and sent a good few to prison. They then covered this up for 15 years. 10% of the revenue from sales goes to the fund to assist Sub-postmasters in their claims. Buy this book!]

Regular readers may recollect that I’ve been making a big thing out of the Horizon scandal in the UK Post Office – and rightly so, for it is truly scandalous. I have now read Nick Wallis’ account of how he uncovered the scandal and of the fight for justice for the 900 or so people wrongly prosecuted by the Post Office. At root, this was a failure in software testing. It was magnified many times over by a client who accepted at face value a contractor’s assertion that a deployed system was without faults (which any of us would instantly recognise as an astonishing claim and almost certainly untrue). The client then compounded matters by making strenuous efforts to push that belief onto everyone who queried the integrity of the system. In the process, various people may well have perjured themselves and/or committed the crime of conspiracy to pervert the course of justice.

Nick Wallis’ book is a detailed account of his involvement with the campaign. It runs from his first contact with one local case up to August 2021, by which time the Post Office’s case had been disproved in court (despite strenuous efforts on their part to derail the legal process) and a start had been made on quashing at least 750 convictions now found to be unjust. In the end, the book is more about the legal arguments and the personal stories of many of the Sub-postmasters involved. As Nick Wallis is not an IT person, conclusions about systems and testing are drawn only tangentially. Even so, there is enough in the book for anyone with even an outline of IT technical knowledge to understand many of the shortcomings in the system, its implementation and management. If you have any knowledge of legal process, disciplinary processes and investigative procedures, your reaction might well be the same as mine: as I was reading, page after page provoked in me the reaction “They did WHAT???!!!”

Last time I posted about Horizon in this blog, I coined the idea of the Heuristic of the Elemental Error, that under stress, users may do something wrong that they otherwise would never do or never even think of doing. This seems to be a major driver behind what many Sub-postmasters did when they found their systems malfunctioning; the book shows how important it is to design and build systems that take that idea into account.

My full review, which would run to some five pages if printed out, can be found here: https://deepwatersreading.wordpress.com/2022/09/29/the-great-post-office-scandal-the-fight-to-expose-a-multimillion-pound-it-disaster-which-put-innocent-people-in-jail-by-nick-wallis/

James Christie has written a series of blog posts on Horizon which go far further into the technical detail than either my review or Nick Wallis’ book. I commend them to you, starting here: https://clarotesting.wordpress.com/2020/05/27/the-post-office-horizon-it-scandal-part-1-errors-and-accuracy/

Sadly, since the book was published, Government and the Post Office still seem to be doing their best to kick the whole matter into the long grass. Kudos to Nick Wallis and all the other campaigners for exposing this scandal and keeping it in the public eye.

Postman’s Knock 2

Last May, I wrote a post about the Horizon Scandal: the prosecution by the (UK) Post Office of more than 700 “sub-postmasters” on charges of theft and/or false accounting. Those convictions were being challenged because the fault was found to lie with a new point-of-sale IT system called Horizon. Earlier this week, the Court of Appeal overturned 39 convictions, clearing the names of those accused. The Criminal Cases Review Commission has now invited others caught up in this to make claims. The implication is that the Post Office may be put in serious financial jeopardy if the levels of compensation are as substantial as many think they may be. The scandal goes to the top of the Post Office. Senior executives made the decisions about how to react to an increasing level of financial shortfalls across the network. They also decided how to handle their relationship with their primary contractor, Fujitsu, and its system’s apparent (but ultimately false) infallibility. Government Ministers are also implicated, given that the Post Office is an “emanation of the state”, even if the majority of its shareholdings are now in the private sector.

(Readers should not confuse the Post Office, which operates the network of post office outlets, with Royal Mail, which actually collects and delivers mail. The confusion is understandable: the two organisations sometimes operate from the same premises, their businesses are intertwined and their branding is complementary.)

At the same time, I’ve been reading a number of blogs in the software testing community claiming that manual testing is dead. Their argument is that testers are not needed because you can program a computer to test itself and validate its own code.

This is arrant rubbish. Yes, you can run IT applications that submit a piece of code to a set of tests, comparing the outputs from an application with expected results. These will demonstrate that the code behaves as expected for those inputs. But that’s not an adequate test for any application that will rely on human beings to interact with it.

Let me give you an example. A few years ago, I had an encounter with a public-facing application put out by a major UK supermarket chain. It was designed to allow customers in-store to access their loyalty scheme accounts via touch-screen terminals. I was reasonably OK with this application, until I came to this screen:

(The graphic is my own reconstruction from memory.)

When you landed on this screen, the cursor was in the first box (top left) where it asked for the first line of my address. So I entered it. Then it asked for my postcode. As I said, it was a touchscreen terminal, so I tapped the second box. Nothing happened. I kept tapping. Still nothing. So I tried the on-screen keyboard, using all the usual ways of advancing workflow – the Return key, the Tab key, even the space bar. Nothing worked. Eventually, out of desperation, I tried the big green arrow at the bottom of the page that said “Next”.

You’ve guessed it. The arrow that looked as though it was intended to take you to the next page was actually there to move you to the next step in the workflow. That applied even to page elements that did not look as though they should be part of that workflow. Yet the supermarket accepted this as good design for a workable application. Why? I can only think that the client was assured that the app passed all its tests. And it would have passed an automated test. I have no idea what the test code actually said, but its logic would probably have gone something like this:

  • Input an address (the test suite would draw this from a list of test input data)
  • Advance the workflow (the test software would activate the Next button code)
  • Input a postcode (again, the test suite would draw this from another list of test input data)
  • The application under test would compare the two inputs, looking for a match.
    • If the application finds a match between the input address first line and the input postcode, it will then activate the Next button code again to move the user to the next page.
    • If the application does not find a match, it will display an error message on-screen inviting the user to correct the address they entered in the first box.
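To make the point concrete, here is a minimal Python sketch of that logic. Please treat it as a toy model built on loud assumptions. The `KioskForm` class, its methods and the sample addresses are all hypothetical, invented for illustration, and have nothing to do with the supermarket’s actual code. The scripted test drives the form the way its author knew it worked, through the Next button, and passes. The second function does what a person at the kiosk actually does, which is to tap the postcode box. That tap silently overwrites the address.

```python
from dataclasses import dataclass, field

# Hypothetical reference data: first line of address -> postcode.
KNOWN_ADDRESSES = {"12 High Street": "AB1 2CD", "3 Mill Lane": "EF3 4GH"}


@dataclass
class KioskForm:
    """Toy model of the kiosk screen; every name here is invented."""
    boxes: list = field(default_factory=lambda: ["address", "postcode"])
    focus: int = 0                      # which box currently receives typing
    values: dict = field(default_factory=dict)
    page: str = "address_form"
    error: str = ""

    def type_text(self, text: str) -> None:
        # On-screen keyboard input always goes to the box that has focus.
        self.values[self.boxes[self.focus]] = text

    def tap_box(self, name: str) -> None:
        # The design flaw: tapping a box does NOT move focus to it.
        pass

    def press_next(self) -> None:
        # "Next" moves focus to the following box; only on the last box
        # does it check the address against the postcode.
        if self.focus < len(self.boxes) - 1:
            self.focus += 1
            return
        expected = KNOWN_ADDRESSES.get(self.values.get("address", ""))
        if expected is not None and expected == self.values.get("postcode"):
            self.page = "next_page"
        else:
            self.error = "Please check the first line of your address"


def scripted_test(address: str, postcode: str) -> KioskForm:
    # The automated test knows the workflow, so it always uses Next.
    form = KioskForm()
    form.type_text(address)
    form.press_next()
    form.type_text(postcode)
    form.press_next()
    return form


def human_attempt(address: str, postcode: str) -> KioskForm:
    # What a person does: type, tap the postcode box, keep typing.
    form = KioskForm()
    form.type_text(address)
    form.tap_box("postcode")
    form.type_text(postcode)            # overwrites the address: focus never moved
    return form


print(human_attempt("12 High Street", "AB1 2CD").values)  # {'address': 'AB1 2CD'}
```

Both the passing and the deliberately failing scripted cases behave exactly as specified. Only the human-style sequence exposes the problem, and no test script written from the specification would ever perform it.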

Nothing wrong with that. On paper. But it doesn’t match real life. Faced with the screen I’ve shown above, how many people would think that, to move from one input box to another, you should tap a button at the bottom of the screen that is designed to look as though it takes you to the next page, and nothing else? I’ve been working with IT applications since the early 1990s, and it took me numerous attempts before I tried that button in desperation. And there was another problem.

I couldn’t get the application to accept my address as matching my postcode. My address isn’t in the usual UK format of house number and street name. The format I use is the one on my tenancy agreement. But there are two widely-used databases of UK postcodes, and the other one has my address in a different format. The first line of my address on my tenancy agreement is ‘Flat X, Heatherlea’, but the other database records it as ‘X, Heather Lea Flats’. Normally, applications that ask the user to validate an address ask for the postcode first. They then present a list of the addresses covered by that postcode, so that the user can select the right one. This application did it the other way around. As a result, it would not accept a variant of an address that a human being would recognise but a machine wouldn’t. And again, this would pass an automated test. The test would have a defined set of test inputs and a defined set of expected matches. The person programming the test might even add a case designed to fail, to make sure the error message was displayed properly. Even so, the suite would not cover the possibility that the user had entered the correct address in a slightly different form, or had made a simple slip.
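Here is a minimal Python sketch of the difference, using a hypothetical flat number 7 in place of my real one. `exact_match` is the kind of strict comparison I suspect the kiosk made. `tolerant_match` shows one way a system could recognise the two database formats as the same address. The noise-word list and the normalisation rules are my own assumptions for illustration, not the matching rules of any real postcode database.

```python
import re

# Assumed list of words that describe the building type rather than identify it.
NOISE_WORDS = {"flat", "flats", "apartment", "apartments"}


def exact_match(entered: str, on_record: str) -> bool:
    # A strict validator: the strings must agree character for character.
    return entered == on_record


def normalise(line: str) -> tuple:
    # Lower-case, turn punctuation into spaces, split into words.
    tokens = re.sub(r"[^a-z0-9]+", " ", line.lower()).split()
    numbers = tuple(sorted(t for t in tokens if t.isdigit()))
    # Join the remaining words so "Heather Lea" and "Heatherlea" agree.
    name = "".join(t for t in tokens if not t.isdigit() and t not in NOISE_WORDS)
    return numbers, name


def tolerant_match(entered: str, on_record: str) -> bool:
    return normalise(entered) == normalise(on_record)


print(exact_match("Flat 7, Heatherlea", "7, Heather Lea Flats"))     # False
print(tolerant_match("Flat 7, Heatherlea", "7, Heather Lea Flats"))  # True
```

Tolerant matching carries its own risk of false matches. That is one reason the conventional postcode-first pick-list, where a human chooses from the addresses on record, is the safer design.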

I’m writing this blog whilst I wait to join an online meeting. The organisers of that meeting have had a bit of a panic in the past twelve hours, as the person who was due to circulate the joining instructions had an e-mail outage last night and couldn’t circulate the meeting link. A colleague who recirculated it (as a belt-and-braces measure) made what he described as a “schoolboy error” of not sending the message as a blind copy, meaning that all recipients could see the addresses of all other recipients, which is not only bad data protection, but also means that any casual reply to that message (“Thanks for the link!”) would be circulated to everyone. This has led me to think of a new test heuristic, a rule of thumb that you should apply when testing. It is the Heuristic of the Elemental Error, that under stress, users may do something wrong that they otherwise would never do or never even think of doing.

And that brings me back to the Horizon Scandal. One of the noted bugs in the system was, apparently, this: if you registered a transaction and it didn’t appear on the screen immediately, any further keystrokes from the terminal were registered as new transactions before the old transaction was completed. Sub-postmasters, not necessarily used to IT installations (especially ones that had gone unresponsive), saw nothing happening on their screens. So they kept hitting the Return key in the belief that the system had frozen. Instead, the system was silently logging transactions which did not match the actual money in the till. Of course, when Post Office managers looked at the outputs from Horizon and tried to reconcile them with the cash returns from sub-post offices, they found major discrepancies. They were then assured that these discrepancies were not down to an IT error but were the result of possible criminal activity, and that is a matter which future enquiries will have to address. Hopefully, those enquiries will drill down to the level of the IT managers who put their faith in automated tests, and will challenge them as to why they did not try the system out with human testers as well.
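A minimal Python sketch of that mechanism, as I understand it from the reports, may help. This is a toy model, not Horizon’s actual code, and the `Terminal` class, the `guard_pending` switch and the amounts are all hypothetical. Amounts are held as whole pence. With no guard, every press of Return while the screen lags is logged as a fresh transaction. With a guard, extra keystrokes are ignored until the screen catches up.

```python
from dataclasses import dataclass, field


@dataclass
class Terminal:
    """Toy counter terminal; all names are hypothetical."""
    guard_pending: bool                   # ignore input while a transaction is unconfirmed?
    ledger: list = field(default_factory=list)
    pending: bool = False

    def press_return(self, amount_pence: int) -> None:
        if self.pending and self.guard_pending:
            return                        # busy: the extra keystroke is discarded
        self.ledger.append(amount_pence)  # each accepted press is logged as a new transaction
        self.pending = True               # the screen has not yet shown it

    def screen_catches_up(self) -> None:
        self.pending = False


def serve_customer(terminal: Terminal, amount_pence: int, presses: int) -> int:
    # The clerk sees a frozen screen and keeps pressing Return.
    for _ in range(presses):
        terminal.press_return(amount_pence)
    terminal.screen_catches_up()
    return sum(terminal.ledger)           # what the system believes was taken


print(serve_customer(Terminal(guard_pending=False), 5000, 3))  # 15000, but only 5000 is in the till
print(serve_customer(Terminal(guard_pending=True), 5000, 3))   # 5000
```

A scripted test that presses Return once per transaction would pass both versions. Only a tester who behaves like an anxious human, pressing again and again at a frozen screen, would see the difference.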

I remain firmly of the view that if your system is going to be used by human beings, it has to be tested with human beings. Human beings are imperfect; they make mistakes. Any IT system has to be able to deal with those mistakes. IT managers have a responsibility to make sure that this happens, and it’s time they accepted that responsibility instead of adopting the fashionable industry view that human testers are obsolete. Because it’s one thing for a user not to be able to get at their supermarket discount coupons. It’s quite another if an IT system error leads to someone being sent to jail.

Why we test

BBC Radio 4 is running a series this week about the Post Office Counters Horizon IT system scandal. System errors (which seem to have arisen after the rollout of new PIN keypads) led to massive discrepancies between the sums of money the post office staff took and the amounts recorded on the system. The Post Office pursued prosecutions; many of the affected staff had their livelihoods and lives ruined; some went to jail.

(For non-UK readers: post office services in much of the UK outside town and city centres are delivered through a network of “sub-post offices” – post office counters set up in local or village stores, often run as a subsidiary business by the shopkeeper. The British Post Office ceased to be a Government department in 1969, instead becoming a Government-owned corporation. From 2001, it adopted a more commercial outlook, including formal share capitalisation, though with a controlling interest and two ‘golden shares’ held by the Secretary of State for Trade and Industry and the Treasury Solicitor respectively. In 2011, this was changed to a structure of 90% of shares being issued on the financial markets.)

A campaigning group of sub-postmasters brought a civil claim for compensation, which the Post Office settled in December 2019 for £57.75 million. The judge provided some scathing criticism of both the Post Office and Fujitsu, the IT supplier (“Fudge-it-for-you”, as they were known in other organisations I had dealings with before Horizon). Further, in March 2020 the Criminal Cases Review Commission decided to refer the convictions of 39 sub-postmasters to the Court of Appeal on the grounds of “abuse of process” by the Post Office. Compensation is liable to be eye-wateringly high, especially if the Court of Appeal decides to make an example of the Post Office.

Coincidentally (or perhaps not), testing blogger James Christie has written a series of three posts on the subject. I commend them:

https://clarotesting.wordpress.com/2020/05/27/the-post-office-horizon-it-scandal-part-1-errors-and-accuracy/

(Read this one for the account of “the Dalmellington bug” alone. Let the implications of the bug sink in; it’s a gem. It shows how a system issue can have catastrophic real-world impacts.)

https://clarotesting.wordpress.com/2020/05/27/the-post-office-horizon-it-scandal-part-2-evidence-the-off-piste-issue/

https://clarotesting.wordpress.com/2020/05/27/the-post-office-horizon-it-scandal-part-3-audit-risk-perverse-incentives/ (This has a lot to say on the subject of risk, and how unsafe it is to try to reduce risk to a set of easily-measurable and assessable criteria.)

This is of interest because it shows the impact of failing to carry out proper regression tests following a change way downstream in the system. Even then, the problem could have been addressed had senior managers not taken the view that the system was infallible and that any cash shortfalls therefore had to be down to malicious actors. Rolling out any system that is manifested in the real world, with real people interacting with real kit, demands the highest possible standards of testing, together with clarity of thought on the part of those administering the business. Sadly, it seems that neither was the case in this instance, and innocent people suffered. Robust exploratory testing, feeding back not only to developers but also to business managers, might have avoided this whole sorry state of affairs.

A Nudge is as good as a wink

(This post is cross-posted from my primary blog, Steer for the deep waters only, where it was entitled Weirdos and Misfits. Non-UK readers will have to excuse the framing text, which relates to recent UK politics. The kernel of the post, though, concerns the origin of “Nudge theory”, which is cited in many books recommended by testers.)

“Weirdos and Misfits.” These are the people who, it is said, prime ministerial advisor Dominic Cummings is looking for to reinvigorate the upper echelons of the Civil Service and shake them out of their complacent, traditionalist ways.

I’ve got some news for him. The weirdos and misfits are right there under his nose. And have been since before he was born. They are in the lower strata of the civil service, for the most part ignored because they hide their lights under bushels, either from choice or because they just assume that no-one would be interested in their outlandish ideas or the odd stuff they get up to in their free time. I refer, of course, to science fiction fans.

Let me give you an example.

For the past couple of months, I’ve been wading through Warhoon 28, a fanzine published by a New Yorker, Richard Bergeron, in 1978. It’s taken that long because Warhoon 28 was rather special. It was a 650-page hardback book, showcasing the writing of Belfast fan Walt Willis. Willis, together with James White and Bob Shaw (both later to become respected authors in their own right) formed the nucleus of what was called ‘Irish Fandom’ in the 1950s and early 1960s.

I’ve written in other blog posts about science fiction fans and their conventions; I’ve mentioned fanzines from time to time, and how science fiction is a genre whose fans can turn into influential writers. But what I may not have made clear is just who these fans are “in real life” (a term more familiar now in the wired world but that started out in fandom). Walt Willis’ day job was as a civil servant within the Northern Ireland Parliament at Stormont. He never wrote much about his everyday work, but in a fanzine article in 1975 he alluded to his position, working directly to a Minister. Whilst that made him pretty senior, he certainly didn’t start out that way; in those days, you could start as a Messenger and end up a Permanent Secretary. (The last person to do that was Sir Terry Heiser, Permanent Secretary at the Department of the Environment, and he retired in 1992.)

Through his fanzines, Slant and later Hyphen, Willis changed the nature of fan writing, putting the emphasis not so much on the “serious and constructive” discussion of science fiction, but more on what fans did, both in pursuit of their interest and in their everyday lives. Being located in Belfast, Willis was in a way as distant from other fans in England as from the far greater number of fans in North America, and this led to his becoming well-known on both sides of the Atlantic. His column, The harp that once or twice (named for a quotation from James Joyce’s Ulysses), ran in a number of fanzines, starting with Lee Hoffman’s Quandry in 1951. His reputation grew to the extent that he became the first recipient of funds raised by North American fans to bring him to the 1952 World Convention, held that year in Chicago. This started a series of fund-raising campaigns to encourage international interchange of fans that has run ever since and expanded to embrace Australia and New Zealand.

In 1969, Willis described an idea he’d had:

“...the principle is one I follow in my day-to-day work as a senior civil servant concerned with the problem of making people behave better. Privately I think of it as Nudgism, the theory that people can be induced voluntarily to do things you couldn’t force them to do.”

Now, if you Google “nudge theory”, you’ll be pointed to a lot of references to the 2008 book by Richard Thaler and Cass Sunstein, “Nudge: Improving Decisions about Health, Wealth, and Happiness”.

Walt Willis only ever described his idea once in print, in an earlier issue of Warhoon (Warhoon 26, February 1969). I’ve seen no evidence that either Thaler or Sunstein ever saw that. At this stage it would probably be impossible to access whatever circulation records there might be (if, indeed, any ever existed) to see how the idea got passed on. I suspect it submerged into the collective unconscious and only surfaced some 35-40 years later, in a conversation with one of the authors that went “I read somewhere about an idea…”

I can think of two science fiction fans personally known to me who occupied senior roles in the civil service. Neither of them was parachuted in; both worked their way up from fairly humble beginnings. And I know of plenty more who never progressed to rarefied heights but did solid, sterling work, all the while having a range of knowledge and ideas that generally never got used in their day jobs.

It’s widely recognised that the Internet is the way it is because so many of the people who helped develop it were science fiction fans, familiar with a world-spanning network of people openly sharing their thoughts, aspirations and ideas on a wide range of subjects with ink and paper through the postal system, with paper fanzines standing in for websites and letters flying across oceans and continents binding the whole together. People became lifelong friends without ever seeing each other, just the way they do now through the medium of e-mail and social media. If fans could invent something that complex without even trying, then someone who took the trouble to identify that talent in a wider population – such as the Civil Service – could harness that ingenuity to bring about powerful and lasting change.

How about it, Mr. Cummings?

“Tell me again, Professor, about your Computational Engine.”

My current relaxation reading is Alastair Reynolds’ 2007 science fiction novel The Prefect (recently reissued under the title Aurora Rising). It has lessons for us in terms of software deployment, because a deployment plays a key role in the story.

(Note: readers may wish to be aware that the novel deals with issues relating to child abuse. This is wholly coincidental and has no bearing on the subject of the post.)

In the 25th century, the human race has expanded across our part of the galaxy, splitting into various factions and embracing a range of personal technological enhancements along the way. In particular, we have colonised a world, Yellowstone, though the planet is only habitable in small areas. But its system is resource-rich, and so rather than being restricted to one major settlement on the planet surface (Chasm City), development has moved into orbital habitats above Yellowstone. Eventually, these habitats come to number more than ten thousand. At the time that Reynolds started writing stories set in this universe, it must have seemed like a jolly fine wheeze to name the habitats, collectively, the Glitter Band (because from the surface of Yellowstone, they form a band in the sky that glitters, of course). Having had a lot of success with these novels and stories, Reynolds has found himself stuck with this key feature, especially as it has no unfortunate connotations for readers outside the UK or for readers less than, say, 35 years old.

So the Glitter Band consists of habitats which have a high dependency on IT systems. What policing there is restricts itself to matters relating to the exercise of democracy across the whole system, in particular supporting IT systems that facilitate this on an ongoing, real-time basis. This policing is in the hands of a (comparatively) small group of Prefects. So in The Prefect/Aurora Rising, one Prefect with something of an unfortunate past has the job of implementing a software upgrade across all ten thousand habitats, not knowing that a traitor has infiltrated the police force and has modified her software package with trojans that will precipitate a crisis system-wide.

However, this Prefect has the good sense not to roll the upgrade out automatically across all ten thousand habitats. Instead, she decides to carry out a manual installation on the four habitats she thinks may present the most challenges, because of unique features of their existing installations. The final installation completes and activates the trojan. All shared services and communications within those habitats cease. Servitors (read: robots) then set about dismantling all the resources of each habitat to feed nanotechnology manufactories, which replicate huge multiples of military-grade servitors to spread the insurrection. But the problem is confined to four habitats, instead of becoming a knock-out blow to the police force and the wider human population around Yellowstone.
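In deployment terms, the Prefect is doing a staged (sometimes called canary) rollout: upgrade a small first wave, check it, and only then continue. Here is a minimal Python sketch that assumes hypothetical `deploy` and `healthy` callbacks supplied by the caller. Nothing in it comes from the novel beyond the numbers.

```python
from typing import Callable, Iterable, List


def staged_rollout(
    targets: List[str],
    deploy: Callable[[str], None],
    healthy: Callable[[str], bool],
    wave_sizes: Iterable[int],
) -> List[str]:
    """Deploy in waves and stop after the first wave with an unhealthy target.

    Returns the targets that actually received the upgrade.
    """
    done: List[str] = []
    remaining = list(targets)
    for size in wave_sizes:
        wave, remaining = remaining[:size], remaining[size:]
        for target in wave:
            deploy(target)
            done.append(target)
        if not all(healthy(t) for t in wave):
            break          # halt: later waves never receive the faulty package
        if not remaining:
            break          # everything has been upgraded
    return done


habitats = [f"habitat-{i}" for i in range(10_000)]
touched = staged_rollout(habitats, deploy=lambda h: None,
                         healthy=lambda h: False, wave_sizes=[4, 100, 10_000])
print(len(touched))  # 4
```

With a faulty package and a first wave of four, only four of the ten thousand habitats are ever touched. That is the blast radius the Prefect’s caution buys.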

How this is resolved will probably not involve software (I haven’t finished the book yet). And the role of testing isn’t really touched upon. We have to assume that the Prefect has done her own code reviews and unit tests, as she is confident that the upgrade should work as advertised (and so is the traitor, who – spoiler alert! – is above her in the chain of command). But equally, as this Prefect has only recently been promoted, she retains a degree of uncertainty about her own work. So she takes the precaution – completely reasonably, says my tester’s mind – of not rolling the whole thing out at once. Her fear is that the upgrade won’t work, or will have some unforeseen knock-on effect. No two habitats’ systems are necessarily alike, which makes pre-deployment end-to-end testing impossible. Indeed, the first of her four deployments needs direct intervention because of non-standard additions to the habitat’s computing core.

Science fiction, being a genre often devoted to speculation about the future, provides some interesting examples of where the scale of our current digital transformation has impacted our lives. Science fiction is not prediction. Instead, it shows us possible consequences of trends which can be seen in our present-day lives. Sometimes it can seem uncannily accurate, and Cleve Cartmill, a writer who produced a story about atomic bombs in early 1944, is a case in point. He was investigated by the FBI because his story contained details which they thought showed he’d had access to secret sources about a project which was then supposedly of the highest secrecy. He was able to show them the popular science articles from the preceding five years, which had allowed him to extrapolate (surprisingly accurately) what an atomic weapon would look like and how it might work. More to my theme, in 1946 a writer called Murray Leinster wrote a story, A Logic Named Joe, which anticipated desktop computers (‘logics’) connected together by something that looked very much like the Internet.

On the other hand, the comparatively recent growth of mobile telephony and the Internet took a lot of writers by surprise. Some stories written twenty years ago, looking ahead to twenty years’ time (i.e. today), get the technology very wrong. If the writer is lucky, their plotting and style will allow the reader to overcome that. Sometimes the situations in the story will be sufficiently analogous not to make any difference. In Connie Willis’ Doomsday Book, written in 1992 and set in 2054, something goes wrong with a time-travel project at Oxford University. A senior decision-maker is on a fishing trip in Scotland and cannot be contacted. At the same time, pandemic flu breaks out and Oxford goes into lockdown. Willis put no mobile phones into the novel. But in the event of a major lockdown in a British city, trying to contact a colleague who has gone way off the grid could well be just as difficult whether your phones relied on copper cables or packet radio.

But then we look at David Brin’s otherwise excellent end-of-the-world novel The Forge of God. Written in 1987, it put the action of the story in 1997-98. But there is no Internet in the story. Since a number of the characters are academics or in the military, that is rather a glaring omission. His characters appear to be able to access a few bulletin boards and that’s all. At one point, to preserve the achievements of human civilisation, characters have to go out and buy hard copy or diskette versions of encyclopaedias and major institutional libraries. Even by the standards of the ‘real’ late 1990s, that looked distinctly archaic. It wasn’t Brin’s fault; he wasn’t a technologist, and in 1987 no-one anticipated just how fast the world wide web would take off.

And just look at the computer interfaces in two highly popular films I re-watched recently, Alien and its sequel, Aliens. We have starships, hypersleep and colonies on distant worlds; yet the IT is little advanced from Windows 3.1.

I’ve written in an earlier blog post about the value of science fiction as a literature that engages some of the same intellectual parts of our minds as testing. But the intrepid reader should be prepared for some IT that may challenge their suspension of disbelief. The remedy is to take careful note of the date of first publication, to be found inside the front cover.

But don’t let that put you off.

Thinking, Fast and Slow by Daniel Kahneman

(This review is cross-posted from my book review blog, Deep Waters Reading.)

I bought a copy of this book on the recommendation of a colleague in the software testing community, and I struggled with it until bailing out at around the 65% mark. The basic premise, that we have a bicameral mind with two different ways of thinking, and that we rely on the first way of thinking most of the time, which is fine when it’s right but not so good when it’s wrong, is important and needs saying. So much of our thinking about things that are complicated seems severely influenced by what Kahneman describes as System 1 thinking, which jumps to conclusions and takes the easy way out. We certainly seem to be living in a System 1 world right now.

Kahneman then goes on to describe all the different sorts of biases that can fool us. This is an important area for software testers in particular, because these biases influence the way we look at software applications under test, and the assumptions we make about how a particular application works, ought to work, or how it will be used by people out in the Real World who just want to open the software and use it without any further thought or preparation, like any other simple tool, from the stone axe onwards. However, computer software is just that bit more complex than the stone axe, and that’s where the problems start.

So far, so good. But I ran into problems with the book from the outset. I rapidly came to the conclusion that someone, most likely the publisher, had dumbed it down. (It took me a few days to discover the notes at the back, because someone decided that it would be better to take all the referencing out of the text. Without the referencing, the book often reads to me like pseudo-science, because of the way Kahneman keeps saying “Studies have shown…” or “Scientists in San Francisco found…”. Unless you know that there actually is a solid, valid reference behind these statements, they look like the sorts of things pseudo-scientists say to “prove” that you can extract sunbeams from cucumbers.)

I did find the text rather old-fashioned. It read like a 1970s psychology textbook, and indeed that’s when Kahneman and his collaborator Amos Tversky did a lot of their initial work. In any case, I got as far as Chapter 16 and then seriously considered abandoning the book. But I rested it for a few days and then went back to it. My return seemed to coincide with a change in direction in the text, towards a more anecdotal style. But that was something of a false dawn, because Kahneman then dived into analyses of risk and gaming. We ended up with a series of example questions like “Would you rather have a 50% chance of winning $50 and a 10% chance of having $10 taken away from you, or a 60% chance of winning $10 and a 35% chance of winning $85?” After the fourth or fifth example of that, which seemed to occupy much of the rest of the book, I gave up. This is not something I regularly do.

Partly, I suspect I may not be the book’s intended audience; I found myself challenging too many of his examples and I saw through the perspective exercise in chapter 9 (Figure 9 – page 100 in my UK paperback edition) and was then amused to see that the author recognises that “experienced photographers have the skills of seeing the drawing as an object…” and I am such a photographer!

Or perhaps I was applying my tester’s mindset to the problems and over-thinking them, trying to find real-world solutions instead of just letting my own Systems 1 and 2 battle it out between them.

I also found the text excessively US-centric, to the point where I complained loudly about one question posed as an example quite early on: “How many murders are committed in the state of Michigan?”, to which I replied “No idea – I’d usually Google that one.” Of course, the question ought to have been “What is your estimate of the annual number of murders in the state of Michigan?”; but that aside, when Kahneman then said “Well, of course you only thought of Michigan and forgot that Detroit is in Michigan and so has its tremendous number of murders counted in the state-wide total”, it struck me as just the sort of geographical bias we try to eliminate when doing testing work, and indeed seemed to expose the very biases he went on to discuss later in the book. There are other examples, but this was the worst.

But Chapters 14 and 15 brought me to a juddering halt with two examples of what testers call “testing personas”, invented characters who are used to represent typical real-world users. The first, “Tom W”, is based around a set of assumptions about IT systems developers from the 1970s! I’ve worked in IT for twenty-five years, and the sort of stereotyping that Kahneman bases his expectations on died out long ago, certainly in the organisations and companies I’ve worked in. And then we had the “Linda problem”. Kahneman set up a fictional character, Linda, with a fairly detailed backstory and life circumstances, but then when people say “Yes, Linda could be a feminist bank teller”, he says that is the wrong answer! I’m sorry, but I got angry with him at that point. I’ve met plenty of Lindas (of both genders) with strongly-held political beliefs that drive their existence, and they are quite capable of holding down comparatively menial jobs. If anything, their political beliefs support them in their jobs and give them a focus outside of those jobs that helps them cope. It was at this point that I realised that Kahneman was applying economic criteria to his cases: the likelihood of Linda being a feminist bank teller was arrived at statistically, by treating feminist bank tellers as a proportion of all bank tellers. Yet Kahneman gave her a backstory in which her feminist values would be important enough for her to hold on to them in the real world, whatever her job. Kahneman’s explanation at the end of Chapter 15, that the sort of objections I raise actually aren’t relevant to the argument he’s trying to make, just irritated me more. Are we supposed to be reading this book to find out interesting facts about human nature when working with human beings and their artifacts, or just to admire how clever Daniel Kahneman is?

(His ‘less is more’ example in that chapter – can you charge more for a tea service with fewer pieces, all of which are perfect, or for one with more pieces, a significant number of which are imperfect – made me smile because if the author had had much experience of actually selling things in sets, he would have realised that a smaller but complete set is worth more than a larger, but incomplete and/or flawed set, especially to more discerning customers.)

Perhaps I was reading it too fast. One of the blurbs on the back of the UK edition says “Buy it fast. Read it slowly.”; and indeed the person who recommended it to me in the first place stretched his reading of it out over a series of weeks. As I said, perhaps I’m not the book’s intended audience. Perhaps I’ll just take it to the office and see if any of my testing colleagues want to have a try at it.

Postscript

In Chapter 34, Kahneman refers to ‘nudge theory’, as described in the 2008 book by Richard Thaler and Cass Sunstein, “Nudge: Improving Decisions about Health, Wealth, and Happiness”. (Kahneman and Thaler worked together at Stanford University in 1977-78.) Thaler is generally credited with creating nudge theory. Yet I have recently uncovered evidence that the idea originated much earlier, and a long way away.

I’ve recently been reading a collection of the amateur writings of an Irish science fiction fan, Walt Willis. His day job was as a senior civil servant in the devolved Northern Irish Stormont parliament. In 1969, Willis described an idea he’d had:

“...the principle is one I follow in my day-to-day work as a senior civil servant concerned with the problem of making people behave better. Privately I think of it as Nudgism, the theory that people can be induced voluntarily to do things you couldn’t force them to do.”

Willis only ever described his idea once in print, in a limited-circulation small press fanzine (Warhoon 26, February 1969). I’ve seen no evidence that either Thaler or Sunstein ever saw that, and at this stage it would probably be impossible to access what circulation records there might be (if, indeed, any ever existed) to see how the idea got passed on. I suspect it submerged into the collective unconscious and only surfaced some 35-40 years later in a conversation with one of the authors that went “I read somewhere about an idea…”

Automate some of the things

Every so often, discussions in the online testing community turn to the “manual testing is dead” theme. Another iteration of this was kicked off earlier this week in the Ministry of Testing discussion group (https://club.ministryoftesting.com/t/automation-is-completely-taking-away-manual-qa/29150/5). I tried to ignore that inflammatory thread title for as long as I could without replying. I lasted about two days.

Ultimately, automated testing will deliver applications that work exactly as designed, but only as long as the users know how to use them. A user coming to the app cold, whether without training (if the app is a business-related piece of software that demands it) or as a first-time user from the general population who expects to be guided through the app, either intuitively or with detailed instructions, may find the app unhelpful at best and unusable at worst.

A couple of years ago, I encountered a supermarket customer service terminal that I gave up on in disgust. First it needed to read my loyalty card, but the reader was broken. So I had to enter my address manually. But instead of asking for the (UK) postcode first and the first line of the address second (thus drilling down from an area to an individual address*), it asked for the first line of the address first and the postcode second. (So it was checking the postcode against the address I had typed, rather than using the postcode to look up the address.) But my address appears on different databases formatted differently, and one of those versions is “wrong” (not the address I was given when I moved into the property). So I entered the first line of the address and the postcode, only to be told repeatedly “No such address at that postcode”.

(*UK postcodes have sufficient granularity to cover at the most some fifty different property addresses. In rural areas, some postcodes may only cover a handful of properties, or possibly even only one.)
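To make the difference between the two approaches concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the postcode, the addresses and the function names are invented for illustration, and it is certainly not how the supermarket’s system actually worked. Looking up by postcode first offers the user a short list to pick from; matching the typed first line exactly fails as soon as two databases format the same address differently.

```python
"""Hypothetical sketch: postcode-first lookup versus address-first exact
matching. The postcode, addresses and function names are invented."""

# Invented data: each postcode maps to the handful of addresses it covers.
ADDRESSES_BY_POSTCODE = {
    "AB1 2CD": ["Flat 1, 10 High Street", "Flat 2, 10 High Street", "12 High Street"],
}


def normalise(postcode: str) -> str:
    """Upper-case the postcode and collapse any stray whitespace."""
    return " ".join(postcode.upper().split())


def lookup_by_postcode(postcode: str) -> list[str]:
    """Postcode first: return the candidate addresses for the user to choose from."""
    return ADDRESSES_BY_POSTCODE.get(normalise(postcode), [])


def match_exact(first_line: str, postcode: str) -> bool:
    """Address first: the typed first line must match a stored record exactly."""
    return first_line in lookup_by_postcode(postcode)


print(lookup_by_postcode("ab1  2cd"))
print(match_exact("10 High Street, Flat 1", "AB1 2CD"))
```

With postcode-first lookup, the same property comes back as one of three choices however the user types the postcode, whereas the exact match rejects “10 High Street, Flat 1” although it is the same flat as “Flat 1, 10 High Street”. Drilling down from area to address sidesteps the formatting problem entirely.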

After giving up on this terminal the first time, I referred to some correspondence from the supermarket to see which version of my address they were using. Armed with this information, I went back and tried again.

Although this ran on a touchscreen terminal with an onscreen keyboard, you were given two fields at the top of the page and a “Next” button, in the form of a right-pointing arrow at the bottom right of the page, like so:

Bad UI

The insert point was already in the first field when the screen was displayed.

So with a touchscreen, you’d expect to touch field 1 to enter text, then touch field 2 to enter text, and then go to the bottom of the screen to touch ‘Next’ and go to the next page.

Except you didn’t. You actually had to use the ‘Next’ button to navigate from field 1 to field 2, even though it didn’t look as if that was the correct workflow. After three attempts to use this terminal on three different visits, and encountering other problems on subsequent pages that meant I was unable to complete the workflow, I gave up and wrote to the supermarket’s CEO to complain. It was clear to me that this app had been through automated testing, because everything worked the way it was designed to. But it was also clear that the app had never been subject to any manual testing, as that would soon have shown that the app was totally useless to even an above-average user (he said, modestly) unless you’d read the specification document!
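Why would an automated check pass here? A scripted check follows the designed workflow to the letter, so it never tries the thing a real user would try. The following is a toy model in Python, under loudly stated assumptions: the class, its method names and the rule that tapping a field does not move the insert point are all hypothetical, inferred from my experience at the terminal rather than from its actual code.

```python
"""Hypothetical model of the two-field address screen. Class and method
names, and the 'taps are ignored' behaviour, are assumptions."""


class AddressScreen:
    FIELDS = ("first_line", "postcode")

    def __init__(self) -> None:
        self.values = {name: "" for name in self.FIELDS}
        self.focus = 0  # the insert point starts in the first field
        self.submitted = False

    def tap(self, field: str) -> None:
        """Assumed behaviour: tapping a field does NOT move the insert point."""

    def type_text(self, text: str) -> None:
        """Typed text always goes into whichever field currently has focus."""
        self.values[self.FIELDS[self.focus]] += text

    def press_next(self) -> None:
        """'Next' moves to the next field, and only submits from the last one."""
        if self.focus < len(self.FIELDS) - 1:
            self.focus += 1
        else:
            self.submitted = True


def scripted_check() -> AddressScreen:
    """An automated check that follows the specified workflow exactly."""
    screen = AddressScreen()
    screen.type_text("12 High Street")
    screen.press_next()
    screen.type_text("AB1 2CD")
    screen.press_next()
    return screen


def first_time_user() -> AddressScreen:
    """A user who taps each field, as a touchscreen seems to invite."""
    screen = AddressScreen()
    screen.type_text("12 High Street")
    screen.tap("postcode")
    screen.type_text("AB1 2CD")
    screen.press_next()
    return screen
```

The scripted check submits both fields correctly, so it passes every time; the first-time user ends up with “12 High StreetAB1 2CD” in the first field, an empty postcode and nothing submitted. Only someone actually using the screen the way it looks as though it should work would spot that.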

(Those of us who work in IT, as I assume most readers of this blog do, are probably in the top 20% or so of people who use computers of any sort on a regular basis. If a system we come across in the real world flummoxes us, then ordinary members of the public have little chance.)

When I wrote to the CEO, I pointed out what my Day Job was, and hinted at what my usual invoice for that sort of report would be. When the reply came, the CEO admitted that the app was due to be replaced within the next three months, but didn’t offer me any money, not even a supermarket voucher. I’ve not tried its replacement yet.

Automation is great for checking, as a previous poster said. But any app that is only subject to automated testing is going to fail sooner or later when faced with real world users in real world situations. And to prevent that, you need “manual” testers. Or you need to be able to explain to your CEO why they are getting snotty letters – or even invoices! – from members of the public.

Reading for the curious

In about a week’s time, I shall be leaving my place of toil and heading off to the 77th World Science Fiction Convention in Dublin, where I expect to attend various panels on current trends in science fiction, current themes in the sciences that stand behind my favourite literature, and most probably hobnobbing with writers, publishers and agents. There will be some sightseeing, and possibly a Dublin bookshop crawl. Drink may also be taken.

Why do I mention this? Well, there has been a thread in the Ministry of Testing discussion groups that asks “What non testing book are you reading (or have read) that influence your testing?”, and a couple of posters have admitted a liking for detective novels or courtroom dramas. I added that I thought it went deeper than that. I think that enjoying any sort of story with a ‘puzzle’ in it (a detective story, a mystery, or some of the older sorts of science fiction) may be a sign of the sort of mindset that makes someone an effective tester.

Science fiction is a highly misunderstood genre. Often, people who don’t know it will say “You can do anything you like – it’s sci-fi!” (Note: “sci-fi” is a term generally derided by proper fans, who consider that it is usually an indicator of bad science fiction. The preferred short term is “SF”.) Or they will accuse SF of being mere escapism, or conflate it with fantasy. But the point about science fiction is that there is science in it; not in the sense of “Tell me, Professor, what makes your spaceship go so fast?” (another trope of “sci-fi” according to my earlier definition), but in the sense that events in a science fiction story are consistent: they either adhere to the known laws of the universe, or follow the defined rules of the universe in which the story takes place. This is generally taken to be the thing that separates science fiction from fantasy. (We shall ignore the sub-genre of ‘science fantasy’ here; that has been the subject of a lot of debate over the years, and is indeed the sort of thing that I might anticipate having heated discussions about over a few Guinnesses next week.) (Other beverages are available.)

So we see stories where the focus is “why does this species behave this way?”, “how do we survive on this planet?” or “how do we get ourselves out of this situation?” and where the solutions depend on the accurate application of rational thinking within a framework of defined physical laws. These, along with detective or mystery stories, are the sort of stories that I suggest may be attractive to people with a tester’s mindset; and indeed, the sort of people I know through the science fiction community often have uncannily similar tastes as to the sort of literature they enjoy, embracing those other genres. It may even be an identifier of someone who would make a good tester even if they aren’t currently working within the discipline.

I described this sort of SF story as “the older sort of science fiction” because, starting in the mid-1960s, a new style of SF arose which looked more at how individuals reacted to fantastic or futuristic scenarios and how those scenarios affected their lives, personalities or states of mind. This was called “the New Wave” and, as with any new approach, it caused controversy in some quarters. That’s now fifty years in the past, and the sort of SF being written now aims to combine the analytical approach to world-building with the personal reactions of well-defined and well-rounded characters. The overall effect has been to produce stories which can stand on their own as well-crafted pieces of literature but which still possess the “sense of wonder” that writers and editors were seeking to evoke back in the 1930s and 1940s, a period now known as the “Golden Age” of science fiction. (Many say, though, that the “Golden Age” of anything is when you were fourteen, and many critics and commentators use “Golden Age SF” as a term of derision.)

Nonetheless, the connection remains true. Many testers I know have an interest in the literature of the fantastic, and there are still stories that present the reader with an intellectual challenge to figure out what exactly is happening and why. And I still treasure the memory of a talk by the theoretical physicist Michio Kaku at the Hay Literary Festival a few years back. Hay is usually the preserve of the literary mainstream, the sort of audience who still believes some of the misapprehensions about SF that I mentioned earlier. Kaku has no such qualms. He started his talk by saying “How many of you here read science fiction?” About a third of the audience in the tent, which numbered some thousand people, put their hands up.

“Great!” he enthused. “The rest of you – get with the programme!!!”

“I’ve seen better-dressed wounds!”

A little while ago, I heard (for the second time) a segment of the BBC Radio 4 show Woman’s Hour which talked about dress codes at work. It made me think about my own experience.

I am a bit out of the ordinary for both my company and indeed the whole sector. I go to work in a suit. Up until my last job, about seven years ago, I would also have worn a tie. Why is this?

Well, for one thing, when I was born, in 1957, my parents were rather older than those of my contemporaries. So I grew up in a family where the expectation was that men would go to work in formal work attire; casual clothes really didn’t exist in my parents’ world of the 1930s and 40s, when they grew up. (The fact that my father had been an Army drill sergeant might also have had something to do with it.)

Then I went to work in a very traditionalist organisation, the British Civil Service. For the first ten years, I was public-facing and so collar and tie was required. I then moved to the water regulator, Ofwat, and for the first five years I was effectively in the Director General’s outer office. We could have FTSE 100 company chairmen passing through, Ministers of the British or overseas governments, or any other sort of VIP. And as I was located in the press office, there was an outside chance that I might – if everyone else had been out of the office or otherwise engaged – have had to do broadcast media interviews. (I was a long way down the pecking order, so my appearance on tv would have been a sign that something really, really had gone south in a Big Way, but it was nonetheless a possibility.)

For the next fifteen years, although I was not so directly exposed, my work was nonetheless quite important to the organisation and so there was always the possibility that VIPs might be brought round to be shown the leading edge troops, shovelling data into the digital furnaces down in Ofwat’s engine room. And I also had to liaise with, and sometimes visit or receive, senior people from the water or civil engineering industries. So collar and tie remained the order of the day.

(Meanwhile, in my former employer, the social security department, there had been a change after one chap whose job was 100% behind the scenes was disciplined for not observing the dress code. This escalated into a full-blown Employment Tribunal on sex discrimination grounds, as women in the same job were not required to observe any particular dress code. Before this was settled – in the worker’s favour – there was the spectacle of dozens of junior civil servants suddenly discovering a previously-unknown Scots heritage and turning up to the office in kilts, as ethnic dress was exempt from departmental dress codes. Yes, it all got rather silly.)

After I left Ofwat, I was freelancing and so, from time to time, had to walk the walk as well as talk the talk; on a couple of occasions, turning up to a consultancy job in a silver Mercedes and showing up in the corporate conference room suited and booted got me not only listened to respectfully but also paid on time!

As I said earlier, I lost the tie in my previous job, as there was a CEO there who wore a suit but an open-necked shirt. He was quite innovative in other ways; that company ran a call centre for taking facilities calls from shops and offices up and down the country, and finding plumbers or electricians or maintenance engineers local to the caller to go out and address these jobs. The CEO took a rather egalitarian attitude to his work, and once a month would block out one morning so he could go and sit in the call centre, not just to be seen at the workface, but actually to put on a headset and take calls. This certainly endeared him to me; sadly, a lot of the Board and the company’s owners weren’t so enlightened and in due course, he went, not long before I did.

When I took up that job, I had to move to get an easier commute; and some of my belongings had to go into storage. I took on a storage unit on a farm not far from my former home, and when I went to see the unit and shake hands on the agreement, I noticed that the farmer and his son-in-law were calling me “sir”. I’d gone straight there from the office; so I said “Before we go any further, let’s get one thing straight and drop this ‘Sir’ business. I’m in my working clothes, you’re in yours, end of. OK?” And it was.

I now work in IT, where wearing a suit in a back-office role is distinctly eccentric. (We recently had a client user consultation group meeting, where our dress code was ‘smart casual’. “Do I have to dress down, then?” was my question.) Fortunately, a number of my colleagues, like me, have memories of the tv science fiction show Stargate: Atlantis, which oddly enough has a bearing on this. The premise of the show is that the US Air Force has come into possession of an ancient alien transport device, a ‘Stargate’, which enables easy travel to other planets, and even galaxies. In due course, the lost city of Atlantis is discovered in a distant galaxy and an expedition of soldiers, scientists and archaeologists is sent to explore it; and that expedition is led by a civilian. Towards the end of the show’s run, the civilian administrator of the expedition is replaced by a Washington lawyer, Richard Woolsey, played by the under-rated Robert Picardo. And although, while on duty, Woolsey wears the same uniform coveralls as other civilian staff on the expedition, when he goes off-duty, he relaxes by putting on a suit and tie.

The reason for this is, he explains, that he had his best moments as a Washington lawyer in the courtroom. That is who he is, and so that is how he feels most comfortable. And so he reflects his personality and his personal history in his clothing of choice. I identify with that character very closely! And so I wear a suit to work, because that’s how I know I’m at work, and to me it says that I take my work seriously. I’m not saying that my more casually-attired colleagues don’t take their work seriously – they do – but this is how I show it. It’s who I am.

The value of soft skills

I don’t follow football much, but I grew up in Derby in the 1970s, when Derby County were managed by the infamous Brian Clough. It was impossible not to follow Cloughie’s career, so I watched his 44 days at Leeds and then his move to Nottingham Forest.

I remember the 1991 FA Cup Final between Forest and Spurs. It was one-all after 90 minutes, and so the match went into extra time. During the break before extra time, Spurs went into a huddle, with their manager talking to the players, encouraging them, inspiring them and generally reinforcing their team spirit. Clough, however, just sat in the dugout with his arms folded and a face like thunder. The Forest team milled about aimlessly on the touchline, with no words of encouragement from him.

When play resumed, Nottingham Forest conceded an own goal and that was how the match ended, Spurs winning 2-1. Clough only lasted another two years in the job and then retired from football management altogether.

Out of the few football matches I’ve ever watched, this one has stuck in my mind precisely because it illustrates the value of soft skills so clearly. Brian Clough was an excellent technical manager but had no soft skills whatsoever; and when it was most important, their absence swung the match – and the Cup – without the other side having to do a thing apart from hold the game together.


99 Second Talk
Midlands Testers, 17th April 2019 (Photo: Ben Fellows)