IT systems and (quite possibly) testing issues have been at the centre of a lot of news in the UK in recent weeks. TSB (Trustee Savings Bank) suffered a major outage of their online and mobile banking apps following a systems upgrade during April. Systems became inaccessible; some customers could not see transactions reflected in their accounts; others saw other customers’ transactions showing up in their accounts. The bank’s CEO had to make media appearances to apologise and promise action; and the bank drafted in “experts” from IBM to try to put things right. (I’m not casting doubts on the expertise of the consultants here; “experts” was the word used in the media to describe the consultants, but in terms of understanding what actions need to be taken, it’s a pretty meaningless word. We are all “experts” of one sort or another here.)
UK banks are pretty heavily regulated these days, though that regulation is a fairly imperfect thing. I last wrote on regulation and banking in November 2015 in my mainstream blog Steer for the deep waters only (Is He One of Us?); since then, I’ve had a little more insight into High Street banking and testing.
Let’s make one thing clear: I’m expressing my own opinion as a tester. I do not work for TSB and never have; indeed, if I ever had, I really wouldn’t be able to comment because of simple issues of employer confidentiality. But I can make some informed guesses as to what has happened, based on my knowledge of projects I’ve worked on that have involved integrating new applications with legacy systems, on load testing for new Cloud-based applications and also on my experience working on one project where a UK High Street bank was the ultimate client.
For non-UK readers, retail banking in Britain is mainly in the hands of four major clearing banks – Barclays, HSBC, Lloyds and NatWest. They operate national networks of branches, hold shares in major credit card companies, and participate in a national network such that British customers enjoy highly available access to their bank accounts from almost anywhere in the country (even before the era of mobile and internet banking). Working together, they underwrite the companies that run inter-bank clearances. But for a long time, they were considered to be unresponsive to customer concerns or needs; with that in mind, previous UK Governments took steps to change the legal landscape to allow new competitors to emerge. The tendency of the big four banks to absorb smaller banks was countered in the 1990s and 2000s by changes which allowed the building societies (long-established mutual bodies who mainly dealt in savings products and mortgages for house loans) to expand their operations to embrace retail banking and (if they wished) to de-mutualise and become banks in all but name. The building society sector itself consolidated, so that what used to be a sector with a large number of local mutual societies is now more a second tier of retail banking, with a number of regional societies giving customers access to a similar level of service that they might expect from a bank.
TSB was created by the amalgamation of local and regional savings banks over a period of years starting in 1967, although some of its constituents date back as far as 1810. By 1975-6, the bank existed in a form recognisable today as a fully-fledged retail bank, offering a similar range of services to the big four clearing banks. It merged with Lloyds in 1995, creating what was then the largest retail bank in the UK. The separate identity of TSB was submerged within the Lloyds operation, although the name remained on the new bank’s masthead. Even that disappeared when Lloyds acquired HBOS (Halifax/Bank of Scotland, itself a merged entity formed from the former Halifax Building Society and the Bank of Scotland) in 2009, when the new business became known simply as Lloyds Banking Group. But that was just a short time before the global banking crisis of the same year; the UK government bailed out the High Street banks, effectively nationalising those that were most heavily exposed, in order to prevent the catastrophic collapse of the entire retail banking system. In order to comply with EU rules preventing state aid subsidy, Lloyds announced that it would spin off a part of its business as a re-launched TSB. The new bank began operations in 2013; but separating systems and customers has been a longer drawn out business. The Spanish bank Sabadell acquired TSB in 2015, and proposed moving their UK customer base from Lloyds legacy systems to a UK-based replica of their own customer platform, Proteo, with a target go-live date of the end of 2017. In fact, it seems that the programme ran late, and customer records did not begin to be migrated to the new platform until April this year. Users of the online and mobile phone banking systems found their accounts to be unavailable for at least a week, and other customers reported being able to access account details of other customers.
The banks were early adopters of IT, and by “early” I mean that their original systems were – and are – prehistoric in terms of the sort of technical churn we now think of as normal in the sector. Cheque clearing was automated in the 1960s, using big mainframes at regional data centres, with applications written in languages such as FORTRAN or COBOL. And this is part of the problem; their systems are so large and extensive that it is extremely difficult for them to be upgraded, both in terms of the development cost and the sheer logistics of swapping out one system for another, seamlessly and across an entire country with millions of concurrent users hitting the systems day and night. Over the years, new developments have been added to the existing systems; so no matter how modern a bank’s latest website might look, drill down through the layers and you will find something designed and first implemented up to fifty years ago.
This, of course, has its own problems; the developers who wrote the original banking systems are now at best retired, so there is a serious knowledge transfer issue across the industry. And adding new applications or functionality to any existing system, let alone one that may be twenty, thirty or more years old, has its own problems. Middleware has to talk accurately to both upstream and downstream applications; I once had a contract with a high-profile professional organisation that was doing in-depth end-to-end testing on a membership renewal website upgrade that had to interact with a customer relationship management app that was about five years old and an accountancy package that was probably fifteen years old. I joined a team of testers that had been in place for some time; I spent four months there, and it was only because of a set of business decisions connected with other things happening in the organisation that a line was drawn under the e2e testing as being “good enough”. In a later role, an application declared by the company’s CEO to be the “best tested ever” fell over when deployed because the specification hadn’t looked at the data items it was required to process from upstream apps and the data formats it was required to hand off downstream. As the downstream app was the invoicing and payments one, failure here meant that transactions didn’t get processed. At one stage, there were transactions worth about £25k per day getting logjammed because no-one had considered what actual format the data needed to be in before it was handed off for payment.
The people who were responsible for that were consultants who had been given a very specific brief when the system specification had been drawn up eighteen months earlier; and the person in the company who had engaged them had long since left. But the flak from that came back to me as a tester and the BA who managed the process.
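The hand-off failure described above is the sort of thing a small contract check between applications can surface long before go-live. What follows is only a minimal sketch, not the system I worked on: the field names (`invoice_id`, `amount_pence`, `currency`, `value_date`) and the date format are all hypothetical, chosen purely to illustrate checking a record against what the downstream payments app expects to receive.

```python
from datetime import datetime

# Hypothetical contract: the fields and types the downstream app is assumed to require.
REQUIRED_FIELDS = {
    "invoice_id": str,
    "amount_pence": int,
    "currency": str,
    "value_date": str,
}


def handoff_problems(record):
    """Return a list of reasons why a record would be rejected downstream."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(
                f"{field}: expected {expected_type.__name__}, "
                f"got {type(record[field]).__name__}"
            )
    # Assumed date format for illustration only: ISO-style YYYY-MM-DD.
    if isinstance(record.get("value_date"), str):
        try:
            datetime.strptime(record["value_date"], "%Y-%m-%d")
        except ValueError:
            problems.append("value_date: not in YYYY-MM-DD format")
    return problems


good = {"invoice_id": "INV-1", "amount_pence": 2500000,
        "currency": "GBP", "value_date": "2018-05-01"}
bad = {"invoice_id": "INV-2", "amount_pence": "25000.00",
       "currency": "GBP", "value_date": "01/05/2018"}
```

Run against a sample of real upstream output rather than hand-crafted records, a check like this turns “nobody considered the format” into a list of concrete mismatches that someone has to sign off.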
Coming back to the recent problems, I wonder how much data cleansing TSB did before their system went into test? Was their test dataset properly representative of the range of customers, their identities and the sort of transactions they wanted to carry out? Or did they just use a vanilla dataset that they knew would work and return acceptable results in a limited timeframe? When you’re up against a deadline, the temptation to use such a vanilla dataset and a check for the happy path only is pretty big – but should be resisted as far as possible.
Even if you consider all these issues, there is then the problem of scalability. It’s one thing to test with a dataset of perhaps five hundred users hitting the system consecutively; it’s quite another to apply the same test for five million concurrent users.
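To make the difference between consecutive and concurrent access concrete, here is a rough sketch using Python’s standard `concurrent.futures` module. `FakeAccountService` is a hypothetical stand-in for the system under test, not anything TSB or anyone else actually runs, and five hundred in-process calls are nowhere near a real load test. The point is the shape of the check: every response must contain the data belonging to the customer who asked for it, whether the requests arrive one at a time or all at once.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeAccountService:
    """Hypothetical stand-in for the banking system under test."""

    def __init__(self, balances):
        self._balances = dict(balances)
        self._lock = threading.Lock()

    def get_balance(self, customer_id):
        # Return the requesting customer's id alongside the balance found for it.
        with self._lock:
            return customer_id, self._balances[customer_id]


def run_consecutively(service, customer_ids):
    """One request after another, as in a simple scripted test."""
    return [service.get_balance(cid) for cid in customer_ids]


def run_concurrently(service, customer_ids, workers=50):
    """The same requests, issued from many threads at once."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(service.get_balance, customer_ids))


def mismatches(results, expected):
    """Responses where the balance returned is not the requesting customer's."""
    return [(cid, bal) for cid, bal in results if expected[cid] != bal]


expected = {cid: cid * 10 for cid in range(500)}
service = FakeAccountService(expected)
sequential_errors = mismatches(run_consecutively(service, list(expected)), expected)
concurrent_errors = mismatches(run_concurrently(service, list(expected)), expected)
```

Both lists should be empty for a correct service. A system that passes the first check but not the second is showing exactly the kind of defect that only appears under concurrent load.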
I have had one experience working on a banking system, and that was quite an eye-opener. In 2014 I went as a sub-contractor to a third-party testing provider, positioned within an outsourced services company. Their client was a High Street bank for which they provided back-office data and records processing services, mostly using mature proprietary applications from big-label IT companies. The project was to implement a minor change in the way account changes (authorised signatories, official addresses) were notified and actioned on certain accounts. It involved the completion of a paper form in the bank branch; the form was then scanned and sent to the back-office company, who processed it as an image, validated it and did the OCR work to implement the change on the bank’s own system and generate the necessary hard copy documentation confirming the change.
The outsourcing company I was placed with was a US-based multinational. My first problem was my own status. As a contractor, I found that I was subject to all sorts of restrictions, starting with not being allowed to use the front door. Whilst this might seem quite reasonable in the case of rufty tufty blokes in hard hats and hi-vis jerkins treading cement dust into the carpet in reception, this hardly seemed appropriate for an IT professional. Still: their office, their rules. But then, it took a week for me to be authorised to even have access to the IT network, as my credentials had to be approved by head office IT admins in the States. Even that wasn’t the end of it. As the company was dealing with banking issues, my ID swipe card had to be collected from the security office at the main gate on arrival and returned there when I left each night. I also had to have appropriate permissions to move around the building, including one set of permissions to visit the developers and a separate set of permissions to access the shop floor where the document processing was actually done.
Even then, the impediments didn’t stop. There was no test environment; all my testing had to be done in the live environment after business hours (transaction processing ceased at 3:30pm). This meant that I ended up doing pair testing with the lead developer on this change from about 3:30 to 7:30 or 8pm, running test documentation that we arranged to be sent by a nominated bank branch on an end-to-end test to ensure that installation and functionality worked in the live environment. This wouldn’t have been so bad, but I was providing holiday cover, so the test manager who could have authorised variations to the terms of engagement was one of the people I was substituting for; and the terms of the contract were that I had to attend during normal business hours, starting at 9am regardless. My contract was that I was paid by the day, not by the hour, and there was no-one who could authorise a local variation so that I could have a more reasonable starting time or could be paid for the hours I actually put in. Luckily for them, I consider myself to be a conscientious professional who does what’s needed to get the job done.
And this was a simple addition of one new form to an existing system that would be actioned at one location. A major migration of millions of customers from one system to another, on a completely new platform but which has to merge seamlessly with legacy systems on a national basis was never going to be an easy task. That TSB had to “bring in experts from IBM” a week into the crisis suggests that there was a mindset in the bank of considering testers to be mere functionaries, people who stepped through a simple test script to ensure that system functionality did what it was supposed to. The wider role of the tester, in determining how the system should be tested, what would be required to properly test the system, what resources and how much time these tests would require, does not appear to have been taken into account. No-one appears to have done any sort of risk analysis (which boils down to the simple question: “What’s the worst thing that could happen?”). Testing seems to have been restricted to a pure quality control process, a simple check that the application does what it says on the tin.
More recently, another problem has emerged with the National Health Service (NHS) breast cancer screening programme. Women in England between the ages of 50 and 70 are automatically invited for breast cancer screening every three years. They should receive their final invitation after their 68th birthday and before their 71st.
But in 2009, the system was changed to allow trials to take place over extending the age range of those invited for screening. As a consequence, women who had already reached their 70th birthday were excluded from the system. Up to 450,000 women never received an invitation for their final scan. Only when Public Health England set a systems upgrade in place ten years later was the error discovered.
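I have no knowledge of how the screening system is actually implemented, so the following is purely an illustration of the kind of boundary check a tester might ask for around an age rule like this one. It assumes, from the description above, that a woman remains in scope from her 50th birthday up to, but not including, her 71st; the function names and the dates are hypothetical.

```python
from datetime import date

MIN_AGE = 50
UPPER_AGE_EXCLUSIVE = 71  # assumed: in scope until the day before the 71st birthday


def age_on(born, today):
    """Age in whole years on a given date."""
    years = today.year - born.year
    if (today.month, today.day) < (born.month, born.day):
        years -= 1
    return years


def is_in_screening_range(born, today):
    """True if the woman falls inside the assumed invitation age range."""
    return MIN_AGE <= age_on(born, today) < UPPER_AGE_EXCLUSIVE


today = date(2018, 5, 1)
boundary_results = {
    "turns 50 today": is_in_screening_range(date(1968, 5, 1), today),
    "turns 50 tomorrow": is_in_screening_range(date(1968, 5, 2), today),
    "turned 70 last year": is_in_screening_range(date(1947, 5, 2), today),
    "turns 71 today": is_in_screening_range(date(1947, 5, 1), today),
}
```

The interesting cases are the ones on either side of each birthday. A rule that quietly treats the 70th birthday as the cut-off instead of the 71st would pass every mid-range test and fail only the third case here.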
An official enquiry is now under way to determine how this omission occurred. As testers, our concern should be that the lack of feedback loops in the business end of the process will be overlooked. Issuing invitations, booking places and keeping patients’ own GPs informed are all processes that should act as checks on the whole screening programme. Instead, blame may be laid wholly on the shoulders of testers, whom an ignorant management may very well expect, wholly unreasonably, to detect every single possible bug in the system. Testers do not guarantee 100% bug-free software; and sometimes a system can work exactly the way the code says it should, but there is a gap or a shortfall in the understanding of the person who specified the system, or the person or body who issued the requirement for the system.
Of course, this does also raise a question of gender bias. The bulk of developers are men; not themselves being subject to breast screening, it might be argued that their grasp of the realities and implications of this test was less immediate than if there had been more women involved in the software development. Would there have been a different outcome if the programme had been screening for, say, prostate cancer? Or would other biases, such as age, have taken hold instead? Or should we expect, and even publicise, the fact that anything made by humans can have errors in it? Accepting that would mean that testers would cease being seen as gatekeepers, with a burden of responsibility for the consequences of bugs being undetected; rather, they should be seen as explorers, finding the limits of what a piece of software can do, managing expectations and helping mitigate risk. In a world becoming daily more dependent on the complex, intangible constructs that are software applications, some recognition of the enormity and difficulty of the task of building, testing, deploying and maintaining software tools is very much overdue. But I’m not holding my breath for that anytime soon.