Author Topic: The BA IT collapse  (Read 10781 times)

The BA IT collapse
« on: May 27, 2017, 05:38:02 pm »
http://www.bbc.com/news/uk-40069865

Quote
British Airways has cancelled all flights from Heathrow and Gatwick because of global computer problems.

It apologised for the "global system outage" and said there was no evidence of a cyber attack.

Mick Rix, GMB's national officer for aviation said: "This could have all been avoided.
"BA in 2016 made hundreds of dedicated and loyal IT staff redundant and outsourced the work to India... many viewed the company's actions as just plain greedy."

My employer took the same course of outsourcing in 2005.  It was not successful because the companies doing the work were also working for other clients and did not give priority to one client.  I understand that much of the work has returned to be done by UK employees.

Whatever the cause of this particular incident, I'd suggest that such critical operations should be maintained by dedicated staff working for the company.  Thus a high degree of urgency could be assured.
Sic transit and all that..

Kim

  • Timelord
Re: The BA IT collapse
« Reply #1 on: May 27, 2017, 05:40:27 pm »
Cynically, it seems like a good excuse to ground all your planes in a hurry without causing a mass panic...
Careful, Kim. Your sarcasm's showing...

Basil

  • Um....err......oh bugger!
  • Help me!
Re: The BA IT collapse
« Reply #2 on: May 27, 2017, 05:48:15 pm »
Cynically, it seems like a good excuse to ground all your planes in a hurry without causing a mass panic...

Blimey.  I hadn't thought of that.
Quote from: Kim
And remember that friends who organise things on Facebook aren't proper friends anyway.

Re: The BA IT collapse
« Reply #3 on: May 27, 2017, 06:00:25 pm »
Cynically, it seems like a good excuse to ground all your planes in a hurry without causing a mass panic...

That's the scary version :o

If so, when will it be safe to explain?
Sic transit and all that..

ElyDave

  • Royal and Ancient Polar Bear Society member 263583
Re: The BA IT collapse
« Reply #4 on: May 27, 2017, 06:26:41 pm »
But for what reason?

If there was some kind of generalised threat, why only BA?
“Procrastination is the thief of time, collar him.” –Charles Dickens

Re: The BA IT collapse
« Reply #5 on: May 27, 2017, 06:40:34 pm »
Call to radio or tv station:

Hello, this is <codename>

There is a device on a British Airways plane.

Terrorists use code names and code words to get across their warnings.   I recall this from the days when the IRA was active.

Re: The BA IT collapse
« Reply #6 on: May 27, 2017, 06:47:35 pm »
Call to radio or tv station:

Hello, this is <codename>

There is a device on a British Airways plane.

If that was really the case then (a) passengers would not have been kept on board flights at the terminal, as has happened to many people this afternoon, and (b) inbound BA flights would not be landing at Heathrow but would be diverted, probably to Stansted.

There seems little doubt it is an IT issue, as reported.

Re: The BA IT collapse
« Reply #7 on: May 27, 2017, 06:55:38 pm »
Call to radio or tv station:

Hello, this is <codename>

There is a device on a British Airways plane.

Terrorists use code names and code words to get across their warnings.   I recall this from the days when the IRA was active.

And those days have gone as the aim these days is to kill and get maximum casualty rate, unlike the IRA who were keen to disrupt infrastructure and cause general damage to buildings.  Followers of IS do not use code words and the asymmetric warfare they pursue would not benefit from code words and warnings.

That said, a vicarious phone call stating a threat could cause the closure or withdrawal of facilities. Big decision on behalf of someone to ignore such a threat.

Re: The BA IT collapse
« Reply #8 on: May 27, 2017, 06:57:16 pm »
Cross-posted with DaveR, and I totally agree with his comments.

PaulF

  • "World's Scariest Barman"
  • It's only impossible if you stop to think about it
Re: The BA IT collapse
« Reply #9 on: May 27, 2017, 06:57:32 pm »
Call to radio or tv station:

Hello, this is <codename>

There is a device on a British Airways plane.

Terrorists use code names and code words to get across their warnings.   I recall this from the days when the IRA was active.

Think that was just the IRA, the current crop of terrorists don't seem to be doing the same. Don't think it's a good idea to generalise about terrorists. Or any other group for that matter.

Kim

  • Timelord
Re: The BA IT collapse
« Reply #10 on: May 27, 2017, 07:00:00 pm »
...Or it could just be an infrastructure clusterfuck.  It is a bank holiday weekend, after all.
Careful, Kim. Your sarcasm's showing...

Re: The BA IT collapse
« Reply #11 on: May 27, 2017, 07:02:34 pm »
You should lay off the weed, Kim.


Re: The BA IT collapse
« Reply #12 on: May 27, 2017, 07:14:45 pm »
Call to radio or tv station:

Hello, this is <codename>

There is a device on a British Airways plane.

Terrorists use code names and code words to get across their warnings.   I recall this from the days when the IRA was active.

Think that was just the IRA, the current crop of terrorists don't seem to be doing the same. Don't think it's a good idea to generalise about terrorists. Or any other group for that matter.

I'm not generalising, just offering a possible explanation about behaviour.   We don't know and we're all hypothesising.   Probably more likely imo that BA has been hacked to be honest.

Re: The BA IT collapse
« Reply #13 on: May 27, 2017, 08:02:35 pm »
Quote
"We believe the root cause was a power supply issue."

My employer had backup generators in the basement.  Power supply issues just made the lights flicker.

What is more, they had agreements with other major computer 'owners' that in the event of a crash there would be a switch-over to allow continuity. Resources were not isolated but pooled through people like IBM.

My last job was change management, i.e. to ensure that no changes were made unless they had been tested through three stages with a proper test plan approved by users.  Once our IT department 'bought into it' it worked very well.  When they decided to outsource I volunteered for redundancy, quite happily, as did many others.  I understand that once outsourced the system did not work well and the IT director was replaced.

The simple explanation for BA's plight is that a similar thing has happened - they failed to understand the downside of outsourcing.
Sic transit and all that..

Re: The BA IT collapse
« Reply #14 on: May 27, 2017, 09:40:17 pm »
Put simply, it mostly comes down to poor risk management.  Or good risk management, if the overall cost of this episode is less than it would cost to prevent, which I doubt. When you have an attitude to risk based around fallacious data and empirical assessment of isolated risks combined with the innate obduracy of inanimate objects, this sort of event is inevitable. The only question is which organisation it happens to; this time it was BA. Every major organisation that I've had dealings with for IT systems (and that's a metric fuckload) is using cost as one of the most significant factors in their decision-making process. It was not always this way.

TL;DR: Shit happens.
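
For anyone who wants the "cost of the episode versus cost to prevent" comparison spelled out, here is a minimal sketch. Every figure in it is made up for illustration (nothing here comes from BA or any real organisation), and the function names are hypothetical. It only does the simple expected-loss sum, which is exactly the sort of isolated-risk arithmetic that goes wrong when the input data is fallacious.

```python
# Hypothetical figures throughout: none of these numbers come from a real organisation.

def expected_annual_loss(probability_per_year: float, impact: float) -> float:
    """Expected yearly cost of an outage: chance of it happening times what it costs."""
    return probability_per_year * impact


def mitigation_worth_it(probability_per_year: float, impact: float,
                        mitigation_cost_per_year: float,
                        residual_probability: float) -> bool:
    """True if paying for the mitigation lowers the total expected yearly cost."""
    before = expected_annual_loss(probability_per_year, impact)
    after = expected_annual_loss(residual_probability, impact) + mitigation_cost_per_year
    return after < before


# Assumed example: a 1-in-20-years outage costing 100m, against 2m a year for a
# tested failover site that cuts the chance to 1-in-200.
print(mitigation_worth_it(0.05, 100_000_000, 2_000_000, 0.005))
```

The answer flips as soon as the assumed probability or impact is understated, which is the point about fallacious data.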

Re: The BA IT collapse
« Reply #15 on: May 27, 2017, 09:50:55 pm »
I was absolutely gobsmacked when I arrived at a large retailer's HQ to discover that they would be out of business in three days if their computers failed.  They had no backups, no disaster recovery plans, no failover sites, nothing.  When I left 3 1/2 years later, I left as a result of the IT director* rejecting the proposals to implement the basics.   

When I say left, I was made redundant. 

*  He is a Chelski fan.  I wonder if I still have his number ...

Thor

  • Super-sonnicus idioticus
Re: The BA IT collapse
« Reply #16 on: May 27, 2017, 10:49:23 pm »
The current explanation - that a power failure can knock out one or more critical, global systems for hours - defies credibility. That such systems have no redundancy, failover options, Disaster Recovery procedures, is not credible.

Something's up - and the cover story isn't very convincing.
It was a day like any other in Ireland, only it wasn't raining

Kim

  • Timelord
Re: The BA IT collapse
« Reply #17 on: May 27, 2017, 10:52:40 pm »
Maybe someone opted to save costs by installing the redundant systems in adjacent racks?
Careful, Kim. Your sarcasm's showing...

Vince

  • Can't climb; won't climb
Re: The BA IT collapse
« Reply #18 on: May 28, 2017, 01:50:24 am »
Odd that BA have a system that is specific to flights in and out of Heathrow and Gatwick.
216km from Marsh Gibbon

Re: The BA IT collapse
« Reply #19 on: May 28, 2017, 07:51:06 am »
The current explanation - that a power failure can knock out one or more critical, global systems for hours - defies credibility. That such systems have no effective redundancy, failover options, Disaster Recovery procedures, is not credible.


Afraid to tell you, you are wrong with that assumption. I've done a little FTFY to be a bit more specific.

My job is for %Megacorp, one of the main providers of services to global organisations, my role is intimately involved with understanding any issues and proposing the solution so it would be unethical and inappropriate for me to comment directly; especially given the speculation of where and how the failure occurred. But really, it's no surprise.

One little anecdote about another major org that put in their own dual site HA for some critical systems. Turns out that when one site goes out, it takes out the other. It's been that way for three years and going to be for another one, at least.

Very few organisations understand the difference between HA and DR, even at the highest levels. It's all down to money at the start and end of the day.
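
A rough sketch of why a dual site setup where one site takes out the other buys so little. The availability figures are assumptions picked for illustration, and the function names are hypothetical. The textbook sum for two sites in parallel only holds if their failures are independent; anything both sites share sits in series with the pair and caps the result.

```python
def parallel_availability(a: float, b: float) -> float:
    """Availability of two sites where either can carry the load,
    assuming their failures are independent of each other."""
    return 1 - (1 - a) * (1 - b)


def with_shared_dependency(pair_availability: float, shared: float) -> float:
    """If both sites rely on one shared thing (power feed, config push,
    replication link), that thing is in series with the whole pair."""
    return pair_availability * shared


site = 0.99                                    # assumed availability of each site
pair = parallel_availability(site, site)       # independent failures only
coupled = with_shared_dependency(pair, 0.995)  # assumed shared dependency

print(f"independent pair: {pair:.4f}")
print(f"with shared dependency: {coupled:.4f}")
```

However good the pair looks on paper, the coupled figure can never be better than the shared dependency itself.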

Re: The BA IT collapse
« Reply #20 on: May 28, 2017, 07:52:03 am »
The current explanation - that a power failure can knock out one or more critical, global systems for hours - defies credibility. That such systems have no redundancy, failover options, Disaster Recovery procedures, is not credible.

Something's up - and the cover story isn't very convincing.

Yup. I am currently designing the network for an airport and you have two data centres, with critical services having a backup server in the second data centre, and even within a data centre you have redundant switches in separate racks with everything dual homed to two switches. Each switch has two PSUs (at least) as well and you would normally feed them from separate supplies. The core and distribution switches for the network that goes out to terminals and connects the baggage handling, check-in, wireless etc etc would also be redundant. Only the access switches would be a single point of failure and losing one would only take down the stuff directly attached to that.

Mind you, I have seen an airport network that was badly designed, and an L2 issue like a broadcast storm could have taken the whole thing down.
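
As a toy illustration of the design described above, the sketch below models a network as a graph and lists the devices whose loss would cut a check-in desk off from a server. The topology and the device names are invented for the example, not taken from any real airport, and it only looks at physical connectivity, so it would not catch something like a broadcast storm.

```python
from collections import deque


def undirected(pairs):
    """Build an adjacency map from a list of (a, b) links."""
    links = {}
    for a, b in pairs:
        links.setdefault(a, set()).add(b)
        links.setdefault(b, set()).add(a)
    return links


def reachable(links, start, removed):
    """Breadth-first search from start, pretending the removed node is dead."""
    if start == removed:
        return set()
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in links.get(node, ()):
            if neighbour != removed and neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


def single_points_of_failure(links, source, target):
    """Nodes (other than the endpoints) whose loss cuts source off from target."""
    return sorted(
        node for node in links
        if node not in (source, target)
        and target not in reachable(links, source, node)
    )


# Hypothetical topology: dual distribution and core switches, single access switch.
topology = undirected([
    ("checkin-desk", "access-1"),
    ("access-1", "dist-a"), ("access-1", "dist-b"),
    ("dist-a", "core-a"), ("dist-a", "core-b"),
    ("dist-b", "core-a"), ("dist-b", "core-b"),
    ("core-a", "server"), ("core-b", "server"),
])

print(single_points_of_failure(topology, "checkin-desk", "server"))
```

With this made-up layout only the access switch shows up, which matches the description: redundant core and distribution, single-homed edge.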
I think you'll find it's a bit more complicated than that.

Re: The BA IT collapse
« Reply #21 on: May 28, 2017, 08:12:45 am »


Yup. I am currently designing the network .....

And there you have at least part of it.

A system is not just the sum of its components. You can have great components but, if they aren't designed, installed or maintained properly, you have a pile of useless. FTR, my role is the integrator of the various stuffs, like Network, DR, Server, Service, etc etc

Go on, another anecdote (dating back about 15 years). Another major corp with highly redundant highly secure systems lost complete contact with their, erm, contact centre. I happened to be there and got involved. Turned out that the dual redundant inbound circuit terminations when installed had not only been plugged in to the same dual socket, but a wall socket instead of a cabinet socket had been chosen. And someone had plugged in a kettle. That was all it needed.

Mr Larrington

  • A bit ov a lyv wyr by slof standirds
  • Custard Wallah
    • Mr Larrington's Automatic Diary
Re: The BA IT collapse
« Reply #22 on: May 28, 2017, 08:27:01 am »
The current explanation - that a power failure can knock out one or more critical, global systems for hours - defies credibility. That such systems have no redundancy, failover options, Disaster Recovery procedures, is not credible.

Something's up - and the cover story isn't very convincing.

I used to work for an arm of a BigCo in which we had a UPS to keep things going until the backup gennies kicked in.  One day a lightning strike took out half the town's electricity, ours included.  The UPS did its job admirably, the gennies started, all happy.  Then the gennies stopped and the UPS batteries went flat.  Someone had failed to ensure that the gennies had fuel tanks containing diesel rather than fumes...
External Transparent Wall Inspection Operative & Mayor of Mortagne-au-Perche
Satisfying the Bloodlust of the Masses in Peacetime

Re: The BA IT collapse
« Reply #23 on: May 28, 2017, 09:30:59 am »
Go on, another anecdote (dating back about 15 years).

If we're doing anecdotes, I was at a major financial when 9/11 happened. That was when they found out it was a bad idea to have their secondary systems in Manhattan as a backup to the primary, in Manhattan  :facepalm: Learning from this, they moved the secondary to New Jersey. A few years later Hurricane Sandy hit and took them both down.
Quote from: tiermat
that's not science, it's semantics.

Re: The BA IT collapse
« Reply #24 on: May 28, 2017, 10:27:34 am »
Was that the corp with split systems across the two towers, or a different one?