[fc-discuss] Financial Cryptography Update: Learning from Failure

iang@iang.org iang@iang.org
Mon, 4 Jul 2005 21:26:44 +0100 (BST)


((((((((( Financial Cryptography Update: Learning from Failure )))))))))

                             July 04, 2005


------------------------------------------------------------------------

https://www.financialcryptography.com/mt/archives/000496.html



------------------------------------------------------------------------

When we build complex systems, we build systems that are too complex
for any one person to understand.  One person may understand a single
module completely, or hold an overview of all the components, but not
understand how the whole thing works together.

An inevitable result of this is that complex systems fail in strange
ways.  And it is just as perversely inevitable that we only really
advance our understanding of complex systems by examining their
failures.  It's our chance to learn how to be more complex.

Here are some notes I've picked up in the last couple of months.

1. Wired reports on how the people stuck in the twin towers of the WTC
ignored the standard safety rules and also ignored what they were told:
they used the elevators and stairs and scarpered.

http://www.wired.com/wired/archive/13.06/start.html?pg=3?tw=wn_tophead_6

The key lesson here is that the people on the scene often have more
information than the experts, in both qualitative and quantitative
terms.  Wired takes this and says "Disobey Authorities" but that's
going too far (http://thurston.halfcat.org/blog/?p=180 Cubicle
comments).  What is more pertinent is that when the information on the
ground is clearly better, encourage your people to see that and work
with it.  Let them drop the rules if they see the need, knowing that if
they get that judgement call wrong they will face the music later on.

2. Over in the land of military madness called the Pentagon, they have
just that problem.  The solution - train the corporal to fight the
insurgent on his own terms - seems to be an old one, a lesson the US
Army was considered to have learnt by the end of the Vietnam war.  At
least, that was the wisdom I recall from countless military books and
articles.  Read this article for why it has been forgotten.

http://www.chicagotribune.com/technology/chi-0506060166jun06,1,1070200.story?coll=chi-techtopheds-hed
http://www.informationclearinghouse.info/article9070.htm

I'm not sure what the lesson is here, and indeed, the late John Boyd
had a name for the syndrome.  He called it "stuck in one's own OODA
loop" and said there was no solution.

3. In another episode of safety engineering (seen on TV), the design
and maintenance of the cockpit window in a jetliner came under
question.  At cruising altitude, it popped out, sucking the captain out
and trapping him half in and half out.  Rather uncomfortable at 10,000
metres.

Why this happened came down to a series of 13 identified failures; had
any one of them not happened, the accident would have been prevented.
Mostly, the TV program focused on the hapless maintenance engineer who
openly and honestly described how he had followed 'local' procedures
and ended up unwittingly installing the wrong bolts.

There are three lessons in this story.

Firstly, the easiest lesson is to make your designs fail safely.  These
days aircraft windows are designed to be fitted from the inside so they
can't pop out under cabin pressure.  That's a fail-safe design.

Secondly, and more subtly, design your safety features to fail
obviously!  13 different failures - yet they all kept going until the
last one failed?  Why wasn't one of these failures noticed earlier?
(This idea is sketched in code after the third lesson below.)

Finally, the most subtle lesson here is that local conditions change -
you can write whatever you like in the rule book, and you can set up
whatever you like in the procedures, but if they are things that can be
changed, ignored, bypassed, or twisted, then they will be.  People
optimise, and one of the things they love to optimise away is the rule
book.
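
By way of illustration of that second lesson - a sketch of my own, not
anything from the programme, and the checks in it are purely
hypothetical - make every safety layer that trips raise an alarm at
once, rather than letting the layers behind it quietly absorb the
failure:

  import logging

  logging.basicConfig(level=logging.WARNING)
  log = logging.getLogger("safeguards")

  def guarded(action, layers):
      """Run `action` only if every safety layer passes; report every
      layer that trips, loudly, so one silent failure can't sit
      unnoticed until the last layer gives way."""
      tripped = [name for name, check in layers if not check()]
      for name in tripped:
          log.error("safety layer tripped: %s", name)
      if tripped:
          raise RuntimeError("aborting: %d layer(s) failed" % len(tripped))
      return action()

  # Hypothetical usage: two independent checks guarding a payment release.
  layers = [
      ("signature valid", lambda: True),
      ("daily limit not exceeded", lambda: True),
  ]
  print(guarded(lambda: "payment released", layers))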

4.  In TV documentaries and films, we've all no doubt seen the story of
the O-ring engineer who was brow-beaten into silence before the shuttle
went up.  The safety system was overridden from on high, because of
commercial interests.  We saw this same pressure a few weekends back in
the farcical United States Grand Prix (Formula 1) race that dropped 14
cars because their tyres were declared unsafe.  All the bemoaning of
damage to the sport and lack of compromise misses the key point - the
safety checks are there to stop a wider Challenger-style disaster.

So money matters, and it often overrides simple and obvious safety
issues, because when it doesn't, all monetary hell breaks loose.  We
see this all the time in security and in financial cryptography, where
basic flaws are left in place because it costs too much to fix them,
and nobody can show the ROI.

The lesson then is to calculate the damage and make sure you aren't
endangering the system with these flaws.  When I design FC systems I
try and calculate how much would be at risk if a worst-possible but
unlikely crypto break happens, such as a server key compromise.  If I
can keep that cost under 1% of the entire system, by way of
hypothetical example, I declare that "ok".  If I can't, I start
thinking of an additional defence-in-depth that will trigger and save
the day.

It's by no means perfect, and some would criticise that as a
declaration of defeat.  But commercial pressures often rule the day,
and a system that isn't fielded is one that delivers no security.

Risk analysis is the only way.  But it's also expensive to do, far too
expensive for most purposes, so we simplify this with metrics like
"total system failure thresholds."  For an FC system, 1% of the float
could be a trigger for that, as most businesses can absorb it.  Or, if
you can't absorb that, then maybe you have other problems.
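
To make that threshold concrete, here is a back-of-the-envelope sketch
of the test described above.  It's my own illustration rather than
anything from a fielded system; every name and number in it is
hypothetical.

  def worst_case_exposure(loss_per_event, events_before_detection):
      """Estimate the total loss if an unlikely-but-possible break
      (say, a server key compromise) goes undetected for a while."""
      return loss_per_event * events_before_detection

  def acceptable(exposure, system_float, threshold=0.01):
      """Declare the design 'ok' if the exposure stays under the chosen
      fraction of the float (the 1% of the hypothetical example)."""
      return exposure <= threshold * system_float

  # Example: a compromised server key could forge 50 payments of 200
  # units each before being noticed, against a float of 2,000,000 units.
  exposure = worst_case_exposure(200, 50)    # 10,000 units at risk
  print(acceptable(exposure, 2000000))       # True -- under the 1% line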

5.  One of the big lessons of failure is redundancy.  All things fail,
so all things need alternates.  I can't say it better than this closing
exchange from a discussion of the engineering failure of the WTC:

 Professor Eagar: I think the terrorist danger will be other things. A
terrorist is not going to attack the things you expect him to attack.
The real problem is pipelines, electrical transmission, dams, nuclear
plants, railroads. A terrorist's job is to scare people. He or she
doesn't have to harm very many people. Anthrax is a perfect example. If
someone could wipe out one electrical transmission line and cause a
brownout in all of New York City or Los Angeles, there would be
hysteria, if people realized it was a terrorist that did it.

 Fortunately, we have enough redundancy -- the same type of redundancy
we talk about structurally in the World Trade Center -- in our
electrical distribution. We have that redundancy built in. I shouldn't
say this, but this was how Enron was able to build up a business,
because they could transfer their energy from wherever they were
producing it into California, which was having problems, and make a
fortune -- for a short period of time. 

 NOVA: Gas pipelines don't have redundancy built in, though. 

 Eagar: No, but one advantage of a gas pipeline is the damage you can
do to it is relatively limited. You might be able to destroy several
hundred yards of it, but that's not wiping out a whole city. The bigger
problem with taking out a gas pipeline is if you do it in the middle of
winter, and that gas pipeline is heating 20 percent of the homes in the
Northeast. Then all of a sudden you have 20 percent less fuel, and
everybody's going to have to turn the thermostat down, and you're going
to terrorize 30 million people. 

 The lesson we have to learn about this kind of terrorism is we have to
design flexible and redundant systems, so that we're not completely
dependent on any one thing, whether it's a single gas pipeline bringing
heat to a particular area or whatever. 

 Remember the energy crisis in 1973? That terrorized people. People
were sitting in long lines at gas pumps. It takes five or 10 years for
society to readjust to a problem like that. What happened in the energy
crisis in 1973 was we had essentially all our eggs in one basket -- the
oil basket. But by 1983, electric generating plants could flip a switch
and change from oil to coal or gas, so no one could hold a gun to our
head like they did before. 

(Snippet taken from some site that tries and fails to make a conspiracy
case. 
http://911research.wtc7.net/disinfo/experts/articles/eagar_nova/nova_eagar1.html
http://911research.wtc7.net/disinfo/experts/articles/eagar_nova/nova_eagar2.html
)

Good stuff.  Now try and design a system of money issuance that doesn't
have a single point of failure - that's good FC engineering.
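
One hypothetical way to go at that - a sketch of my own, not a
description of any particular system - is to require a quorum of
independent issuers before any new units are created, so that no single
key, machine or person is a single point of failure:

  def quorum_issue(approvals, recognised_issuers, k):
      """Approve an issuance only if at least k distinct, recognised
      issuers have signed off on it."""
      valid = set(approvals) & set(recognised_issuers)
      return len(valid) >= k

  # Example: five recognised issuer keys, any three of which must approve.
  issuers = {"key_a", "key_b", "key_c", "key_d", "key_e"}
  print(quorum_issue({"key_a", "key_c", "key_e"}, issuers, k=3))  # True
  print(quorum_issue({"key_a"}, issuers, k=3))                    # False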

-- 
Powered by Movable Type
Version 2.64
http://www.movabletype.org/