The modern-day Internet as we know it is an evolution of ARPANET, created in
1969 and designed to be a robust communications network which would withstand
a nuclear attack in times of war.
ARPANET was meant to be resilient to that sort of event, capable of surviving
catastrophic world events and of automatically, and near instantaneously,
re-working itself around whatever problems cropped up.
The current topologies and design of the Internet are based upon those which
made ARPANET what it was, but much of the underlying infrastructure and its
higher-level data carrying capabilities have been considerably improved upon.
The resilience of ARPANET should still be present in what we have today.
It seems, though, that the reality is somewhat different from what the design
goals of the Internet should have delivered. Today, it seems that even the
slightest hiccup causes a good chunk of the UK's Internet access to go down,
and totally, for sustained periods.
Reality will always, of course, often be entirely different to theory ( like
the rocket cars we should be driving, all that leisure time we wouldn't know
what to do with, and nuclear generated electricity that would be so cheap we
wouldn't need to pay for it ), but why is a system which was designed to be so
robust and reliable actually so fragile ?
The tale of a home user with two modems dialled into his ISP, who found his PC
grinding to a halt because a whole country's Internet access was being routed
through it when an ISP's service failed, may be little more than urban myth,
but it illustrates exactly what automatic recovery and automatic routing were
meant to do to keep the Internet ticking over.
No single point of failure should have a significant impact on Internet users:
redundancies in the systems provide alternative routes of connection, while
overloaded systems can pass their excess on to other systems which can cope.
All of this goes on behind the scenes, without the end-user noticing anything
more than a momentary glitch, if that.
That theory was shown to be little more than just a theory, though, when
millions of Internet users in the UK effectively lost the use of the Internet
entirely for a considerable amount of time.
NTL was the Service Provider most severely hit by the problems, losing access
to most Internet services for over 8 hours, with consequential problems lasting
well into the aftermath, but other ISPs such as Freeserve, Nildram, Pipex and
Telewest are also reported to have had their services affected. BT has also
said that some of its voice services were hit by the problem.
The root cause of the problem is reported to be some damage to an
inter-continental cable between the UK and France which had occurred just off
the French coast, but the knock-on effects caused NTL's DNS services to
crash, preventing many people from accessing any services at all.
It's all well and good Service Providers explaining why their systems fell over
big-time, but the question which must be asked is why these systems are allowed
to fall over. What has happened to the redundancy built into the system ? Why
are single point failures so catastrophic, and why do they have such significant
knock-on effects ?
No one would expect a Service Provider to provide backup for the entirety of the
Internet structure; if the magic 'cable routing box' for a street is blown up
by terrorists or dismantled by vandals, customers expect their access to be
steeply curtailed until it is fixed, or they use their own redundancy
capabilities to keep their own access going: dialling out on an alternative
telephone line, going through a mobile phone, or even running a wire to the
nearest house which still has Internet access.
At the centre of the Internet things are entirely different. The
inter-continental links are critical to permitting global access between
countries and single point failures there are obviously going to have a
significant impact on the Internet as a whole. This is where redundancy and
alternative routes come into play, and it is something we look to ISPs to
deal with, as individual home users have no control over the issue.
If the main link from the UK to France goes down, the secondary link should
kick in, and, apart from a minimal loss of connection while it does, and a
slowing down of access as everyone gets funnelled through a less than ideal
link, everything should continue as normal. Even if that secondary link fails
then we can still access France ( and anywhere else ) by routing through any
of the links which are unaffected, even if it means that an email to the PC
on the desk next to you goes round the world and back in its travels. This
is the fundamental resilience the Internet should offer.
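
As a rough illustration of the re-routing described above, the sketch below
models a handful of links and finds a path around any that have failed. The
country names, the LINKS set and the find_route function are all invented for
the example; real Internet routing is handled by protocols such as BGP, not by
a simple breadth-first search like this one.

```python
from collections import deque

# Hypothetical topology, for illustration only: each pair is a two-way link.
LINKS = {
    ("UK", "France"),           # the "main" link
    ("UK", "Netherlands"),      # a secondary link
    ("Netherlands", "France"),
    ("UK", "USA"),              # the long way round
    ("USA", "France"),
}


def find_route(links, src, dst):
    """Return the path with the fewest hops from src to dst, or None."""
    # Build a neighbour table from the list of working links.
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    # Breadth-first search, remembering how each node was reached.
    previous = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in sorted(neighbours.get(node, ())):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None  # no working route at all


if __name__ == "__main__":
    print(find_route(LINKS, "UK", "France"))
    # Simulate the main link failing and look for a route again.
    print(find_route(LINKS - {("UK", "France")}, "UK", "France"))
```

With the main link removed from LINKS, find_route still returns a path via
another node; it only returns None once every link out of the UK has gone.
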
At worst, a loss of inter-continental access from the UK should do nothing
more than isolate the UK from everyone else. The UK's Internet should keep
running as normal. We'll still be able to send emails within the UK and
still browse web sites hosted within the UK; that's another fundamental
resilience designed in.
And while email, newsgroup, web browsing and game playing services may all
fail individually at times, there is no reason why any failure should cause
problems for other services. That is a third resilience we have.
Key services upon which the Internet relies, which are mostly hidden from the
users, are also duplicated and designed to ignore overloads, or pass what
they can't deal with onto systems which have the capacity to cope. In short,
if it can go wrong, there's something in place to keep everything working as
a whole; slightly slower perhaps, but still functional. It all makes for
an apparently unfailing system.
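
The idea of duplicated key services can be sketched in a few lines. This is
only an illustration of the principle, not how any ISP's DNS is actually built:
the resolver functions, the ServiceUnavailable exception and the lookup table
are all hypothetical, and nothing here touches a real network.

```python
class ServiceUnavailable(Exception):
    """Raised by a (simulated) server which cannot answer."""


def resolve_with_failover(name, resolvers):
    """Ask each resolver in turn and return the first answer received."""
    failures = []
    for resolver in resolvers:
        try:
            return resolver(name)
        except ServiceUnavailable as exc:
            # Note the failure and move on to the next duplicate.
            failures.append(str(exc))
    raise ServiceUnavailable("all resolvers failed: " + "; ".join(failures))


def crashed_resolver(name):
    # Stands in for a name server which has fallen over.
    raise ServiceUnavailable("primary resolver down")


def backup_resolver(name):
    # Stands in for a working duplicate; the address is a documentation
    # example, not a real host.
    table = {"example.co.uk": "192.0.2.10"}
    return table[name]


if __name__ == "__main__":
    print(resolve_with_failover("example.co.uk",
                                [crashed_resolver, backup_resolver]))
```

As long as one duplicate still works the caller gets an answer; the lookup
only fails outright when every resolver in the list has failed.
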
Yet when a problem occurred which amounts to little more than someone in France
having accidentally pulled the UK-France connection plug out of its wall socket,
this rather insignificant event, given the design of the Internet, had a
catastrophic and snow-balling effect right through the system, leaving millions
in the UK without access to any Internet services at all.
Imagine this having happened to a Military Internet during times of crisis; it
would be the most catastrophic event imaginable, and so simply done - the
eyes, ears and voice of the UK closed and silenced at the mere flick of a switch
from beyond its shores.
Would the military rely upon such a system ? Perhaps they do, and they just
aren't admitting it, but if the military can design their systems to overcome
such fateful scenarios, then why can't commercial Service Providers ?
Is it really the simple case that it is the commercial drive to make profit over
all else which has led them to undermine the resilience we should have, or are
there really fundamental flaws in the design of the Internet which make the
imagined resilience only a dream when it's put into practice ?
That France's Internet services, along with the rest of the world's, kept going
when the connection to the UK was "pulled" suggests that the problems are
entirely within the UK's domain. That some Service Providers managed to maintain
services for their customers while others were unable to deliver any real
service at all suggests that the failings lie only with a few of the
companies involved.
It is reported that the automatic re-routing of traffic did occur, but because
of the problems caused elsewhere, those using affected Service Providers had
little traffic to be so routed. No matter what parts of the Internet did remain
working, and functioning as predicted, it is but a moot point for those who
found they could do nothing at all. That services have been restored before the
cable has been repaired suggests it is a failing which is separate to the issue
of that single point fault.
Whatever the case, the Service Providers have a duty to explain why a minor
problem can have such a massive effect on so many. We know "why" the problem
occurred - "It's France's fault", allegedly - but the question "why" requires
much deeper answers than that.
It is not just a question of providing an answer to fee-paying customers and
businesses who have come to rely upon Internet access. Answers are also owed
to those who are wondering whether the Service Providers are undermining the
very soul of the Internet, and why this is being done, and to those trying to
formulate solutions to make the Internet, once more, what it was meant to be.
We need
to know if we can trust commercial Service Providers to deliver what we are
paying for, and even what they are meant to provide, and we need to decide if we
should take this highly important business out of their hands and put it in
the hands of the Government to look after. Has self-regulation failed ? If it
has, we need to decide what to do in response.
The only alternative is to redefine "The UK Internet" : A loosely coupled network
of computer systems connected through points of access which are prone to
failure, with ineffective redundancy, a lack of resilience, which continues to
work on a wing and a prayer, with the risk of catastrophic failure should the
smallest of problems occur. A theoretically superb, robust and resilient design,
undermined by implementation and commercial concerns.
Hardly what was dreamt of in the 1960s.