Thursday, August 9, 2007

Weekly Weigh-In #6 - The "Self-Healing" Internet


Any business that relies heavily on web apps and services needs a safety net. This was made abundantly clear this week when we saw cisco.com, of all sites, go offline for a little under three hours during U.S. business hours.

Cisco’s PR team posted the following message:

"We have traced the cause of the issue to an accident during maintenance of a San Jose data center that resulted in a power outage in that facility."

Now, irrespective of how much money they, their partners and their clients lost while the site was down (which, after a little research, looks like it could be quite a lot), this raises the question: what happened to redundancy, traffic engineering, distributed networking, the self-healing network, etc.?

Cisco, as the giant networking company, must surely have failovers in place to prevent this kind of scenario, and they of all people should know that a well-designed network infrastructure does not put the failover data center in the same locale, state or even country as the primary.

Regardless of the reason(s) for the outage, we have to seriously start considering how to make the Internet openly self-healing. My rudimentary thoughts on this include a possible scenario where web servers from across the world serve as "peer-to-peer cache-servers" for other websites. When a user requests a page from pingsta.com, for example, and that particular page is temporarily unavailable for whatever reason, the non-pingsta web server geographically closest to the user can seamlessly present the most recently cached copy of that page.
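To make the idea a little more concrete, here is a rough Python sketch of the fallback step. It is only an illustration under my own assumptions: the names `PeerServer`, `CachedPage`, `fetch_origin` and `fetch_with_fallback` are hypothetical, peers are plain in-memory objects rather than real web servers, and "closest" is taken to mean smallest great-circle distance to the user.

```python
import math
from dataclasses import dataclass, field


@dataclass
class CachedPage:
    body: str
    cached_at: float  # Unix timestamp of when the peer stored this copy


@dataclass
class PeerServer:
    # Hypothetical stand-in for a third-party web server acting as a cache.
    name: str
    lat: float
    lon: float
    cache: dict = field(default_factory=dict)  # url -> CachedPage


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def fetch_with_fallback(url, user_lat, user_lon, fetch_origin, peers):
    """Try the origin site first; if it is unreachable, serve the copy held
    by the nearest peer that has the page cached.

    Returns (body, source), where source is "origin" or the peer's name.
    """
    try:
        return fetch_origin(url), "origin"
    except ConnectionError:
        holders = [p for p in peers if url in p.cache]
        if not holders:
            raise  # nobody has a copy, so the outage is still visible
        nearest = min(
            holders,
            key=lambda p: distance_km(user_lat, user_lon, p.lat, p.lon),
        )
        return nearest.cache[url].body, nearest.name
```

The `cached_at` timestamp is kept so that a peer could tell the user how old the copy is. How peers would keep their copies fresh, and how a user would know the copy is genuine, are open questions this sketch does not attempt to answer.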

Thoughts?

Owen