Popularity of the Web Considered Harmful

In a recent TED Talk, Tim Berners-Lee laid out his next vision for the world wide web... something he likes to call "Linked Data." In addition to putting fairly unstructured documents on the web, we should also put highly structured raw data on the web. This data would have relationships with other pieces of data on the web, and these relationships would be described by having data files "link" to each other with URLs.

This sounds similar to his previous vision, which he called the "semantic web," but the "linked data" web sounds a bit more practical. This change of focus is good, because as I covered before, a "true" semantic web is at best impractical, and at worst impossible. However, just as before, I really don't think he's thought this one through...

The talk is up on the TED conference page if you'd like to see it. As is typical of all his speeches, the first 5 minutes is him tooting his own horn...

  • Ever heard of the web? Yeah, I did that.
  • I I I.
  • Me me me.
  • The grass roots movement helped, but let's keep talking about me.
  • I also invented shoes.

I'll address his idea of Linked Data next week -- preview: I don't think it will work -- but I first need to get this off my chest. No one single person toiled in obscurity and "invented the web." I really wish he would stop making this claim, and stop fostering this "web worship" about how the entire internet should be the web... because it's actually hurting innovation.

Let's be clear: Tim took one guy's invention (hypertext) and combined it with another guy's invention (file sharing) by extending another guy's invention (network protocols). Most of the cool ideas in hypertext -- two-way links, and managing broken links -- were too hard to do over a network, so he just punted and said 404! In addition, the entire system would have languished in obscurity without yet another guy's invention (the web browser). There are many people more important than Tim who laid the groundwork for what we now call "the web," and he just makes himself look foolish and petty for not giving them credit. Tim's HTTP protocol was just a natural extension of other people's inventions that were truly innovative.
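To make "two-way links" a bit more concrete, here is a minimal sketch in Python. Everything in it (the LinkRegistry class, its method names, the single in-memory index) is a hypothetical illustration, not how Xanadu or any real hypertext system was built. The point is only that if every link is recorded in both directions, deleting a document tells you exactly whose links just broke, instead of leaving readers to discover a 404 later. Doing this on one machine is easy; keeping such an index consistent across a network of independent servers is the hard part.

```python
class LinkRegistry:
    """Toy two-way link index (hypothetical; assumes one shared, in-memory store)."""

    def __init__(self):
        self.forward = {}   # document -> set of documents it links to
        self.backward = {}  # document -> set of documents that link to it

    def add_link(self, source, target):
        # Record the link in both directions.
        self.forward.setdefault(source, set()).add(target)
        self.backward.setdefault(target, set()).add(source)

    def backlinks(self, target):
        # Who links to this document? One-way links cannot answer this.
        return sorted(self.backward.get(target, set()))

    def remove_document(self, doc):
        """Delete a document and return the sources whose links are now broken."""
        broken = self.backlinks(doc)
        # The deleted document no longer links to anything.
        for target in self.forward.pop(doc, set()):
            self.backward.get(target, set()).discard(doc)
        return broken


registry = LinkRegistry()
registry.add_link("a", "c")
registry.add_link("b", "c")
registry.add_link("c", "d")
print(registry.backlinks("c"))        # documents that point at "c"
print(registry.remove_document("c"))  # documents that need to be told about the deletion
```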

Now, Tim did invent the URL -- which is cool, but again, hardly innovative. Anybody who has seen an email address would be familiar with the utility of a "uniform resource identifier." And as I noted before, URLs are kind of backwards, so it's not like he totally nailed the problem.

As Ken says... anybody who claims to have "invented the web" is delusional. It would be exactly as if a guy 2000 years ago asked: "wouldn't it be great if we could get lots of water from the lake to the center of the town?" And then claimed to have invented the aqueduct.

As Alec says... the early 90s was an amazing time for software. There was so much computing power in the hands of so many people, all of whom understood the importance of making data transfer easier for the average person... Every data transfer protocol was slightly better than the last, and more kept coming every day. It was only a matter of time until some minor improvement on existing protocols was simple enough to create a critical mass of adoption. The web was one such system... along with email and instant messaging.

Case in point: any geek graduate of the University of Minnesota would know that the Gopher hyperlinking protocol pre-dated HTTP by several years. It was based on FTP, and the Gopher client had easily clickable links to other Gopher documents. It failed to gain popularity because it imposed a rigid file format and folder structure... plus Minnesota shot themselves in the foot by demanding royalty fees from other Universities just when HTTP became available. So HTTP exploded in popularity, while Gopher stagnated and never improved.

But the popularity of the web is a double-edged sword. Sure, it helps people collaborate and communicate, enabling faster innovation in business. But ironically, the popularity of the web is hurting new innovation on the internet itself. Too much attention is paid to it, and better protocols get little attention... and the process for modifying HTTP is so damn political, good luck making it better.

For example... most companies love to firewall everything they can, so people can't run interesting file sharing applications. It wasn't always like this... because data transfer was less common, network guys used to run all kinds of things that synced data and transferred files. But, as the web became much more popular, threats became more common, and network security was overwhelmed. They started blocking applications with firewalls, and emails with ZIP attachments, just to lessen their workload... But they couldn't possibly block the web! So they left it open.

This is a false sense of security, because people will figure out ways around it. It's standard hacker handbook stuff: just channel all your data through port 80, and limp along with the limitations. These are the folks who can tunnel NFS through DNS... they'll find their way through the web to get at your data.

What else could possibly explain the existence of WebDAV, CalDAV, RSS, SOAP, and REST? They sure as hell aren't the best way for two machines to communicate... not by a long shot. And they certainly open up new attack vectors... but people use them because port 80 is never blocked by the firewall, and they are making the best of the situation. As Bruce Schneier said, "SOAP is designed as a firewall friendly protocol, which is like a skull friendly bullet." If it weren't for the popularity of the web, maybe people would think harder about solving the secure server-to-server communication problem... but now we're stuck.

All this "web worship" is nothing more than the fallacy of assuming something is good just because it's popular. Yes, the web is good... but not because of the technology; it's good because of how people use it to share information... and frankly, if Tim never invented the web, no big loss; we'd probably be using something much better instead... but now we're stuck. We can call it Web 2.0 to make you feel better, but it's nowhere near the overhaul of web protocols that is so badly needed... It's a bundle of work-arounds that Microsoft and Netscape and open source developers bolted on to Web 1.0 to make it suck less... and now it too is reaching a critical mass. Lucky us: we'll be stuck with that as well.

What would this "better protocol" be like? Well... it would probably be able to transfer large files reliably. Imagine that! It would also be able to transfer lots and lots of little files without round-trip latency issues. It would also support streaming media. It would have built-in distributed identity management. It would also support some kind of messaging, so instead of "pulling" a site's RSS feed a million times per day, you'd get "pushed" an alert when something changes. Maybe it would have some "quality of service" options. Most importantly, it would allow bandwidth sharing for small sites with popular content, to improve the reach of large niche data.
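The "push" versus "pull" point is easy to see with a toy simulation. The sketch below is in Python and is purely illustrative: the Feed class, the tick loop, and the callback-style subscription are all assumptions of mine, not any real feed or messaging protocol. A polling client hits the server on every tick whether or not anything changed; a subscribed client only hears from the server when something is actually published.

```python
class Feed:
    """Toy in-memory feed (hypothetical; stands in for a site's RSS endpoint)."""

    def __init__(self):
        self.items = []
        self.subscribers = []
        self.requests_served = 0  # polls the server had to answer
        self.pushes_sent = 0      # notifications the server sent out

    def poll(self, since):
        # "Pull": the client asks for everything after index `since`.
        self.requests_served += 1
        return self.items[since:]

    def subscribe(self, callback):
        # "Push": the client registers once and waits.
        self.subscribers.append(callback)

    def publish(self, item):
        self.items.append(item)
        for callback in self.subscribers:
            self.pushes_sent += 1
            callback(item)


def simulate_pull(feed, ticks, publish_at):
    """Client polls once per tick; returns the items it received."""
    seen, received = 0, []
    for tick in range(ticks):
        if tick in publish_at:
            feed.publish(f"post-{tick}")
        new_items = feed.poll(seen)
        received.extend(new_items)
        seen += len(new_items)
    return received


def simulate_push(feed, ticks, publish_at):
    """Client subscribes once; returns the items it was sent."""
    received = []
    feed.subscribe(received.append)
    for tick in range(ticks):
        if tick in publish_at:
            feed.publish(f"post-{tick}")
    return received


pull_feed, push_feed = Feed(), Feed()
simulate_pull(pull_feed, ticks=100, publish_at={10, 50})
simulate_push(push_feed, ticks=100, publish_at={10, 50})
print(pull_feed.requests_served, push_feed.pushes_sent)
```

Both clients end up with the same two posts; the difference is how much traffic the server handles to deliver them. The sketch ignores the costs push brings with it, such as holding connections open or tracking subscribers, which a real protocol would have to pay for.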

All these technologies already exist in popular protocols... but they are not in "the web." All of these technologies are likewise critical for anything like Tim's "Linked Data" vision to be even remotely practical. All things being equal, the web is almost certainly the WORST way to achieve a giant system of linked data. Just because you can do it over the web, that doesn't mean you should. But again... we're stuck with the web... so we'll probably have to limp along, as always. Developers are accustomed to legacy systems... we'll make it work somehow.

Now that I've gotten that out of my system, I'll be able to do a more objective analysis of "Linked Data" next week.

Skull friendly bullet

That tickled me.

yeah...

Schneier is really good at the one-liners.

nice work

This is a great post. A buddy of mine I hadn't seen for about 5 years called me the other day to tell me he lost touch because corporate IT now blocks IM and so in a fit of rage he accidentally deleted his whole address book.

I laughed and said "yeah, we just use facebook"

As long as two or more computers are somehow linked together communication will happen between them.

re: nice work

Thanks!

I do the same with a lot of friends... Since everybody blocks IM, we're stuck with that wretched "chat" module on Facebook.

I blame a combination of "web worship" and the lack of a consistent and repeatable set of network security patterns in the enterprise.

Solid contact

Bex,

I don't always agree with you. Hell, I don't always understand what you're talking about. But I love reading your blog. Well done!

Re: Solid contact

That's the curse of being a serial contrarian... I feel a strong urge to argue against group-think. On the negative side, my arguments are sometimes unfair, and folks disagree with me a lot. On the plus side, sometimes I nail it ;-)

Your attack on Berners-Lee and linked data

Seems to me that your rant is a reaction somewhat typical of a proprietary company's FUD defense against a new open idea. Reminds me of Redmond or Armonk actually, but more particularly, of the ego-driven hot air blasts from Redmond Shores (and more particularly, of "Kamp Kyoto", Woodside CA).

Nowhere in your discourse did you actually examine or discuss the new Berners-Lee "linked-data" concept intelligently or even dispassionately. You start by outlining the concept, but do not instance or discuss any of the delivered benefits as outlined in Berners-Lee's TED talk. You continue with a character assassination of Berners-Lee that has nothing to do with the idea under discussion. You go on to slag Berners-Lee off for writing the original seminal paper on hypertext and hyperlinking, for building the original "web" on the CERN NeXT Cube and other computers, and for (gasp) the heresy of putting his and CERN's intellectual property into the public domain, gifting posterity with the web. How dare that "communist" Berners-Lee go against your wonderful Wall Street "greed is good" concept!!!! Shame on Mr Berners-Lee! The concept of linked data being in the public domain, using scientific data generated by public funds for the betterment of all, is therefore naturally completely foreign to your profit-driven ideology, and thus calls for FUD. Enter your rant...

How about a real critique with some objectivity next time? But of course, you will not publish this comment anyway, as any criticism is completely wasted.

Re: Your attack on Berners-Lee and linked data

How about a real critique with some objectivity next time? But of course, you will not publish this comment anyway, as any criticism is completely wasted.

As I said clearly in the article: "I'll address his idea of Linked Data next week -- preview: I don't think it will work."

There are several reasons why I don't think it will work. Many of which I covered in a previous snarky post that I linked to above... but I plan on filling in some more specifics. In general, the whole buzz about "Linked Data" is barely different from the "ego-driven hot air blasts" from the late 1990s and early 2000s about how XML and XHTML were going to "revolutionize the web."

Linked Data will have some useful impact... but I see it as little more than yet another way to make selected data easier to reuse. In the long run, I'd wager its impact will be less than Google Maps or Freebase. People don't want RDF or SPARQL; they want easy JavaScript APIs... and embedding "semantics" on the web is little more than an invitation to spammers.

You go on to slag Berners-Lee off for writing the original seminal paper on hypertext and hyperlinking

Berners-Lee invented neither hypertext nor the hyperlink. Ted Nelson and Douglas Engelbart did in the 1960s. Want to see the 1968 demo? Tim did contribute a lot to the web, but he also got a lot of help from the ideas and innovations of others... and if he hadn't put all the pieces together, somebody else probably would have.

Maybe Apple.

BTW, the CERN web was the 7th web, not the 1st

If you want to get technical, when Tim built the first HTTP-based web server in 1990, it was actually the seventh implementation of a hypertext / hyperlinked network. In chronological order, the below were all systems of hypertext or hyperlinking that came before:

I'm not counting ENQUIRE or HyperCard above... since their emphasis was not on networks.

This isn't about proprietary systems, nor is it about character assassination. It's about the truth. The truth is that "the web" was envisioned and written way before Tim came along... but it was the cost of hardware that limited its success until the 1990s. Anybody who claims to be the one single person who "invented the web" is either delusional, or a jerk. If Tim's delusional, then odds are low that his ideas should be listened to... If he's a jerk, then I sure as hell won't feel bad about hurting his feelings with a history lesson...

Telesophy was the 7th hypermedia system before WWW

Actually, you should also include the Telesophy system, built in 1985 at Bellcore. This was a networked hypermedia system that could search across multiple sources distributed across the network as well as display a full range of multimedia documents. It also did several hard parts left out of the later WWW, especially handling links properly with database ids so that the linked-to objects could move around the network as they evolved. The type-dependent multimedia display was copied by Mosaic, which was the browser that catalyzed the Web and popularized WWW by using it as the underlying engine. The distributed search uniformly across sources was copied by Google, whose founders saw it in the Illinois Digital Library project. (Personal note: I was the leader of Telesophy and of the Illinois Digital Library project, and also the scientific advisor to Mosaic; see retrospective papers at www.canis.uiuc.edu.) The main contribution of WWW was its flexible protocol connection, which could do FTP and Gopher and several others, which is why it was used as the Mosaic engine. Most objective histories assign the first idea of hypermedia to Xanadu (Ted Nelson), the first implementation to Intermedia (Andries van Dam), the first networked hypermedia to Telesophy (Bruce Schatz), and the first mass networked hypermedia to Mosaic (Marc Andreessen and Eric Bina). Tim Berners-Lee does not appear on these lists of firsts; his real contribution was organizational, in founding the organizations for standardizing protocols that eventually led to W3C. And yes, the timing predictions were well known; see the 1984 Telesophy document for example. The mass distribution occurred when the personal computer price for an "engineering workstation" dropped into the $5000 range. This happened around 1994, years before most people predicted, and it was when Mosaic became the first widespread hypermedia. Hope this helps clarify!
