In a recent TED Talk, Tim Berners-Lee laid out his next vision for the World Wide Web... something he likes to call "Linked Data." Instead of putting only fairly unstructured documents on the web, we should also put highly structured raw data on the web. This data would have relationships with other pieces of data on the web, and these relationships would be described by having data files "link" to each other with URLs.
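To make "data files that link to each other with URLs" a bit more concrete, here is a rough sketch of the idea. Everything in it is an assumption for illustration: the `example.org` URLs, the field names, the in-memory `WEB` dictionary standing in for the network, and the `follow` helper. It is not any actual Linked Data format.

```python
# Hypothetical in-memory "web": URL -> structured record.
# Every URL and field name here is invented for illustration.
WEB = {
    "http://example.org/people/alice": {
        "name": "Alice",
        "works_for": "http://example.org/orgs/acme",
    },
    "http://example.org/orgs/acme": {
        "name": "Acme Corp",
        "located_in": "http://example.org/places/minneapolis",
    },
    "http://example.org/places/minneapolis": {
        "name": "Minneapolis",
    },
}


def is_link(value):
    """Treat any string that looks like a URL as a link to another record."""
    return isinstance(value, str) and value.startswith("http://")


def follow(url, path):
    """Start at `url` and follow each named field in `path` as a link."""
    record = WEB[url]
    for field in path:
        target = record[field]
        if not is_link(target):
            raise ValueError(f"{field!r} is not a link")
        record = WEB[target]  # on the real web this would be a network fetch
    return record


city = follow("http://example.org/people/alice", ["works_for", "located_in"])
print(city["name"])
```

Note that every hop in `follow` would be a separate round trip on the real web, and any one of those URLs could be dead by the time you get there.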
This sounds similar to his previous vision, which he called the "semantic web," but the "linked data" web sounds a bit more practical. This change of focus is good, because as I covered before, a "true" semantic web is at best impractical, and at worst impossible. However, just as before, I really don't think he's thought this one through...
The talk is up on the TED conference page if you'd like to see it. As is typical of all his speeches, the first 5 minutes is him tooting his own horn...
- Ever heard of the web? Yeah, I did that.
- I I I.
- Me me me.
- The grass roots movement helped, but let's keep talking about me.
- I also invented shoes.
I'll address his idea of Linked Data next week -- preview: I don't think it will work -- but I first need to get this off my chest. No single person toiled in obscurity and "invented the web." I really wish he would stop making this claim, and stop fostering this "web worship" about how the entire internet should be the web... because it's actually hurting innovation.
Let's be clear: Tim took one guy's invention (hypertext) and combined it with another guy's invention (file sharing) by extending another guy's invention (network protocols). Most of the cool ideas in hypertext -- two-way links, and managing broken links -- were too hard to do over a network, so he just punted and said 404! In addition, the entire system would have languished in obscurity without yet another guy's invention (the web browser). There are many people more important than Tim who laid the groundwork for what we now call "the web," and he just makes himself look foolish and petty for not giving them credit. Tim's HTTP protocol was just a natural extension of other people's inventions that were truly innovative.
Now, Tim did invent the URL -- which is cool, but again, hardly innovative. Anybody who has seen an email address would be familiar with the utility of a "uniform resource identifier." And as I noted before, URLs are kind of backwards, so it's not like he totally nailed the problem.
As Ken says... anybody who claims to have "invented the web" is delusional. It would be exactly as if a guy 2000 years ago asked: "wouldn't it be great if we could get lots of water from the lake to the center of the town?" And then claimed to have invented the aqueduct.
As Alec says... the early 90s was an amazing time for software. There was so much computing power in the hands of so many people, all of whom understood the importance of making data transfer easier for the average person... Every data transfer protocol was slightly better than the last, and more kept coming every day. It was only a matter of time until some minor improvement on existing protocols was simple enough to create a critical mass of adoption. The web was one such system... along with email and instant messaging.
Case in point: any geek graduate of the University of Minnesota would know that the Gopher hyperlinking protocol pre-dated HTTP by several years. It was based on FTP, and the Gopher client had easily clickable links to other Gopher documents. It failed to gain popularity because it imposed a rigid file format and folder structure... plus Minnesota shot themselves in the foot by demanding royalty fees from other universities just when HTTP became available. So HTTP exploded in popularity, while Gopher stagnated and never improved.
But the popularity of the web is a double-edged sword. Sure, it helps people collaborate and communicate, enabling faster innovation in business. But ironically, the popularity of the web is hurting new innovation on the internet itself. Too much attention is paid to it, and better protocols get little attention... and the process for modifying HTTP is so damn political, good luck making it better.
For example... most companies love to firewall everything they can, so people can't run interesting file sharing applications. It wasn't always like this... because data transfer was less common, network guys used to run all kinds of things that synced data and transferred files. But as the web became much more popular, threats became more common, and network security was overwhelmed. They started blocking applications with firewalls, and emails with ZIP attachments, just to lessen their workload... But they couldn't possibly block the web! So they left it open.
This is a false sense of security, because people will figure out ways around it. It's standard hacker handbook stuff: just channel all your data through port 80, and limp along with the limitations. These are the folks who can tunnel NFS through DNS... they'll find their way through the web to get at your data.
What else could possibly explain the existence of WebDAV, CalDAV, RSS, SOAP, and REST? They sure as hell aren't the best way for two machines to communicate... not by a long shot. And they certainly open up new attack vectors... but people use them because port 80 is never blocked by the firewall, and they are making the best of the situation. As Bruce Schneier said, "SOAP is designed as a firewall friendly protocol, which is like a skull friendly bullet." If it weren't for the popularity of the web, maybe people would think harder about solving the secure server-to-server communication problem... but now we're stuck.
All this "web worship" is nothing more than the fallacy of assuming something is good just because it's popular. Yes, the web is good... but not because of the technology; it's good because of how people use it to share information... and frankly, if Tim never invented the web, no big loss; we'd probably be using something much better instead... but now we're stuck. We can call it Web 2.0 to make you feel better, but it's nowhere near the overhaul of web protocols that is so badly needed... It's a bundle of work-arounds that Microsoft and Netscape and open source developers bolted onto Web 1.0 to make it suck less... and now it too is reaching a critical mass. Lucky us: we'll be stuck with that as well.
What would this "better protocol" be like? Well... it would probably be able to transfer large files reliably. Imagine that! It would also be able to transfer lots and lots of little files without round-trip latency issues. It would also support streaming media. It would have built-in distributed identity management. It would also support some kind of messaging, so instead of "pulling" a site's RSS feed a million times per day, you'd get "pushed" an alert when something changes. Maybe it would have some "quality of service" options. Most importantly, it would allow bandwidth sharing for small sites with popular content, to improve the reach of large niche data.
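The "push" versus "pull" point is easy to show with a toy simulation. This is only a sketch under made-up assumptions: the `Site` class, the `simulate` function, and the numbers (1000 polling intervals, 3 new posts) are all invented for illustration, and nothing here touches a real network or a real RSS feed.

```python
class Site:
    """A toy site that supports both pull (fetch) and push (subscribe)."""

    def __init__(self):
        self.items = []
        self.requests_served = 0  # how many pull requests we had to answer
        self.subscribers = []

    def fetch(self):
        # Pull: the client asks, whether or not anything has changed.
        self.requests_served += 1
        return list(self.items)

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, item):
        # Push: notify each subscriber exactly once per change.
        self.items.append(item)
        for callback in self.subscribers:
            callback(item)


def simulate(ticks, publish_at):
    """Run one polling client and one push subscriber side by side."""
    site = Site()
    pushed = []
    site.subscribe(pushed.append)

    seen = 0
    wasted_polls = 0
    for tick in range(ticks):
        if tick in publish_at:
            site.publish(f"post-{tick}")
        items = site.fetch()  # the polling client checks on every tick
        if len(items) == seen:
            wasted_polls += 1  # nothing new: the request was pointless
        seen = len(items)
    return site.requests_served, wasted_polls, len(pushed)


polls, wasted, alerts = simulate(ticks=1000, publish_at={10, 500, 900})
print(polls, wasted, alerts)  # prints: 1000 997 3
```

One polling client burns 997 pointless requests to learn about 3 posts, while the subscriber gets exactly 3 alerts. Multiply that by every feed reader on the planet and you get the idea.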
All these technologies already exist in popular protocols... but they are not in "the web." All of these technologies are likewise critical for anything like Tim's "Linked Data" vision to be even remotely practical. All things being equal, the web is almost certainly the WORST way to achieve a giant system of linked data. Just because you can do it over the web, that doesn't mean you should. But again... we're stuck with the web... so we'll probably have to limp along, as always. Developers are accustomed to legacy systems... we'll make it work somehow.
Now that I've gotten that out of my system, I'll be able to do a more objective analysis of "Linked Data" next week.