I'm a power hater. I don't hate often, but when I do, I do it with gusto. So I have to say, this pile of vaporware called "The Semantic Web" is really starting to tick me off...
I'm not sure why, but recently it seems to be rearing its ugly head again in the information management industry, and wooing new potential victims (like Yahoo). I think its trying to ride the coattails of Web 2.0 -- particularly folksonomies and microformats. Nevertheless, I feel the need to expose it as the massive waste of time, energy, and brainpower that it is. People should stay focused on the very solvable problem of context, and thoroughly avoid the pipe dreams about semantics. Keep it simple, and you'll be much happier.
First, let's review what the "Semantic Web" is supposed to be... A semantic web is about a system that understands the meaning of web pages, and not merely the words on the page. Its about embedding information in your pages so computers can understand what things are, and how they are related. Such a beast would have tremendous value:
"I have a dream for the Web [in which computers] become capable of analyzing all the data on the Web – the content, links, and transactions between people and computers. A ‘Semantic Web’, which should make this possible, has yet to emerge, but when it does, the day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines. The ‘intelligent agents’ people have touted for ages will finally materialize." -- Tim Berners-Lee, Director of the W3C, 1999
Gee. A future where human thought is irrelevant. How fun.
First, notice that this quote was from 1999. Its been ten years since Timmy complained that the semantic web was taking too long to materialize. So what has the W3C got to show for their decade of effort? A bunch of bloated XML formats that nobody uses... because we apparently needed more of those. By way of comparison, Timmy released the first web server on August 6, 1991... Within 3 years there were 4 public search engines, a solid web browser, and a million web pages. If there was actually any value in the "Semantic Web," why hasn't it emerged some time in the past 18 years?
I believe the problem is that Timmy is blinded by a vision and he can't let go... I hate to put it this way, but when compared against all other software pioneers, Timmy's kind of a one trick pony. He invented the HTTP protocol and the web server, and he continues to milk that for new awards every year... while never acknowledging the fact that the web's true turning point was when Marc Andreessen invented the Mosaic Web Browser. I'm positive Timmy's a lot smarter than I, but he seems stuck in a loop that his ego won't let him get out of.
The past 10,000 years of civilization has taught us the same things over and over: machines cannot replace people, they can only make people more productive by automating the mundane. Once machines become capable of solving the "hard problems," some wacky human goes off and finds even harder problems that machines can't solve alone... which then creates demand for humans to solve that next problem alone, or build a new kind of machine to do so.
Seriously... this is all just basic economics...
Computers can only do what they are told; they never "understand" anything. There will always be a noticeable gap between how a computer works, and how a human thinks. All software programs are based on symbol manipulation, which is a far cry from processing a semantically rich paragraph about the meaning of data. Well... isn't it possible to create a software program that uses symbol manipulation to "understand" semantics? Mathematicians, psychologists, and philosophers say "hell no..."
The Chinese Room thought experiment pretty clearly demonstrates that a symbol manipulation machine can never achieve true "human" intelligence. This is not to imply human brains are the only way to go... merely that if your goal is to mimic a human you're out of luck. Even worse, Gödel's Incompleteness Theorem proves that all systems of formal logic (mathematics, software, algorithms, etc.) are fundamentally error-prone. They sometimes cannot prove the truth of a true statement, and other times they prove the truth of false statements! Clearly, there are fundamental limits to what computers can do, one of which is to understand "meaning".
Therefore, even in theory, a true "semantic web" is impossible...
Well... who the hell cares about philosophical purity, anyway? There are many artificial intelligence experts working on the semantic web, and they rightly observe that the system doesn't have to be equivalent to human intelligence... As long as the system behaves like it has human intelligence, that's good enough. This is pretty much the Turing Test for artificial intelligence. If a human judge interacts with a machine, and the judge believes he is interacting with a real live human, then the machine has passed the test. This is what some call "weak" artificial intelligence.
Essentially, If it walks like a duck, and talks like a duck, then its a duck...
Fair enough... So, since we can't give birth to true AI, we'll get a jumble of smaller systems that together might behave like a real, live human. Or at least a duck. This means a lot of hardware, a lot of software, a lot of data entry, and a lot of maintenance. Ideally these systems would be little "agents" that search for knowledge on the web, and "learn" on their own... but there will always be a need for human intervention and sanity checks to make sure the "smart agents" are functioning properly.
That raises the question, how much human effort is involved in maintaining a system that behaves like a "weak" semantic web? Is the extra effort worth it when compared to a blend of simpler tools and manual processes?
Unfortunately, we don't have the data to answer this question. Nobody can say, because nobody has gotten even close to building a "weak" semantic web with much breadth... Timmy himself has said "This simple idea, however, remains largely unrealized" in 2006. Some people have seen success with highly specialized information management problems, that had strict vocabularies. However, I'd wager that they would have equivalent success with simpler tools like a controlled thesaurus, embedded metadata, a search engine, or pretty much any relational database in existence. That ain't rocket science, and each alternative is older than the web itself...
Now... to get the "weak semantic web" we'll need to scale up from one highly specialized problem to the entire internet... which yields a bewildering series of problems:
- Who gets to tag their web pages with metadata about what the page is "about"?
- What about SPAM? There's a damn good reason why search engines in the 90s began to ignore the "keywords" meta tag.
- Who will maintain the billions of data structures necessary to explain everything on the web?
- What about novices? Bad metadata and bad structures dilute the entire system, so each one of those billion formats will require years of negotiation between experts.
- Who gets to "kick out" bad metadata pages, to prevent pollution of the semantic web?
- What about vandals? I could get you de-ranked and de-listed if you fail to observe all ten billion rules.
- Who gets to absorb web pages to extract the knowledge?
- What about copyrights? Your "smart agent" could be a "derivative work," so some of the best content may remain hidden.
- Who gets to track behavior to validate the semantic model?
- What about privacy? If my clicks help you sell to others, I should be compensated.
- Will we require people to share analytical data so the semantic web can grow?
- What about incentives? Nobody using the web for commerce will share, unless there's a clear profit path.
I'm sorry... but you're fighting basic human nature if you expect all this to happen... my feeling is that for most "real world" problems, a "semantic web" is far from the most practical solution.
So, where does this leave us? We're not hopeless, we're just misguided. We need to come down a little, and be reasonable about what is and is not feasible. I'd prefer if people worked towards the much more reachable goal of context sensitivity. Just make systems that gather a little bit more information about a user's behavior, who they are, what they view, and how they organize it. This is just a blend of identity management, metadata management, context management, and web trend analysis. That ain't rocket science... And don't think for one second that you can replace humans with technology: instead, focus on making tools that allow humans to do their jobs better.
Of course, if the Semantic Web goes away, then I'll need to find something else to power hate. I'm open to suggestions...