The question is, how do we make enterprise search better? Some people complain that enterprise search should behave more like Google search, which I vehemently disagree with, for one primary reason: enterprise search is a FUNDAMENTALLY different problem from internet search. Here are some examples:
The internet search problem is like this:
- Heavily linked pages, which can be analyzed for "relevance" and "importance"
- Spam is a constant problem
- People don't want you to monitor their behavior
- People obsess about their Google PageRank
- People obsess about their hit count
- People aren't looking for the answer, they are looking for an answer
The whole problem reminds me of a scene from Zero Effect:
Now, a few words on looking for things. When you look for something specific, your chances of finding it are very bad... because of all things in the world, you only want one of them. When you look for anything at all, your chances of finding it are very good... because of all the things in the world, you're sure to find some of them.
Internet search is like looking for anything at all... whereas enterprise search is like looking for something specific:
- People don't want general information; they want the 100% definitive answer
- The trust level is usually higher between co-workers than between random web surfers... or at least it should be. Otherwise, you've got bigger problems than information management.
- You know exactly who is running the search
- You know exactly what department they are in, and what content they are likely to need
- You know exactly their previous search history, possibly even their favorite "tags"
- Spam is minimal, or non-existent
- Content uses few, if any, hyperlinks to help determine relevance
- People usually write content out of obligation, and usually do not care about making it easy for their audience to understand
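Those last few bullets are exactly the signals an enterprise ranking engine can exploit. Here's a minimal sketch of context-aware scoring -- every field name and weight below is invented purely for illustration:

```python
# A toy sketch of context-aware ranking for enterprise search.
# All field names and boost values are hypothetical.

def score(doc, user, base_relevance):
    """Add context boosts to a document's base text-match score."""
    boost = 0
    if doc["department"] == user["department"]:
        boost += 50                           # content from the searcher's own department
    overlap = set(doc["tags"]) & set(user["favorite_tags"])
    boost += 10 * len(overlap)                # tags the user has favored before
    return base_relevance + boost

user = {"department": "sales", "favorite_tags": ["pricing", "contracts"]}
doc = {"department": "sales", "tags": ["pricing", "renewals"]}
print(score(doc, user, 100))  # → 160
```

The point isn't the particular weights -- it's that internet search engines can never know the searcher's department or history, while enterprise search always can.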
Trying to solve both problems with the same exact tool will only lead to frustration...
Now... Solving this problem with social tools is a much easier, and arguably better, approach. People usually don't want to know the answer; people usually want to know who knows the answer. This is an observation as old as Mooers' Law (1959) about information management:
“An information retrieval system will tend not to be used whenever it is more painful and troublesome for a customer to have information than for him not to have it.”
Fifty years later, and folks still don't quite seem to get it... The average user does not want to read enterprise content! They don't read documentation on the subject, nor books on the subject, nor blogs on the subject... In general, people don't care to actually learn anything new; they just want the quick answer that lets them move on and get back to their normal job. Most people look for information so they can perform some kind of task, and then they'll be more than happy to forget that information afterward. It's a rare individual who learns for the sake of knowledge... These folks are sometimes called Mavens, and everybody wants to be connected with these Mavens so they can do their jobs better. As a result, these Mavens will always be overwhelmed with phone calls, emails, and meeting invites.
As those mediums became flooded, some of your resources fled to other places -- like Twitter, or Facebook, or enterprise social software -- and forced would-be connectors to follow. This constant movement (or hiding) helps a bit... but it's only a matter of time before those mediums get flooded as well, and the noise overwhelms the signal.
In order to truly solve the enterprise search problem, you need to first understand why people may choose to never use enterprise search, no matter how good it is... then try to bring them back into the fold with socially enabled enterprise search tools. Don't just help people find information; help them find somebody who understands what the information means. Connecting people with mere words can easily backfire, and might actually make these people a burden on society. Instead, connect them with real, live humans who are eager to teach the knowledge being sought. At the same time, you need to work hard to protect these Mavens, so they don't flee your system in favor of another.
This is a problem that Google's search engine cannot solve -- mainly for privacy and trust reasons -- but it is 100% solvable in the enterprise. I'm just wondering why so few have done it...
There's a great developer site out there called 99 Bottles Of Beer. It shows you how to output the lyrics of the oh-so-annoying camp song in well over 1000 different programming languages.
Whoa... 1000 languages, you say? Yes, there are well over 1000 known programming languages, but please keep in mind how developers think. Most of these languages are clunky, impractical, or intentionally impossible to use. These are sometimes called esoteric languages, or even Turing tarpits. Here are some of my favorite bizarre programming languages:
- Whitespace: no letters, no numbers, no symbols... the only valid syntax is tabs, spaces, and carriage returns.
- LOLCODE: the syntax looks like something you'd see on a LOL cats poster. I HAZ A BEERZ ITZ 99! IM IN YR LOOP! IZ VAR LIEK 0? KTHXBYE!
- Piet: just damn pixels on a screen... no letters even!
- Cow: instead of numbers and symbols, you only get moOmOOmoOmOoOOM.
- Brainf**k: trust me... you do NOT want to maintain code written in this language...
Kidding aside, there's a pretty good argument that learning how to print out 99 bottles of beer is a useful exercise when learning a new language. You need to learn the syntax of variables, conditionals, text output, and loops. Not to mention the fact that every language has nuances that can sometimes help you further minimize your code without sacrificing clarity... there are probably a dozen ways to write it in each language, each with different benefits.
So -- seeing how Oracle UCM was being left out -- I submitted the below code to their site. 99 Bottles of Beer, in IdocScript:
<$numBottles = "99", bottleStr = " bottles "$>
<$loopwhile (numBottles > 0)$>
    <$verse = numBottles & bottleStr & "of beer on the wall,\n" &
              numBottles & bottleStr & "of beer!\n" &
              "Take one down, pass it around,\n"$>
    <$numBottles = numBottles - 1$>
    <$if numBottles > 0$>
        <$if numBottles == 1$>
            <$bottleStr = " bottle "$>
        <$endif$>
        <$verse = verse & numBottles & bottleStr & "of beer on the wall!\n"$>
    <$else$>
        <$verse = verse & "no more bottles of beer on the wall!\n"$>
    <$endif$>
    <$verse$>
<$endloop$>
Naturally, there are multiple ways to do this... you could use resource includes, localization strings, result sets, etc. But that's part of the fun of learning a new language. I'll leave it as an exercise for my audience to make it better.
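For readers who don't speak IdocScript, here is roughly the same logic sketched in Python -- my own translation for comparison, not part of the original submission:

```python
# 99 Bottles of Beer, mirroring the structure of the IdocScript version above.
num_bottles = 99
bottle_str = " bottles "
while num_bottles > 0:
    verse = (str(num_bottles) + bottle_str + "of beer on the wall,\n"
             + str(num_bottles) + bottle_str + "of beer!\n"
             + "Take one down, pass it around,\n")
    num_bottles -= 1
    if num_bottles > 0:
        if num_bottles == 1:
            bottle_str = " bottle "      # grammar fix for the last bottle
        verse += str(num_bottles) + bottle_str + "of beer on the wall!\n"
    else:
        verse += "no more bottles of beer on the wall!\n"
    print(verse)
```

Same variables, same conditionals, same loop -- which is exactly why the exercise is a decent Rosetta stone between languages.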
One of the biggest challenges in social networks is keeping them updated. When you first log in, it's a blank slate, and you have to find all your friends and make connections to them. This is a bit of a pain, so sites like Facebook and LinkedIn allow you to import your email address book. They then data-mine the address book to see who you know that might already be in the network, which helps you make lots of connections quickly.
Ignoring the obvious security and privacy concerns, there are still two big problems with this:
- These systems find connections, but they ignore the strength and quality of those connections.
- You have to constantly import your address book if you keep making new friends.
In my latest book, I give some practical advice about how Content Management fits in with social software and Enterprise 2.0 initiatives... One of the ideas that I like to drive home is that not all connections are equal, and it takes a lot of effort to keep quality information in your social software systems. Who is connected to whom? Which connections are genuine? And who is just a "link mooch" who is spamming people with "friend" requests just to ratchet up his ranking?
That latter one is particularly problematic on LinkedIn... It's littered with sub-par recruiters who send friend-request spam so they can get something from you... but they never care to do anything for you.
Luckily, in the enterprise these problems can be solved relatively easily: data mine your email archives for who is connected to whom! By monitoring a host of statistics on who emails whom, about what, and when, you have a tremendously powerful tool for building social maps. You can determine who is connected to whom, who is an expert on which subject, and where the structural holes are in your enterprise. And you never need to maintain your connections! Any time you send a message to a friend, your social map is automatically rebuilt for you!
In order to do so, you'll need to run some data mining tools to find answers to the following questions:
- Who do you send emails to? These are the people you claim to be connected to.
- Does this person reply to your emails? If so, the connection is mutual.
- How often do you email? A one-time email is probably not a connection, but a weekly email might be a strong connection.
- How long does it take them to reply to you? A faster reply usually means your communications get priority with them, and they feel a stronger connection to you.
- How long do you take to reply to them? Again, a faster reply from you means that their communications get priority with you, meaning you feel a stronger connection as well.
- Do you answer emails about a topic, or just forward them along? Just because you are the "point man" for Java questions, that doesn't mean you "know" Java... but it probably means you "know who knows" Java, which is sometimes even better.
- Does one person usually initiate all of the new email threads? If so, then this might be a lopsided friendship, or it might just mean that one person has more free time.
- What are the topics of conversation? In reality, the more often you discuss work, the weaker the connection! If you also discuss gossip, news, current events, sports, movies, family, or trivia, then you probably have a stronger connection. The more topics you discuss, the more likely you are to be close friends.
- What is the flow of email from one department to another? If it's peer-to-peer, then these departments are comfortable sharing information. If it always goes through the chain of command, then these departments are socially isolated, and probably unlikely to trust each other.
- Who do you email outside the company? If an employee in the marketing department emailed a friend who works at the company Ravenna, and your sales person is trying to connect with somebody at Ravenna, then these two employees might want to connect.
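A few of the signals above could combine into a single connection score. Here's a toy sketch -- every weight, threshold, and function name is invented for illustration, not a real algorithm:

```python
# Toy connection-strength score built from email metadata.
# All weights and thresholds here are hypothetical.
from datetime import timedelta

def connection_strength(sent, received, topics, median_reply):
    """Score a connection between two people from their email patterns."""
    if received == 0:
        return 0                               # one-way traffic isn't mutual
    score = min(sent, received)                # mutual volume of email
    score += 2 * len(topics - {"work"})        # non-work topics imply closer ties
    if median_reply <= timedelta(hours=1):
        score += 5                             # fast replies signal priority
    return score

print(connection_strength(40, 35, {"work", "sports", "family"},
                          timedelta(minutes=20)))  # → 44
```

The real value is that these scores maintain themselves: every email sent updates the map, with no address-book imports required.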
Unfortunately, many employers have a policy against using company email for personal communications. Ironically, this policy could hurt the employer in the long run, because analyzing the violations of that policy is frequently the best way to determine who is well connected in your company! So, before you deploy any social software in the enterprise, encourage your employees to goof off via email (within reason), and set up some technology to data-mine your email archives (like Oracle Universal Online Archive, or something similar). Then keep tuning your map based on the email messages people send.
That will help you hit the ground running with enterprise social software...
UPDATE: This book tour has been rescheduled for March 17th-19th.
Well, it's not really a book tour... but Andy and I will be visiting 3 cities for roundtable discussions on "Pragmatic Content Management". Oracle is organizing the whole shindig, and space will be limited... Andy will be giving a talk on Pragmatic ECM strategy, then I will present on implementation advice. Then there will be a 30-minute roundtable discussion, and we'll wrap it up before lunch.
For more specific information, please read the official invitation from Oracle. Here are the cities and dates:
- Cincinnati: Tuesday, March 17, 2009
- Memphis: Wednesday, March 18, 2009
- Houston: Thursday, March 19, 2009
If you want a book signed, please register and drop by!
The boys over at InfoVark tagged me a few weeks back, trying to revive the meme "why do you blog?" I'll oblige, mainly because I've wanted to write something along these lines for a while.
Why Do I Blog?
This is actually my fourth blog... I tried to get into it before, but it never worked out. I was too busy, I didn't have enough to say, it just didn't feel right. I started this site back in 2006 so I'd have a landing page for my first book. I mentioned bexhuff.com several times in the book so people could come to my site, download the sample code, ask questions, and find links to other ECM resources.
One problem... by the time I finished the book and sent it off to the publishers, I still hadn't launched my blog yet!
So... with panic mode setting in... I decided to force myself to write a lot of content before the book hit the shelves. I wrote some good articles, some crappy ones, but I just kept on writing. Writing writing writing! When I thought I wrote enough, I wrote some more, and saved them for later publishing.
Oddly enough, that trigger was what it took for me to finally enjoy blogging. I also noticed that the more I blogged, the better my writing became. These days, I blog for three main reasons:
- To keep my writing and communication skills sharp.
- To draw attention to events/articles on the web that deserve commentary.
- To inject my contrarian opinions into technical matters that my readers might find interesting.
That seems to be a good formula... Google Analytics says I got 170,000 pageviews in 2008... despite virtually zero self-promotion, and no guest bloggers... Not bad for somebody who also works 60 hours per week, runs his own company, writes books, manages an 18-unit condo, and travels ;-)
What Do I Blog?
Initially the topics were a tad scattered... lifehacks, technology, and all that good stuff. These days I try to keep it to software -- specifically in the information management realm -- and connections between it and other topics. I also have occasional posts on science, communication theory, alternative energy, economics, and general half-baked ideas I have... but I try to keep those to one per week.
How Do I Blog?
My blogging technique varies...
If I'm blogging just to keep my writing skills sharp, I'll take a complex subject and do my best to explain it clearly. One of my heroes there is the Nobel Prize-winning physicist Richard Feynman, who firmly believed that if you cannot explain a concept to the average college freshman, then either you're a rotten communicator, or you don't understand the concept very well. I strongly believe that this is true... so when I want to wrap my head around a tricky subject, I try to explain it to the "average educated person." Sometimes I succeed, sometimes I fail...
If I'm blogging to draw attention to recent events or articles, then I usually start by trawling the web. I like Digg and Reddit... sometimes I just take a look at what was tagged on Delicious in the past 5 seconds. If something leaps out at me, and I think it's appropriate for my readers, I'll mention it. I also follow a lot of B-grade and C-grade bloggers to see if they have penned any original prose. I try to blog twice per week, so I use this technique the most often.
If I'm feeling like writing something contrary to mainstream opinion, then my process is very methodical... it might take days, weeks, or even months to write a post, depending on how strongly held the mainstream opinion is. I usually have a half dozen such blogs in my head at any one time, waiting for the right moment. I covered my technique in an earlier post: Five Ways To Move Beyond Conventional Wisdom, so I won't bore everybody by repeating the five steps here. I rarely win friends with contrarian posts, but I do voice objections that need to be heard.
I suppose I'll keep this in the Oracle universe, and tag the following people:
- Billy Cripe
- Dan Norris
- Matt Topper (because he's the new guy at The Apps Lab)
- Eddie Awad
- Chris Bucchere
Have at it, boys!
The W3C -- my absolutely positively most favorite standards body ever -- has just come up with an XML namespace for emotions! I must say that I fully support this specification... who on earth would ever want to type something as confusing and ambiguous as this:
When we can do The Right Thing™ and use XML instead:
<emotionml xmlns="http://www.w3.org/2008/11/emotionml">
    <emotion>
        <category set="everydayEmotions" name="Amusement" />
        <intensity value="0.7" />
    </emotion>
</emotionml>
Ugh... If this were released on April 1st instead of November 20th, I would have been amused. But now I'm just plain sad. As Wearehug said, "It is becoming increasingly difficult to distinguish W3C specs from Onion articles."
(Hat Tip Aristotle)
I'm a power hater. I don't hate often, but when I do, I do it with gusto. So I have to say, this pile of vaporware called "The Semantic Web" is really starting to tick me off...
I'm not sure why, but recently it seems to be rearing its ugly head again in the information management industry, and wooing new potential victims (like Yahoo). I think it's trying to ride the coattails of Web 2.0 -- particularly folksonomies and microformats. Nevertheless, I feel the need to expose it as the massive waste of time, energy, and brainpower that it is. People should stay focused on the very solvable problem of context, and thoroughly avoid the pipe dreams about semantics. Keep it simple, and you'll be much happier.
First, let's review what the "Semantic Web" is supposed to be... The semantic web is a system that understands the meaning of web pages, and not merely the words on the page. It's about embedding information in your pages so computers can understand what things are, and how they are related. Such a beast would have tremendous value:
"I have a dream for the Web [in which computers] become capable of analyzing all the data on the Web – the content, links, and transactions between people and computers. A ‘Semantic Web’, which should make this possible, has yet to emerge, but when it does, the day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines. The ‘intelligent agents’ people have touted for ages will finally materialize." -- Tim Berners-Lee, Director of the W3C, 1999
Gee. A future where human thought is irrelevant. How fun.
First, notice that this quote was from 1999. It's been ten years since Timmy complained that the semantic web was taking too long to materialize. So what has the W3C got to show for their decade of effort? A bunch of bloated XML formats that nobody uses... because we apparently needed more of those. By way of comparison, Timmy released the first web server on August 6, 1991... Within 3 years there were 4 public search engines, a solid web browser, and a million web pages. If there was actually any value in the "Semantic Web," why hasn't it emerged some time in the past 18 years?
I believe the problem is that Timmy is blinded by a vision and he can't let go... I hate to put it this way, but when compared against all other software pioneers, Timmy's kind of a one-trick pony. He invented the HTTP protocol and the web server, and he continues to milk that for new awards every year... while never acknowledging the fact that the web's true turning point was when Marc Andreessen invented the Mosaic Web Browser. I'm positive Timmy's a lot smarter than I, but he seems stuck in a loop that his ego won't let him get out of.
The past 10,000 years of civilization have taught us the same things over and over: machines cannot replace people, they can only make people more productive by automating the mundane. Once machines become capable of solving the "hard problems," some wacky human goes off and finds even harder problems that machines can't solve alone... which then creates demand for humans to solve that next problem alone, or build a new kind of machine to do so.
Seriously... this is all just basic economics...
Computers can only do what they are told; they never "understand" anything. There will always be a noticeable gap between how a computer works, and how a human thinks. All software programs are based on symbol manipulation, which is a far cry from processing a semantically rich paragraph about the meaning of data. Well... isn't it possible to create a software program that uses symbol manipulation to "understand" semantics? Mathematicians, psychologists, and philosophers say "hell no..."
The Chinese Room thought experiment pretty clearly demonstrates that a symbol manipulation machine can never achieve true "human" intelligence. This is not to imply human brains are the only way to go... merely that if your goal is to mimic a human, you're out of luck. Even worse, Gödel's Incompleteness Theorems show that every sufficiently powerful system of formal logic (mathematics, software, algorithms, etc.) is fundamentally limited: if it is consistent, there are true statements it can never prove, and if it is inconsistent, it "proves" false statements! Clearly, there are fundamental limits to what computers can do, one of which is to understand "meaning".
Therefore, even in theory, a true "semantic web" is impossible...
Well... who the hell cares about philosophical purity, anyway? There are many artificial intelligence experts working on the semantic web, and they rightly observe that the system doesn't have to be equivalent to human intelligence... As long as the system behaves like it has human intelligence, that's good enough. This is pretty much the Turing Test for artificial intelligence. If a human judge interacts with a machine, and the judge believes he is interacting with a real live human, then the machine has passed the test. This is what some call "weak" artificial intelligence.
Essentially, if it walks like a duck, and talks like a duck, then it's a duck...
Fair enough... So, since we can't give birth to true AI, we'll get a jumble of smaller systems that together might behave like a real, live human. Or at least a duck. This means a lot of hardware, a lot of software, a lot of data entry, and a lot of maintenance. Ideally these systems would be little "agents" that search for knowledge on the web, and "learn" on their own... but there will always be a need for human intervention and sanity checks to make sure the "smart agents" are functioning properly.
That raises the question, how much human effort is involved in maintaining a system that behaves like a "weak" semantic web? Is the extra effort worth it when compared to a blend of simpler tools and manual processes?
Unfortunately, we don't have the data to answer this question. Nobody can say, because nobody has gotten even close to building a "weak" semantic web with much breadth... Timmy himself admitted in 2006 that "This simple idea, however, remains largely unrealized." Some people have seen success with highly specialized information management problems that had strict vocabularies. However, I'd wager that they would have equivalent success with simpler tools like a controlled thesaurus, embedded metadata, a search engine, or pretty much any relational database in existence. That ain't rocket science, and each alternative is older than the web itself...
Now... to get the "weak semantic web" we'll need to scale up from one highly specialized problem to the entire internet... which yields a bewildering series of problems:
- Who gets to tag their web pages with metadata about what the page is "about"?
- What about SPAM? There's a damn good reason why search engines in the 90s began to ignore the "keywords" meta tag.
- Who will maintain the billions of data structures necessary to explain everything on the web?
- What about novices? Bad metadata and bad structures dilute the entire system, so each one of those billion formats will require years of negotiation between experts.
- Who gets to "kick out" bad metadata pages, to prevent pollution of the semantic web?
- What about vandals? I could get you de-ranked and de-listed if you fail to observe all ten billion rules.
- Who gets to absorb web pages to extract the knowledge?
- What about copyrights? Your "smart agent" could be a "derivative work," so some of the best content may remain hidden.
- Who gets to track behavior to validate the semantic model?
- What about privacy? If my clicks help you sell to others, I should be compensated.
- Will we require people to share analytical data so the semantic web can grow?
- What about incentives? Nobody using the web for commerce will share, unless there's a clear profit path.
I'm sorry... but you're fighting basic human nature if you expect all this to happen... my feeling is that for most "real world" problems, a "semantic web" is far from the most practical solution.
So, where does this leave us? We're not hopeless, we're just misguided. We need to come down a little, and be reasonable about what is and is not feasible. I'd prefer if people worked towards the much more reachable goal of context sensitivity. Just make systems that gather a little bit more information about a user's behavior, who they are, what they view, and how they organize it. This is just a blend of identity management, metadata management, context management, and web trend analysis. That ain't rocket science... And don't think for one second that you can replace humans with technology: instead, focus on making tools that allow humans to do their jobs better.
Of course, if the Semantic Web goes away, then I'll need to find something else to power hate. I'm open to suggestions...
In the early days of computer science, people discovered what was later to be called "Conway's Law":
Any organization that designs a system (defined more broadly here than just information systems) will inevitably produce a design whose structure is a copy of the organization's communication structure.
In other words, let's say you are designing a complex system -- an auto manufacturing plant, a new financial market, a hospital, the World Health Organization, or a large software solution -- the efficiency of the end result will always be limited by the efficiency of how the committee communicates. Let's say two segments of your system need to communicate with each other... however, the two designers of those systems were unable to communicate effectively with each other. The end result will invariably be a system where those two segments are unable to exchange important information properly. If I have to run an idea by my boss before handing it off to my peer in another department, then I'll almost always design a system that uses the same paths for sending important messages... whether or not it's the optimal approach.
This helps explain why large companies love Enterprise Service Buses, but small companies think they are the spawn of the devil... neither is correct, however both opinions derive from the communication structure in their respective organizations.
This goes beyond the obvious communication problems between silos and corporate fiefdoms... even the physical components you design will inevitably mirror your ability (or inability) to communicate. From Wikipedia:
Consider a large system S that the government wants to build. The government hires company X to build system S. Say company X has three engineering groups, E1, E2, and E3 that participate in the project. Conway's law suggests that it is likely that the resultant system will consist of 3 major subsystems (S1, S2, S3), each built by one of the engineering groups. More importantly, the resultant interfaces between the subsystems (S1-S2, S1-S3, etc) will reflect the quality and nature of the real-world interpersonal communications between the respective engineering groups (E1-E2, E1-E3, etc).
Another example: Consider a two-person team of software engineers, A and B. Say A designs and codes a software class X. Later, the team discovers that class X needs some new features. If A adds the features, A is likely to simply expand X to include the new features. If B adds the new features, B may be afraid of breaking X, and so instead will create a new derived class X2 that inherits X's features, and puts the new features in X2. So, in this example, the final design is a reflection of who implemented the functionality.
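The two-engineer example is easy to see in actual code. In this hypothetical sketch, B's reluctance to touch A's class leaves a permanent fingerprint in the class hierarchy:

```python
# Engineer A owns class X, so A would add new features directly to it.
class X:
    def existing_feature(self):
        return "original behavior"

# Engineer B, afraid of breaking A's code, subclasses instead of editing --
# so the final design records who implemented what, not what the best design was.
class X2(X):
    def new_feature(self):
        return "bolted-on behavior"

obj = X2()
print(obj.existing_feature())  # → original behavior
print(obj.new_feature())       # → bolted-on behavior
```

Neither choice is wrong on its own; the point is that the shape of the inheritance tree reflects the communication (and trust) between A and B, not any technical requirement.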
How do you avoid becoming a similar statistic? Simple: be flexible.
The more flexible you are when making the design, the more flexible you are to adopt new ideas and new ways of communicating, the more likely you are to create a useful product. For those who looooooooooove process, then what you need is a process for injecting flexibility into your system when metrics demonstrate a communication problem.
The number one task of any business is to make money. The number two task is to improve inter-departmental communication. After that, all problems can be solved.
I've always said, the most important skill a technical person can possess is the ability to communicate... you might not have a remarkable impact on any one feature, but you'll be better positioned to understand the whole problem, and the whole solution. Talk with your peers, and make sure that the lines of communication are 100% open across divisions... Especially divisions that hate each other. Make sure people feel connected, and that they can trust the opinions and needs of others.
Only then will a committee be able to design a system less dysfunctional than itself...
There are a lot of non-techie skills that make you a better software developer... I've found that when trying to debug people's problems, you tend to run into a lot of situations where you are reading off DNS names that sound almost exactly the same: "Did you say 'dee zee cee zee one,' or 'dee cee zee zee one,' or 'dee zee zee zee one,' or..."
You get the picture...
So, one of my new year's resolutions was to memorize the phonetic alphabet. This is the code that the military uses to help prevent confusion when dealing with pass codes and acronyms.
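A hypothetical little helper shows why this works so well for server names -- each letter maps to a word that can't be confused with any other:

```python
# Spell out hostnames in the NATO phonetic alphabet.
# Digits and punctuation pass through unchanged.
NATO = {
    "A": "Alpha", "B": "Bravo", "C": "Charlie", "D": "Delta", "E": "Echo",
    "F": "Foxtrot", "G": "Golf", "H": "Hotel", "I": "India", "J": "Juliet",
    "K": "Kilo", "L": "Lima", "M": "Mike", "N": "November", "O": "Oscar",
    "P": "Papa", "Q": "Quebec", "R": "Romeo", "S": "Sierra", "T": "Tango",
    "U": "Uniform", "V": "Victor", "W": "Whiskey", "X": "X-ray",
    "Y": "Yankee", "Z": "Zulu",
}

def spell(name):
    """Spell out a server name one character at a time."""
    return " ".join(NATO.get(ch.upper(), ch) for ch in name)

print(spell("DZCZ1"))  # → Delta Zulu Charlie Zulu 1
```

"Delta Zulu Charlie Zulu one" is unambiguous in a way that "dee zee cee zee one" never will be.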
Of course... if you start using these you might want to warn people... otherwise your audience might wonder why Romeo and Juliet are drinking a Kilo of Whiskey in Quebec...
So, what non-techie skills do you find helpful?
Yikes... Confusing, unclear, and cluttered since July of 2007... Not quite a ringing endorsement from the "crowd," eh?
The Wikipedia article for the Association for Information and Image Management isn't any better... at least Stellent's tiny tiny page is excusable since it doesn't exist as a company anymore. Considering the fact that folks like IBM, Oracle, EMC, and Microsoft all have product suites in this industry -- and considering how all of them tout blogs and wikis -- you'd think that somebody would have cleaned up Wikipedia by now.
I guess we all have better things to do...
Personally, I find this a refreshing reminder that the "semantic web" will NOT save you. Unless you do the hard work of creating new business processes around new information management technology, you'll just be cluttering your enterprise with ever more outdated, useless, and false data.
Cordell sent me an interesting article about how IT Certification is becoming less important. Some bloggers -- like James -- believe IT Certifications could have value if they just raised their standards a bit... but I'm not so sure. You used to be able to take the average tech-inclined person, send him through a training course, and then get him a decent job in IT. Not so much these days, and it's not because of the ailing economy. Here are some other reasons:
- Certifications are Vendor-centric: they should instead be solution-centric, or more like a mentorship program
- Certification’s Life Cycle Is Short: significantly shorter than a college degree
- Certifications Are Not Real-World Oriented: they are brain dumps which present technology you may never use
- Certifications Have Been Devalued: some are just high-tech diploma mills.
- No Oversight Body: who gets to say who is certified to train database management? Oracle? Microsoft? Both? Neither?
- Degree vs. Certification vs. Experience: with experience and a degree, why on earth would you need a certification?
- HR People Are Not In Touch with the Real World: and nowhere is this more true than in IT
- Budget Cuts: no more training dollars from big companies, so certification companies are desperate for bodies
- Glut of Certified People: anybody can get one, so everybody does get one
- No One Knows Which Certifications Matter: some are very tough to pass, others have a 100% pass rate
The fundamental problem is that it is unbelievably difficult to determine how good an IT person will be based on a piece of paper.
Folks on The Daily WTF and Joel On Software have debated endlessly what the proper mix of education, experience, and certification should be... each has benefits... but most employers prefer college degrees to certifications.
However, this raises another question... since all education loses value over time, why would an employer prefer a candidate with a 5-year-old college degree over one with a 6-month-old certification?
Probably because people have a general idea of the quality of education that is possible at a well-known college. They can look up the name in any number of guides that rank college programs, and have some level of third-party validation. Also, you never quite know where a new employee's true talents may lie... Most of what I learned for my Computer Science degree is outdated, and not frequently relevant for my job... However, those were not the only courses I took in college. I took dozens of non-computer courses that helped me be meticulous when experimenting, write more clearly, think more abstractly, and visualize complex integrated systems better. These courses helped me develop true skills and talents, as opposed to just filling my head with stale knowledge.
Personally... I feel that a college degree means you can learn, experience means you've made the typical rookie mistakes, and certifications/conference attendance means you're dedicated to continuing your education. Of course, none of these demonstrate that your knowledge/skills/talents will be of any practical use for your employer... so it's always a risk.
You've probably heard about the technique of Rick Rolling... it's basically the web version of the oh-so-mature "made you look" game. You tell people that a link goes to some interesting info, when in fact the link goes to a YouTube video of Rick Astley singing "Never Gonna Give You Up." It's also led to the trend of live Rick Rolling, where you trick somebody into looking at the lyrics of the song... like what happened during the 2008 Vice Presidential Debates.
Well, now people are so suspicious of YouTube links, they won't click on them anymore. So the answer is to raise the bar a little. My technique is to use open redirects from legitimate websites to hide links to YouTube!
For example... see the link below to Yelp.com? Where do you think it goes? Cut and paste it into your browser's address bar to see where it actually goes:
It looks like a link to Yelp.com, which is a restaurant review site... but with a little URL magic, you can force Yelp to annoy people. Naturally, once Yelp catches wind of this, they will shut down the open redirect pretty fast, so you have to keep looking for more. The technique is pretty simple:
- Find a large/important site that links frequently to small/unimportant sites... such sites usually have open redirects.
- Poke around and see if you can spot any URLs that look like they might be redirects... the URLs might have parameters like url=http://example.com, redirect=example.com, or something similar.
- Copy one of these redirect URLs into your address bar
- In the site URL, replace the redirect URL parameter with a Rick Rolling URL -- such as http://www.youtube.com/watch?v=Yu_moia-oVI -- and see if the site redirects to YouTube.
- For advanced Rick-Rolling, you might want to disguise the link to YouTube by URL encoding it. Use the form below to obfuscate a URL parameter:
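The encoding step in that last bullet is trivial to script. Here's a minimal Python sketch -- note that example.com and the `redirect` parameter name are made up for illustration; every site names its redirect parameter differently:

```python
from urllib.parse import quote, parse_qs, urlparse

RICKROLL = "http://www.youtube.com/watch?v=Yu_moia-oVI"

def disguise(redirect_base, target):
    """Percent-encode the target URL so it hides inside another
    site's query string (safe="" encodes even the slashes)."""
    return redirect_base + quote(target, safe="")

trap = disguise("http://example.com/out?redirect=", RICKROLL)
print(trap)

# The open redirect decodes the parameter right back to YouTube:
decoded = parse_qs(urlparse(trap).query)["redirect"][0]
print(decoded)  # http://www.youtube.com/watch?v=Yu_moia-oVI
```

The casual reader only sees the trusted domain at the front of the link; the percent-encoded YouTube URL is gibberish to the naked eye.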
You may now Rick-Roll with impunity...
Why do these open redirects exist? Simple: to fight link SPAM. This problem was big on Amazon.com, because at first they allowed people to submit links in comments. However, that meant that folks could link back to SPAM sites from Amazon.com. This is bad enough, but when Google noticed that Amazon linked to a site, its PageRank and "relevance" would increase... meaning those awful SPAM sites would have a higher rank in Google search results. There were many proposals to combat this problem... but the only one that completely solves it is to do a redirect from Amazon.com itself.
This does help the battle against SPAM, but unless you do it right it's a major security hole... people would see a link that goes to Amazon.com, then click on it, but then get hijacked to an evil site. The URLs look completely legit, and they bypass most SPAM/SCAM filters. These are particularly useful for people who use phishing techniques to steal bank account numbers, credit card numbers, and the like. Back in 2006 I found these security holes on Google, Amazon, MSN, and AOL. I alerted them all to the bug; some of them fixed it... however more sites every day make this same error. I'm hoping that broadcasting this technique to Rick Rollers might do some good... that way, Rick Rollers will find these security holes on new sites before hackers, crackers, and phishers do.
Basically, I'm betting that the annoying outnumber the evil... Let's hope I'm right...
Back at Oracle Open World 2008, Oracle gave some lip service to how they would get into cloud computing... in case you are not familiar with the term, "cloud computing" is a way of designing your systems so that your data resources (and sometimes your services) behave as if they are "in the internet cloud." It's a combination of a service-oriented architecture, software-as-a-service, and storage-as-a-service. Developers love it, but system administrators are still a bit wary...
Basically, you rent the computational power and storage you need, and only pay for what you use. In theory you can rely on your provider -- such as Google or Amazon.com -- to take care of backups for you. It's a great idea for startups (Twitter does it) and mid-sized companies, so they can keep costs down, while still leaving room to grow. For large companies with their own dedicated data centers, cloud computing makes less sense for production software... but it's usually a great idea for development and testing.
Anyway... I was curious how Oracle's "Cloud" strategy would develop... and I was pleasantly surprised to find some recent collaboration between Amazon and Oracle. They put together some Best Practices for Oracle In The Cloud, which I found on Justin Kestylyn's blog:
I really like the idea of encrypting database backups, and storing them in the cloud. That's an excellent idea, for pretty much anybody... and it is supported back to Oracle Database 9i. Check out the Cloud Backup Whitepaper for more info...
I also really see the value in using the Amazon cloud as the persistence layer for archives. The Oracle Universal Online Archive could be a real killer app, but proving its value will need about a terabyte of storage, just to do a proof-of-concept. Unfortunately, that's not exactly something you can run on a VMware virtual machine... but you could do it as an Amazon Machine Image (AMI).
I wouldn't be surprised if we saw more and more archiving solutions that use Amazon's Cloud for persistence...
I had expected that it would take another 3 weeks to release this, but my second book is now available for purchase! As promised, this is more of a business strategy book, and less of a technical book... however, Andy and I did sneak in some good implementation details along the way. We designed this book so every member of your ECM team should get something useful out of it.
The purpose of the book is to present what we call a "pragmatic strategy for content management." For multiple reasons -- both political and technical -- it is rarely feasible for all of your content management products to be from one vendor. Perhaps you just merged with another company and you each have different vendors; perhaps you need blogs and wikis now and cannot wait for your ECM vendor to create a decent offering; perhaps SharePoint has grown like a fungus in your enterprise, and now you need some way to manage the insanity.
Some say the solution is rationalization: consolidate all content into one system... but that's not the whole story. You don't want to wind up like those poor saps running Lotus Notes, do you? Your users will rebel if you take away their nice collaboration tools, or if you tell them they can't have new ones. Entire departments will collapse if you eliminate content silos without any concern for users' productivity.
Instead, the pragmatic approach is to do the following:
- Consolidate content when possible into a "strategic ECM infrastructure." This can -- if desired -- be the single repository that satisfies all your content management needs; however this is not a requirement.
- Federate content services to tactical and legacy applications. This means managing content in other repositories with a combination of enterprise search, universal records management, and enterprise mashups.
- Secure content wherever it lives. Ironically, in most cases your data is only secure when it is not in use! Once you move it from one system to another, it is at risk. Your information should always be secure, whether it is locked down in a database, or in a USB thumb drive at the bottom of your sock drawer.
The book is 250 pages long... but you don't have to read the whole thing. The chapter breakdown is as follows:
- The State of Information Management: a good grounding in what exactly ECM is all about, and why it is important.
- A Pragmatic ECM Architecture: the steps you need to take in order to realize the value of an ECM initiative.
- Assessing Your Environment: make a big list of what needs to be done, and by whom. Which content should be consolidated, and which is best left where it is?
- Strategic ECM Infrastructure and Middleware: this is the "strategic" part of the puzzle. Consolidate to this system whenever cost-effective, and extend it to your portals and enterprise applications with SOAs, ESBs, or ECM standards (WebDAV, CMIS, etc.).
- Managing Legacy and Non-Strategic Content Stores: all the tools for "tactical" integrations with systems that are not (yet) cost effective to consolidate. Your content management strategy should never punish you for failing to consolidate: the goal is to make content manageable.
- Secure Information Wherever It Lives: tools for making sure content is secure, even when it leaves a secure repository.
- Bringing Structured and Unstructured Strategies Together: your ECM initiative should be a part of a broader information management initiative. This chapter presents tools that help you bridge this gap.
- ECM and Enterprise 2.0: here we present a (better) definition of Enterprise 2.0, and how ECM fits into the ecosystem. It presents a strategy for Pragmatic Enterprise 2.0, and explains how many Enterprise 2.0 initiatives could fail without a comprehensive strategy.
Chapters 1, 2, and 8 are relevant no matter which vendor you use for Enterprise Content Management. We do mention Oracle numerous times, but you can just BLEEEEEEP over that if you use tools from different vendors.
Chapters 3 through 7 show how to implement a "pragmatic ECM strategy" using Oracle tools. Some of this data may or may not be relevant to non-Oracle customers. In most cases, you should find it helpful to see what is possible, so you can determine the distance between where you are now, and where you want to be tomorrow.
I worked pretty hard on this, and I'm relatively pleased with the results... but I'm sure the haters out there will find something to complain about ;-)
I usually like to give verbose book reviews... but I realized that I've fallen more than a little behind lately. Writing my second book sucked up a lot of my free time, but I was still able to squeeze in about one non-fiction book per month... not to mention the hundreds of hours of podcasts on ancient history (my current obsession).
I decided to avoid books on programming and technology this year, and focus mainly on business and communication. I think it was a good idea: partly because I get the best software news from blogs, partly because of the utter lack of software innovation in 2007 and 2008, but mostly because I felt the need to read more about economics and management. If more software geeks did the same, I think the world would run a lot more smoothly...
Anyway... below are the books I read in 2008 that I felt worthy of a review on Amazon and my blog. I hope you find them useful:
Alexander Hamilton by Ron Chernow -- I read this because I had a long-standing white-hot hatred of Hamilton, so much so that it deeply amused my friends. Sam White suggested I read this book to get a different perspective. After a few years, I finally did, and it really did turn me around a bit. I now have a lot more respect for Hamilton, and can see through the obvious propaganda that was set against him... Hamilton was still a political fool, but he was a military and financial genius, and the USA would be much worse off without him. And Aaron Burr was a tool.
Getting To Yes by Fisher, Ury, and Patton -- Highly recommended! This is a great book about negotiation, both in theory and practice. It demonstrates how there are three general kinds of negotiators: soft, hard, and principled. It's the latter category that will always be able to find a solution that satisfies both parties, without either party feeling like they gave in. I've used this concept multiple times recently -- sometimes with more success than others. It will always remain a useful tool to help me find the win-win situation in every conflict.
The Influencer by Patterson, Grenny, Maxfield, McMillan, and Switzler -- the follow-up to the book "Crucial Conversations," this book gives some pretty practical advice on how to set up systems that promote positive change. This is a combination of individuals, social groups, and the environment itself... all 3 areas need systems that encourage both the ability and the motivation for positive change; otherwise it will not last. In each area, there are multiple tools that can help, but a true "Influencer" will know what tools to use and when. Highly recommended for anybody who wants to make lasting change.
Speak Peace in a World of Conflict by Marshall Rosenberg -- This is a good grounding in the principles of Non-Violent Communication. It shows some basic techniques for how to communicate in a language of needs, rather than in a language of good/evil/right/wrong. It has more real-world examples for folks, which makes it more accessible to skeptics, and first-timers. If you like it, I would also recommend Non-Violent Communication, Getting To Yes, and Crucial Conversations.
Three Cups of Tea by Mortenson and Relin -- This was a fun read... it's a real-world story about a man who failed to summit K2, and wound up lost in a remote area of Pakistan. The people there were so kind to him, he promised to return to build a school. After multiple setbacks -- and some hard lessons about life in this region -- Mortenson now runs the Central Asia Institute, and has built nearly 80 schools in the region. He gives an interesting perspective into the instability of the region, including the Taliban and the real causes of 9/11.
The Turnaround Kid by Steve Miller -- A fairly timely book for anybody curious about the US automotive industry. I've always been fascinated by turnaround CEOs: people who relish taking a failing company, and making it profitable again. Steve Miller was one of my heroes there, because he engineered the turnaround of about a dozen companies... most recently Delphi. In case you didn't know, Miller was the real brains behind Lee Iacocca's turnaround of Chrysler in the 1980s. He has quite a few words of advice for US manufacturers, which you might want to heed before you need his help!
The Warren Buffett Way by Robert Hagstrom -- Forget it. You will never be Warren Buffett. Accept it. Don't invest your money in the stock market: invest in your business, or yourself. Even if the stock market is your business, you're probably not going to pick stocks better than a computer. Nevertheless, if you want to know how Warren Buffett made his billions, this is a good primer. The book also constantly reminds you to not get carried away: put your money in an S&P index fund, and get back to work. Stock speculation is only profitable for insiders with nearly illegal insider information, or people who work amazingly hard at it every day (like Buffett).
Founding Brothers by Joseph Ellis -- I liked this book... its a short book, geared for both US history buffs, and the general public. It was a good overview of six important moments in US history: the Hamilton/Burr duel, the Hamilton/Jefferson/Madison dinner about debt assumption and the creation of Washington DC, the early arguments about the slave trade, Washington's retirement after a mere 2 terms, the early Adams and Jefferson presidencies, and the later friendship between Adams and Jefferson. I'm not positive it deserved the Pulitzer Prize, but it was certainly one of the better history books I've read.
E-Myth Mastery by Michael Gerber -- the latest in the E-Myth series. This book helps entrepreneurs create systems that allow their company to run, so that they can free up their time to build and grow the company. As a computer geek who has observed highly ineffectual business processes, I was skeptical that this book could teach me anything. I was pleasantly surprised... it's a bit big, and I wouldn't recommend it unless you are actually running a business -- or a part of a business -- but it certainly opened my eyes to the value of a culture of entrepreneurialism. It does suffer from a fairly tedious writing style, and perhaps others in the E-Myth series would be a better fit -- such as The E-Myth Revisited -- but it opened my eyes a bit so I'll give it a solid 3 stars.
The Undercover Economist by Tim Harford -- decent coverage of scarcity theory, and slight coverage of comparative advantage, but not much ground-breaking information here. It's not as good as Freakonomics, which I also disliked. I'm still looking for a book on classical economic theory that I can tolerate... any suggestions are highly welcome!
Blink by Malcolm Gladwell -- this follow-up to The Tipping Point was a bit of a disappointment. There was a lot of good data in it, but I felt that his entire thesis was flawed. It's all about "thinking without thinking," by trusting your "gut." Yeah, that always works out... there was some good data about folks who could read emotions by observing facial muscles, and how the mind operates when under stress, but otherwise it wasn't very thoughtful. Worth reading if you take it with a grain of salt.
Well... this is pretty negative...
CMS Watch came out with their 12 predictions for 2009, and number seven was "Oracle will fall behind in the battle for knowledge workers." Here's the relevant quote:
At one level, Oracle had a banner year in 2008: completing or consolidating numerous large acquisitions that bring in heavy streams of ever-beloved maintenance revenues. But 2009 will expose Oracle's weakness with front-office applications at a time when Microsoft, IBM, and many smaller players are fighting for the hearts and minds of knowledge workers.
Customers are already feeling indigestion, as different Oracle teams market overlapping and often incomplete solutions. For example, Oracle is struggling to combine four different enterprise portal offerings, and many customers are chafing at the financial and architectural challenges of aligning with the putative winner, Oracle WebCenter Suite (OWS). Similarly, collaboration and social software services remain divided between OWS and the new Beehive offering -- a bad situation made worse by the fact that both are really development platforms and not finished toolsets. Meanwhile, longtime Stellent UCM customers complain that Oracle is moving away from the product's Web CMS roots to emphasize heavy-duty document and records management.
First, the acquisition of BEA did really shake up Oracle's whole knowledge management / collaboration / Enterprise 2.0 strategy... and yes, there is considerable overlap in the product offerings. However, ultimately this will be a good thing, because only the best of the best will become strategic products under the "WebCenter" brand. This will take time to digest... it may or may not be "all better" by the 11g release in 2009, but I remain optimistic based on the previews I've seen... so the architecture will likely become much simpler.
Although, I do have to agree that a lot of Oracle's offerings here are platforms, instead of complete applications -- Stellent/ECM being one exception. The WebCenter platform will never be huge, unless it has pre-packaged "Killer Apps" built on it. This is a general fact about all platforms, and is very much true here as well. There are several in the works -- collectively called "Fusion Applications" -- but I have no clue when they will be released.
Second, regarding the financial challenges, I guess I don't know what he means here... the current WebCenter bundle is a bit pricey, mainly because it's a bundle of so many different tools. Remember, WebCenter is a brand, and not just a single piece of technology. Oracle will probably figure out smaller, cheaper bundles that sell better, so I don't see this as that much of a long-term problem. Maybe some folks are upset about the price of migration from older platforms to WebCenter... but nobody is forcing them to upgrade. They'll have to do a technology refresh at some point, and Oracle will continue to support and make new releases of their non-strategic product lines... so I guess I'll need to hear more before I can respond.
Third, regarding existing Stellent UCM customers, Oracle is actually moving in both WCM and document/records management at the same time. The heavy-duty document and records management offerings are badly needed by many of their existing enterprise customers, so there's a lot of sales opportunity in productizing a few enterprise-level integrations. At the same time, they've spent a lot of time and energy on the next version of Site Studio (Web Content Management), including their Open WCM initiative... This will be big in 2009.
The Stellent faithful have been hearing this line for a long time, but their patience will be rewarded as soon as January.
For those who watched the December 10 customer call, you'd know that you will be able to play with this next-generation of Site Studio relatively soon. A lot of it will be released as Site Studio 10gr4 at the beginning of 2009. The rest will be released in 11g, which is slated for some time in 2009. Alan Baer will be doing a Deep Dive into Oracle Site Studio 10gr4 in January, if you want to know more.
And finally, we should note that of the dozen 2008 predictions by CMS Watch, they claim seven came true, three did not, and two are in the "maybe" pile... so take this prediction with a grain of salt. Oracle has several decent ECM products due out in 2009... so this warning could be both a wake-up call, and a self-denying prophecy.
I was just watching Ford's CEO Alan Mulally on CNN... Ford is actually doing fairly well, and doesn't need much of the bailout money, so a lot of people were confused about why he would care whether GM or Chrysler went bankrupt. At first glance, you'd think it would be great if your competition went bankrupt, because then you could gobble up their market share... but Ford was actually very concerned.
Initially, I suspected something of an old-boys-network thing. Mulally is sticking up for other Detroit car companies, simply because they need to stick together if one of them needs to go to Washington to ask for help against Japanese or German car companies... so it might be just cynical, political self interest.
Mulally's explanation was oddly different...
He stated that the majority of the auto industry is in the suppliers, not the auto makers. Since all auto companies use the same suppliers, and suppliers are hurting as well, then one bad company puts the whole thing at risk.
For example, if GM goes bankrupt, then Delphi might go bankrupt, and not be able to supply parts to Toyota, Ford, or Volkswagen. That puts them all at risk if a company as large as GM goes bankrupt.
What shocked me was that during the interview, Mulally called his own company an "original equipment manufacturer!" This is a common term in both software and manufacturing, usually shortened to just OEM. It basically means Ford doesn't manufacture anything; it wraps pre-manufactured products with its own brand. They don't make engines, doors, wheels, brakes, transmissions, or pretty much anything anymore... they just slap together other people's stuff, put the word "Ford" on it, then sell it through their distribution channels.
I was wondering how long it would take them to admit this... and how folks would react... The CNN guy just brushed it off as "auto company speak," so I don't think they actually understood what Mulally meant.
Cringely brought this up a few weeks ago in his article What if Steve Jobs ran one of the Big Three auto companies? He suggested the same thing... Car companies should act more like Apple: let other companies do the dirty work of creating the "parts," then focus the big 3 on design, sales, marketing, and customer services. The whole article is very good, I recommend reading it.
Hearing Mulally openly admit that Ford is nothing but an OEM is very telling... and it gives me hope that some folks in Detroit "get it," and might actually be able to turn around the industry... but it might take a while longer for the folks at CNN to "get it."
There have been millions of technological innovations since cave men first invented the wheel... many of them -- such as the printing press, the sewing machine, and the robot -- have put people out of a job. However, it is completely illogical to state that technology eliminates jobs. If that were true, then 10,000 years of innovation would mean no jobs left on the planet... The relationship between technology and jobs is much more complex than that.
Put simply, innovations may be disruptive, but they can never replace a human who actually gives a damn. This may be difficult to believe -- especially if you recently lost your job because a robot/computer could do it faster... but innovations don't fire people; managers fire people... and both labor and management use technology as a scapegoat.
Here's my theory on how this all works:
- For better or worse, the majority of people are motivated by economic means. Not entirely, mind you, but significantly... and everybody would prefer to have more money if possible.
- The primary thing that keeps an economic system growing and creating new wealth is increased worker productivity.
- Technological innovations make workers more efficient.
- This means a short-sighted employer can purchase new technology, lay off workers, and maintain existing production levels... however, this trick is easy for the competition to replicate, so it's a terrible long-term solution.
- Alternatively, workers could learn how to work with new technology, and become phenomenally more productive than just technology alone. This is difficult for the competition to replicate, because it relies on a culture of training, sharing knowledge, and institutional learning... so it's a great long-term solution.
- Therefore, employers who use new innovations plus retrained labor will always be more competitive, and the first to find and cultivate new markets.
- When this happens, overall worker productivity increases, and more wealth is created for everybody: investors, innovators, managers, and workers.
Scribes lost their jobs when the printing press was invented... but cheap books created huge demand for new kinds of books, and the printing industry boomed. Tailors lost their jobs when the sewing machine was invented... but cheap clothes created huge demand for new fashions, and the clothing industry boomed. Naturally, this doesn't always work for low skilled workers, and all this amoral capitalism is painful for people who lose their job... so a smart government would provide its citizens with temporary unemployment pay, education, and jobs programs to help them through the disruptive phase. But, that's a blog post for a different web site ;-)
This same rule applies to knowledge workers... don't think of them being "replaced" with software, think of them being "empowered" by software.
I am personally highly skeptical about "Enterprise 2.0" software that claims to help people effortlessly find content, seamlessly connect with people, and make effective business decisions as a "crowd". That's not to say these tools have no value... but they are no replacement for people who know what they are doing, and have a desire to get better at it.
Neither Wikipedia nor Google can replace people who intuitively understand a subject, and can weed out "false" information from the mountain of badly written presentations, reports, and blogs... Neither LinkedIn nor Facebook can replace the people who genuinely love connecting with thousands of friends, staying in touch, and helping people out... And nothing, nothing can replace a manager with leadership and consensus building skills. All these people have a genuine talent for discovering useful information, connecting people to each other, and managing a group.
If you have talented employees, you can never replace them. If you don't have them, then software is a stop-gap solution; not a substitute. Technology can only raise the bar a little... ordinary folks will use technology to become slightly better than average at a task... but those with talent can use the exact same technology, and leave everybody else in the dust.
Some hype their business (agenda #4: gold digger), some hype their personal website (agenda #1: blog vomit), some hype how awesome they are each time they have an expensive glass of wine (agenda #2: sucks to be you).
Most just dwell upon the mind-numbing minutia of everyday life and hope somebody is listening... usually these folks don't quite seem to understand that you should not tweet what you are doing; tweet what has your attention. Do that, then the odds are much higher that somebody else will also be interested.
Following this advice, I set my agenda a long time ago to be agenda #5: Rodney Dangerfield. This means that you don't talk about yourself much, you only send random one-liners out into the world. I typically tweet insightful quotations that I just finished reading... It's really hard to squeeze some of these bad boys into 140 characters, and provide a reference... but it's a good exercise in editing skills.
For example, recently I saw a great quote by Fred Brooks, but it was waaaaaaaaaaay too long for Twitter... so I shrunk it down and tweeted this:
"Much of the essence of software development is debugging the specification." -- (shorter) Fred Brooks
Zing! It captured 100% of what he said, within Twitter approved guidelines. I'm trying to do the same for other notable quotes, but it can sometimes get tricky. I frequently have to rewrite the entire phrase to fit into Twitter... which raises the question as to who should get attribution for the quote.
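Quote-squeezing like this is easy to sanity-check with a few lines of Python. A minimal sketch of the idea -- the `LIMIT` constant and helper name are my own, not anything Twitter publishes:

```python
LIMIT = 140  # Twitter's character limit at the time

def format_tweet(quote_text, author):
    """Attach attribution and verify the whole tweet fits."""
    tweet = '"%s" -- %s' % (quote_text, author)
    if len(tweet) > LIMIT:
        raise ValueError("over by %d characters" % (len(tweet) - LIMIT))
    return tweet

tweet = format_tweet(
    "Much of the essence of software development is "
    "debugging the specification.",
    "(shorter) Fred Brooks",
)
print(len(tweet), tweet)
```

If the quote plus attribution runs long, the helper tells you exactly how many characters you still need to edit away.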
So, what's your Twitter agenda?
Not in my cubicle, dude.
Better luck next time.
(Hat Tip: Infonomics)