Articles specific to Oracle software products, including the former Stellent product line

Email Archiving Solutions: Illegal Without Employee Consent!

CMS Watch just reported on a recent ruling by the Ninth US Circuit Court of Appeals in San Francisco... essentially, the court said that when an employer scanned its archives of old emails, snooping to see if its employees were doing anything naughty, the company violated their Fourth Amendment protections against unreasonable search and seizure.

The ruling makes total sense... CMS Watch thinks the implications could be huge, but I disagree. Frankly, if this ruling stands it only means that employers will now make people sign a form that says this:

I consent to have my email archives scanned by my employer, in order to validate that I adhere to their email policy.

Done and done. If the prospective (or current) employee refuses to sign, then they either have something to hide, or they love liberty, dammit! It's just one more bogus piece of paper that everybody will sign before they can be hired. In previous jobs I had to sign something stating that I promise to abide by their network usage policy... this is just an addendum that says "and we're watching you, sucka!"

Whether that contract will stand up in court is a matter for debate... especially if you use the word "sucka."

Search Engine Optimization for the Enterprise

Michelle sent me an interesting article: Enterprise Search Just not there Yet. She and I both agreed... it is a terrible analysis.

To sum up, a lot of people are complaining that their enterprise search appliances aren't working right. Why doesn't it work like Google? they all say... I can always find relevant information on Google! Well, I got news for you:

People get paid LOTS of money to make sure Google can find their content.

People have meetings about getting high rankings... they hone their content. They obsess about keywords, and making sure the content is written in a readable format. They obsess about URLs, and make sure other pages link to this content. They register with a bunch of indexes, catalogs, and online yellow pages to boost relevance. They set up whole web sites for specific topics, neatly organized with clear, browsable topics. They hire very expensive specialists in SEO, and information architecture. In other words, internet content creators actually care if people find their content!

Google has it easy...

In contrast, how many enterprise employees obsess about findability, browseability, proper language, or keywords? How much of their content is even intended for an outside audience? How many of them even bother to enhance their content with useful metadata like title or comment? How many of them actively promote their content, and ask people to link to it?

Pretty nearly zero...

Without this effort, not even Google's laudable algorithm can find useful content in the enterprise... as is evidenced by the general disappointment of the Google search appliance. No auto categorization engine can save you. No search engine will rescue you. No matter what people would like to believe, no software can ever replace a human being who actually gives a damn.

So... how do we fix the enterprise findability problem? It won't happen until people start caring both about being able to find others' content, and about others being able to find theirs. I suggest you take advantage of the natural competitive nature inside humans... cash incentives might backfire, but nothing motivates people more than "your hit count is below average."

Start publicly ranking people on how findable their content is, and I guarantee that things will improve.
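
For the curious, the "below average" report is about ten lines of code. This is only a rough sketch: the author names and hit counts are made up, and a real version would pull the numbers from your search or web server logs.

```python
from statistics import mean

# Hypothetical monthly hit counts per author; real numbers would come
# from search or web server logs.
hits_by_author = {"alice": 420, "bob": 35, "carol": 180, "dave": 12}

def findability_report(hits):
    """Rank authors by hit count and flag anyone below the average."""
    average = mean(hits.values())
    ranked = sorted(hits.items(), key=lambda item: item[1], reverse=True)
    return [(name, count, count < average) for name, count in ranked]

for name, count, below in findability_report(hits_by_author):
    note = "your hit count is below average" if below else ""
    print(f"{name:<8}{count:>6}  {note}")
```

Whether you rank by raw hits, by search click-throughs, or by something fancier is a judgment call... the point is just that the ranking is public.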

Social Networking Site for ECM

AIIM sent me an email about their new social networking site for ECM folks, named Information Zen. It's built on top of Ning, like a lot of other community sites I belong to. Mancini is on there, and it's probably only a matter of hours before Billy is up there too.

I like it a lot more than the standard AIIM site... I hope they move more of their content over. They have videos, groups, and forums, all broken down by ECM aspect: records management, enterprise search, content management, eDiscovery, etc.

Should be a good place to get community help with strategic ECM questions... it also might be good for unbiased information about ECM vendors: how tough is it to set up, deploy, maintain, customize, etc.

Seems to be growing fast... I joined, then I wrote this blog post, and in that time they got 6 new members! Over 600 members in a few hours... not bad, AIIM!

Yarp. Larry Ellison is Iron Man

I first heard that Larry Ellison was the inspiration for Iron Man from fake Steve... apparently Robert Downey Jr. studied video tapes of Larry in order to develop his billionaire persona... complete with goatee, mussed hair, Jesus hands, and everything! Skeptical? You can view video evidence yourself.

Well... it now seems that Oracle is getting in on this reality blur as well...

In my mailbox today, I got the Oracle partner newsletter about a cross-promotional campaign with Marvel. They are promoting the new Marvel Trilogy, starring Iron Man. The tagline is Hardware by Marvel, Software by Oracle.

Since Marvel did the graphics, the advert looks pretty nifty. It's a nice deviation from the standard Oracle marketing material: red, white, and boring... but this is just gonna make conspiracy nuts suspicious.

So what do you think Larry really does in his spare time?

I wouldn't be surprised if he had his own flying suit... but I'd be pretty shocked if it turned out he used it to battle warlords in Afghanistan...

Reminder: Oracle ECM Webcast Tomorrow

Oracle is now doing a quarterly customer webcast to keep folks up to date about the latest changes in the product line. The next one will be June 5, 2008 at 9:00 a.m. Pacific Time. If you'd like to attend, you need to register with Intercall:

It's for customers and partners only... so be sure to use your company email address... you also might want to read more about getting Stellent ECM announcements...

Look Out Oracle: MapReduce Might Be Able To Do JOINs

Apologies for the esoteric post, folks... but this is kind of important... Two folks from Yahoo, plus two folks from UCLA, have just released a paper through the ACM about a new kind of parallel algorithm: Map-Reduce-Merge.

If you don't know about MapReduce, it's the algorithm that makes most of Google possible. It's a simple algorithm that allows you to break a complex problem into hundreds of smaller problems, use hundreds of computers to solve them, then stitch the complete solution back together. Google says it's excellent for:

"distributed grep, distributed sort, web link-graph reversal, term-vector per host, web access log stats, inverted index construction, document clustering, machine learning, statistical machine translation..."

bla bla bla... but MapReduce can't do joins between relational data sets. In other words, it's great for making a search engine, but woefully impractical for virtually every business application known to man... although some MapReduce-based databases are trying anyway (CouchDB, Hadoop, etc.)
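
If the map / shuffle / reduce idea is new to you, here's a toy, single-machine sketch of one item from Google's list: inverted index construction. The function names and sample documents are mine, not Google's or Hadoop's. A real cluster would run the map and reduce calls on many machines at once; this only shows the shape of the algorithm.

```python
from collections import defaultdict

def map_phase(doc_id, text):
    """Map: emit one (word, doc_id) pair per word in a document."""
    for word in text.lower().split():
        yield word, doc_id

def shuffle(pairs):
    """Shuffle: group every emitted value under its key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(word, doc_ids):
    """Reduce: collapse the grouped values into a sorted posting list."""
    return word, sorted(set(doc_ids))

def build_inverted_index(documents):
    pairs = (pair for doc_id, text in documents.items()
             for pair in map_phase(doc_id, text))
    return dict(reduce_phase(word, ids) for word, ids in shuffle(pairs).items())

# Made-up documents, purely for illustration.
docs = {"d1": "oracle buys bea", "d2": "oracle ships webcenter"}
index = build_inverted_index(docs)
print(index["oracle"])
```

Each map call only looks at one document, and each reduce call only looks at one word, which is why the work splits so nicely across hundreds of computers.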

UPDATE: Some Hadoop fans mentioned in the comments that MapReduce can do joins in the Map step or the Reduce step... but it's highly restrictive in the Map step, and sometimes slow in the Reduce step... joins are possible, but sometimes impractical.
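
To make the Reduce-step version concrete, here's a minimal sketch of what's usually called a reduce-side join, run on one machine. The tables, column layout, and function names are all invented for illustration; this is not Hadoop's API. Both tables get mapped to the same join key, each value is tagged with the table it came from, and the reducer pairs them up.

```python
from collections import defaultdict

# Hypothetical tables, invented for illustration.
customers = [(1, "Acme"), (2, "Initech")]
orders = [(101, 1, 250.0), (102, 1, 75.5), (103, 2, 19.99)]

def map_customers(rows):
    """Map: key each customer row by customer id, tagged with its source."""
    for customer_id, name in rows:
        yield customer_id, ("customer", name)

def map_orders(rows):
    """Map: key each order row by customer id, tagged with its source."""
    for order_id, customer_id, total in rows:
        yield customer_id, ("order", (order_id, total))

def reduce_join(tagged_values):
    """Reduce: pair every customer record with every order for one key."""
    names = [value for tag, value in tagged_values if tag == "customer"]
    order_rows = [value for tag, value in tagged_values if tag == "order"]
    for name in names:
        for order_id, total in order_rows:
            yield name, order_id, total

def reduce_side_join(customer_rows, order_rows):
    # Shuffle: gather the tagged values from both tables under the join key.
    groups = defaultdict(list)
    for key, value in list(map_customers(customer_rows)) + list(map_orders(order_rows)):
        groups[key].append(value)
    joined = []
    for key in sorted(groups):
        joined.extend((key, *row) for row in reduce_join(groups[key]))
    return joined

result = reduce_side_join(customers, orders)
print(result)
```

Notice that every row from both tables has to pass through the shuffle, and the reducer holds all the rows for a key before it can emit anything. My guess is that's a big part of the "sometimes slow" complaint, especially when one key has a huge number of rows.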

Well... this latest twist from the Yahoo folks fixed that: they claim MapReduceMerge now supports table JOINs. No proof as of yet, but there are a lot of folks staking their reputation on this... so it's a fair bet. The Hadoop folks seem to be experimenting with MapReduceMerge... so if they spit out some new insanely fast benchmarks, my guess is that this is for real...

What does this mean for relational databases like Oracle? Uncertain... but I did hear a juicy rumor about 15 months back: some guy from Yahoo sat down in a room with Oracle's math PhDs, and spent a day discussing an algorithm for super-fast multidimensional table joins... like sub-second performance on 14-table relational queries, with no upper limit. My sources told me the Oracle dudes were floored, and started making immediate plans to integrate some new stuff into their database. The Yahoo connection made me think this might be the MapReduceMerge concept...

Coincidence? Perhaps... but a juicy rumor nonetheless.

Drama On The Oracle Wiki

Well, this is unfortunate... CMS Watch is reporting a rumor about an Oracle Wiki incident. An Oracle partner named Sten Vesterli posted some less than positive feedback about WebCenter on the Oracle Wiki... was promptly flamed by an Oracle product manager, then had his postings removed:

I placed some of the description and the pro/con discussion from my upcoming paper comparing Oracle development tools on the Oracle Wiki. And just like when I posted something not unambiguously positive about Oracle WebCenter on the Wiki, I was immediately flamed by an Oracle product manager, and any trace of negativity edited out of one of my pages.

Oops... looks like a Web 2.0 malfunction.

First, CMS Watch sees this as something of a cultural problem. Microsoft's wiki is locked down tight: all submissions need prior approval. How quaint, they downgraded Web 2.0! In contrast, Oracle is much more open... which means Oracle probably has a policy for dealing with criticism. They need to react to posts from the occasional partner who flames them with product criticism that might help Oracle competitors. However, since the BEA acquisition, Oracle now owns 4 portal products! I'd argue the only real competition to WebCenter is inside Oracle, which may explain why passions are so high... Still, it would be good if Oracle changed their wiki terms of use to mention that they have reasonable editorial control if posts give away competitive information.

Second, Sten Vesterli is an Oracle ACE Director, like me. That means we have multiple channels for criticism if we don't like the feature set of the product. We're expected to extend Oracle some level of professional courtesy when we give criticism. I occasionally point out the flaws in Oracle products, but I almost always offer a workaround, and I don't put them on places as high profile as the Oracle Wiki... Naturally, some folks at Oracle would feel Sten was being a tad rude...

But ultimately, a wiki is the wrong place for criticism. Criticism almost always contains judgment, which by definition violates the neutral point of view policy that is on all wikis -- even Wikipedia. As Justin Kestelyn says:

A wiki is not the place for opinion, because opinion does not invite editing, only response.

The wiki was probably the wrong forum for Sten. Want to rant about WebCenter? Then your text belongs on a blog. Oracle's policy should simply be that: criticism belongs on your blog, not on our wiki, or any wiki. Then they should monitor pages that are "hot topics," and delete anything that looks like a rant. Clean and simple.

Hopefully Oracle doesn't try to lock down access to the wiki because of this drama...

UPDATE: Justin got in touch with Sten to figure out what really happened: it didn't seem to involve WebCenter, and CMS Watch blew it all out of proportion... The wiki is thankfully back to business as usual.

High Stakes For EMC / Documentum

CMS Watch has some interesting reflections on EMC World and Documentum... apparently, EMC still has decent Enterprise Content Management products, but there's a real lack of enthusiasm in EMC about the whole thing:

Under the covers there remains some good technology and some good technologists, but there just doesn't seem to be the enthusiasm in the rest of EMC to really get behind it. One way of classifying these two groups is that they consist of the remnant Documentum products (built and acquired) over the years. We see many elements of the collaborative DM that Documentum majored on in the past in today's Knowledge Worker division, alongside the updated eRoom offering. In the Interactive Media group we see the old Bulldog DAM products given a fresh coat of paint. Both looked fine in the demo, but in talking to broader EMC sales staff, there was little interest or knowledge of these areas.

The CMS Watch article is also an interesting intro to Content Management and Archiving (CMA)... which seems to be the path that a lot of Enterprise Content Management vendors are taking. Oracle's plan to achieve CMA is with a nice blend of Stellent and their Universal Online Archive... I'll go into more depth in my next book ;-)

As Billy noted with some statistics, archiving is a big deal for a complete ECM solution... It seems like some folks at Documentum "get it," but the jury is out whether the EMC folks will listen...

Empathy vs Sympathy

A few weeks ago I gave a talk about Communication For Geeks at the Minneapolis MinneBar conference. I strongly believe that the majority of software failures are communication failures, and if geeks want to be a part of fun, successful projects, they had damn well better learn how to communicate... because most managers clearly can't.

It was a surprisingly popular talk: I had twice as many attendees as I had handouts...

Anyway, on Friday, I got an interesting call from one of the attendees, Kelly Coleman. He was excited to tell me about a situation where he used one of my tips to better communicate with one of his friends... Kelly took to heart one of the most important lessons geeks need to learn: use empathy before education! I was really happy to hear about it, so I thought I'd repeat the lesson here in case others might benefit:

Empathy is not Sympathy!

A lot of people confuse empathy and sympathy... I do it myself a lot. Sympathy is feeling what somebody else feels through you. When you are being sympathetic, you're not really helping much, because you're making the situation about you... In contrast, empathy is feeling what somebody else feels through them. You keep the focus on them, until you're certain they've expressed themselves fully.

To illustrate, the following would be sympathy:

Bob: I just got fired...
Joe: Wow, that sucks... but don't worry, you'll be fine! I got fired a few years back,
     and there's always work available for talented guys like us, right?

Joe genuinely thinks he is being helpful... Joe is not being helpful! Joe isn't listening to Bob at all. Joe is rambling on about his own past, and about his theories of the job market. He's trying to connect with Bob, but he's using sympathy. Sympathy is dangerous, because it leaves Joe open for this:

Bob: What the hell do you know, Joe? That was years ago! You didn't have 
     a house! You didn't have a wife and a kid to support! The job market was completely
     different back then! You have no clue about my problems! Get the hell away from me!
Joe: ...I was only trying to help...

Bob is clearly in a lot of pain. He's afraid of a lot of things, and his good buddy Joe clearly isn't listening. So Bob lashes out, and wisely tells Joe to get the hell away from him. Then Joe gets defensive, and says something even stupider. With luck, they'll be friends again in a few weeks... but you never know.

In contrast, empathy almost always is better... it would look something like this:

Bob: I just got fired...
Joe: Wow, that sucks... you must be feeling pretty scared right now, huh?

Ding! Ding! Ding! Ding! Give Joe a cookie!

See the difference? Joe didn't make it about himself... he kept his focus on Bob. He asked Bob how he was feeling, and after Bob answers, Joe should keep asking. He should let Bob vent about his situation: his wife, his kid, his house, the job market, whatever. Even if Joe knows a guy who might give Bob a job, Joe should shut the hell up until Bob's finished venting. This may only take five minutes, or it might take a whole hour. Either way, it's an important part of the process. Bob will not listen to what Joe has to say, unless Bob feels Joe fully understands his situation.

Empathy before education. Always.

How does Joe know when Bob's finished venting? He'll hear something different in Bob's voice: hope. When Bob is open for suggestions, he'll say something like, "what do you think I should do?" or "have you ever been in this situation before?" Only after Joe hears this, is Bob ready to listen to new ideas, new possibilities, and new ways of fixing this problem. Only after Joe hears hope, or a direct request for help, is Bob ready to hear what Joe wants to say. If Joe wants to help Bob, Joe needs patience.

Now... empathy is not easy, and it's extraordinarily difficult for engineers.

Most technical people have been brainwashed by years of "education" into believing that there's a "right way" to do everything, and that it's our job to fix it. When something is "wrong," we want to dive in and tell everybody how to make it "right" again. It's a trained compulsion. This is why engineers make lousy lovers, but excellent terrorists. In both cases, it's a lack of empathy that dooms us to this fantasy world of absolute right and wrong, making it impossible to see things from another perspective.

Sound like anybody you know?

As such, it will be difficult for software engineers to learn empathy... but they need lots of practice before they can move on to even more advanced forms of communication... which I'll be talking about at a later date ;-)

Should We Always Use Empathy?

That's a tricky question... empathy takes a lot of time, and sometimes you don't have the luxury. However, it is important to understand what empathy is, so when people "fall off the wagon" you won't take it personally...

For example, blogger James McGovern decided to practice empathy, which got some props from Billy... however, James' blogging style does not lend itself well to empathy... He's snarky, and enjoys inciting fights, so he can better understand who has a better position. If everything were puppies and rainbows, I'd probably stop reading his blog. No surprise that James then went back to his old self after about 23 hours...

And let's not forget Brock Samson's mystic journey in Venture Brothers... where in a dream he learns the value of empathy and feels great... but then is confronted by his former special ops trainer:

Brock Samson: What about uhhh, humanity and empathy and all that garbage?
Hunter:       You're a tool, boy, a tool! Built for a single purpose by the United States
              who shut your third god damned eye for a good f$%&ing reason! You can't
              teach a hammer to love nails, son. That dog won't hunt!

Yep... an army of empaths sure would be cool... but in the meantime, we live in a world of conflict... so until everybody understands the power of empathy, it's probably best to know multiple ways to deal with conflict. In order, I prefer empathic communication, principled negotiation, then Brock Samson.

In the meantime... practice giving and getting empathy. It's far more powerful than you realize.

Free Enterprise 2.0 Training

AIIM is offering free Enterprise 2.0 training, at least for a short while. They have a ten-part course on Enterprise 2.0, and AIIM is offering one of the ten sessions for free...

A lot of folks are confused, and justly ask what the heck is Enterprise 2.0 anyway? Jake opined a while ago that the Enterprise 2.0 label might be a little unnecessary... because it's all pretty much the Web 2.0 stuff anyway. I agree in part... a lot of the initial buzz about Enterprise 2.0 is pretty much just making blogs, wikis, and social software more "enterprisey."

However, it's also about streamlining business process management, data mining, and data visualization with freaky new tools... and a lot of cool new security offerings. Personally, I think that solving Enterprise 2.0 security problems is easier than solving Web 2.0 security issues... because cross-domain single sign on is a lot easier to do in the enterprise... and there's less spam ;-)

I'd also like to emphasize that real Enterprise 2.0 shouldn't be focused on the latest buzzwords... it should be about empowerment, simplicity, and evolution. Bill Gates recently reminded us of the dirty little secret of software: traditional enterprise apps are more about tracking and monitoring employees, rather than empowering them to do their jobs better. Enterprise 2.0, if it takes the lead from Web 2.0, should break that command-and-control mold to enable bigger, better, faster innovation. Otherwise, it shouldn't even be called Enterprise 2.0.

To paraphrase Clay Shirky, innovations aren't socially interesting until they are technically boring. Frankly, I'd argue that enterprise applications could certainly benefit from some technical boredom... instead of re-architecting your solution every 5 years with the latest and greatest ivory tower buzzwords -- J2EE, EJB, Portals, SOA, CEP, ESB, etc. -- just use the simplest thing that works.

To sum up... Keep it simple (stupid!), focus on usability, and your audience will love you... Or as Einstein would say:

Things should be made as simple as possible -- but no simpler.

Always good advice...

Inventor Of The Web Now Looking Into Metadata

Tim Berners-Lee, the inventor of the world wide web (shown right, Gore-ified), is complaining about how difficult it is to find what you want on the web. He brings up some points that ECM folks have known about for a long long time: metadata is essential for finding what you want. Timmay has been awarded a $350,000 grant to study the issue... like he needs the cash...

Anyway, I'm all over embedded metadata... but I can tell you right now it's nearly impossible to get everybody in a small company to agree to a complex metadata standard, so good luck accomplishing that on the whole frigging internet... also, it's even more impossible to get content creators to follow the metadata policies. People who work hard to create great content believe others should work hard to FIND their content.

Silly? Yes... but a very common attitude.

The ones who spend a lot of time optimizing their metadata and search results are usually those with lower quality content, or people who are obsessed with their own popularity. Such as myself... Or, like spam blogs, aka splogs...

That being said, I wish Timmay well... but you don't need $350,000 to come up with a solution for this. I'd recommend creating a Microformat for embedded metadata, to apply to the content in the containing element... You'd also need some kind of cryptographically strong key for "trusted" metadata suppliers, so people don't try to tag porn sites with every metadata keyword in the universe. Also, the microformat needs to be extensible, so anybody can embed custom namespaces and keywords in it.

The Microformats folks have a tag similar to what he wants already, the rel-tag microformat. It's overly simple, but that simplicity will encourage adoption -- much in the same way that the "inferiority" of HTML over all other markup formats ensured its success. It looks like this:

<a href="" rel="tag">tech</a>

If this link is anywhere on the page, it signifies that the page has something to do with the metadata tag tech, and it gives Technorati as the home for the tag. I'd expand on this to allow for other tag homes, such as, or even any arbitrary social bookmarking site.
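
Reading these tags back out of a page takes very little code, which is a big part of the appeal. Here's a minimal sketch using Python's standard html.parser module. It follows the rel-tag convention that the tag name is the last path segment of the link's href, and treats the link's host as the tag home. The example.org URL is a placeholder I made up, not a real tagging service.

```python
from html.parser import HTMLParser
from urllib.parse import unquote, urlparse

class RelTagParser(HTMLParser):
    """Collect (tag name, tag home) pairs from <a rel="tag"> links."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        # rel can hold several space-separated values, e.g. rel="tag nofollow".
        if "tag" not in (attributes.get("rel") or "").split():
            return
        parsed = urlparse(attributes.get("href") or "")
        # The tag name is the last path segment of the href, not the link text.
        tag_name = unquote(parsed.path.rstrip("/").rsplit("/", 1)[-1])
        if tag_name:
            self.tags.append((tag_name, parsed.netloc))

# example.org is a placeholder tag home, not a real service.
page = '<p>Filed under <a href="http://example.org/tag/tech" rel="tag">tech</a></p>'
parser = RelTagParser()
parser.feed(page)
print(parser.tags)
```

Because the tag home is just whatever host the link points at, supporting other tag homes needs no change to the parser at all.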

Done and done.

Naturally, you'd want to be more precise... some news sites have dozens of articles on the main page, and you'd like to tag each section individually... so this tag should only apply to the content in the containing tag. Also, this kind of formatting is difficult to embed in a Microsoft Word document: I couldn't get behind any kind of distributed metadata model unless it easily worked in multiple formats. A CSS style might be a better choice than the rel attribute in the link...

Finally, it needs some kind of authentication model... perhaps you need an authentication microformat somewhere on the page, and all microformats on the page inherit the credentials. Of course, you'd need to have some kind of tag to force exceptions, so banner ads from remote servers would need their own auth tokens.
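
As a very rough sketch of the "trusted supplier" idea, you could sign the page URL plus its tags, and have consumers reject any tag set whose signature doesn't check out. To be clear about the assumptions: no such microformat exists, the function names and key are hypothetical, and I'm using an HMAC with a shared secret only because it fits in a few lines. Anything meant for the open web would more likely need public-key signatures, so that verifiers never hold the signing key.

```python
import hashlib
import hmac

def sign_tags(secret_key, page_url, tags):
    """Return a hex signature covering the page URL and its sorted tags."""
    message = "\n".join([page_url] + sorted(tags)).encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()

def verify_tags(secret_key, page_url, tags, signature):
    """Check a signature using a constant-time comparison."""
    expected = sign_tags(secret_key, page_url, tags)
    return hmac.compare_digest(expected, signature)

# Hypothetical key and URL, for illustration only.
key = b"hypothetical-shared-secret"
url = "http://example.org/article"
token = sign_tags(key, url, ["tech", "metadata"])

print(verify_tags(key, url, ["tech", "metadata"], token))
print(verify_tags(key, url, ["tech", "metadata", "spam"], token))
```

Binding the URL into the signature matters: otherwise somebody could copy a trusted tag block, signature and all, onto a completely different page.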

Whaddya say, Timmay? Is that due diligence worth $5k?

Where Are The Dang Stellent Announcements?

If you're a Stellent customer, you've probably noticed that you no longer get those nice customer emails about ECM from Oracle... that's because Oracle -- wisely -- has a pretty strict anti-spam policy... so people who own one Oracle product aren't blasted with info and offers from other products.

So what should you do instead? Well, first of all you need to register for the quarterly content management newsletter. This isn't customer-only focused, but it's got a lot of good stuff... and the archives are online.

In addition, Oracle is now doing a quarterly customer webcast to keep folks up to date about the latest changes in the product line. The next one will be June 5, 2008 at 9:00 a.m. Pacific Time. If you'd like to attend, you need to register with Intercall. It's for customers and partners only... so be sure to use your company email address, or you'll get booted!

Another option is to configure Metalink to send you ECM updates. Follow these steps:

  1. Log in to Metalink.
  2. Click the Headlines link in the upper left of the page.
  3. Click the Edit Page button under Headlines on the right.
  4. In the drop-down list, select Certify and Availability, and click Add New.
  5. In the top right, select the Certify Settings button.
  6. In the drop-down list under Product Alerts, select Oracle Universal Product Management, and click Add New.
  7. Click the Overall Settings button.
  8. At the bottom of the page, click the Automatically email My Headlines to me checkbox.
  9. Click Store Settings.
  10. Play the waiting game, or hungry hungry hippos until your emails arrive.

You can also subscribe to bug reports and knowledge base articles, if you'd really like to keep up-to-date...

Of course, to be fully up to date, you'll need to do all this, plus subscribe to email notifications from the Stellent Yahoo Group and the Oracle ECM Forum... but that's only for information hogs like me ;-)

BEA Participate Beats Everybody in Swag vs. Swag

I'm out here at the BEA Participate conference... or more correctly, I'm up in the hotel writing while everybody else is attending the conference... I'm just tagging along so I can chat with Andy about the book, and get the low-down on what the BEA acquisition means for Oracle.

I asked Victoria Lira over at the Oracle ACE Director program if she could get me a free ticket to the event, but there were none available. Even Oracle employees had to pay their own way! Highly unusual, especially considering that Oracle now owns BEA.

I was surprised at first, until Michelle revealed the free iPod Touch that all attendees got! Damn... I haven't seen swag like that since the dot com days...

I guess now that they are Oracle, they suddenly have money to burn...

Does Art Have A Process?

I recently came across the article We Don't Know How We Program. It was a discussion about the gaps between what developers and non-developers think about the process of writing code. It begins:

I was talking to a colleague from another part of the company a couple of weeks ago, and I mentioned the famous ten-to-one productivity variation between the best and worst programmers. He was surprised, so I sketched some graphs and added a few anecdotes. He then proposed a simple solution: "Obviously the programmers at the bottom end are using the wrong process, so send them on a course to teach them the right process." My immediate response, I freely admit, was to open and shut my mouth a couple of times while trying to think of response more diplomatic than "How could anyone be so dumb as to suggest that?"

hehehe... the central premise to the article is that programming is a creative endeavor, which doesn't lend itself well to process... The unfortunate developers subjected to process will only achieve mediocrity... additionally any process that stifles creativity will expunge or crush exceptional programmers, because they need creative space to be ten times as productive.

Does that mean that good programming cannot have a process? Of course not... although as others have noted, things like CMMI should be avoided like the plague. A process needs to be able to empower creativity, but also rein it in when necessary. Programmers -- like artists -- think big, and do wild things that are cool but don't satisfy the needs of the end users. The product doesn't sell, the users rebel, everything goes to hell... the developers know full well of the "failure," so to nurse their bruised egos, they blame the users for being dumb, or the specification for being incomplete. Then they curl up into a ball and call themselves misunderstood.

Yep. Just like Van Gogh.

To rein this in, you need a peer- and customer-driven process to help keep the project down-to-earth... however, done in such a way as to not bruise egos or go anywhere near arbitrary rules. The process needs to evolve with the code. You also need something that encourages developers to think of the code as a community project, to reduce a sense of ownership, and thus keep egos intact. Agile focuses a lot on those kinds of processes... although Agile needs some tweaking for very large projects.

In addition, you also need processes that get the creative juices flowing... this doesn't mean brainstorming sessions or hyper expensive collaboration tools. This usually means simple things like physical proximity. Some teams even had great success with an enforced MESSY DESK policy. That's right... clean desks are evil! Messy desks and physical proximity encourage the "drop in, say hi, notice notes strewn about, and comment on them" process... which more than anything else inspires collaboration and fresh ideas.

My gut feeling? Unless you have artists designing your code process, your organization will never create exceptional code.

So keep a close eye on that process weenie with the stopwatch... he's clearly up to no good.

Oracle Finally Acquires BEA

It's even more official... the Oracle purchase of BEA is final.

Most of my thoughts on the subject are in an older post from when Oracle announced their initial offer for BEA.

Its effect on Oracle ECM technology will be minimal... Oracle ECM already integrates quite well with a large number of BEA products, and this doesn't alter the overall ECM strategy much. The Stellent alumni are pleased as punch... Although the price list for Oracle Middleware just got a lot more complex.

Speaking of which, the effects of the BEA purchase on Oracle ECM sales should be very positive... since Oracle sells the best content management app available, and it integrates nicely with lots of BEA goodies, it should be a pretty easy sell to existing BEA customers.

Of course, the devil is in the details... so stay tuned.

UPDATE: Billy Cripe has some info about potential layoffs in Oracle Fusion Middleware. I'd like to link directly to the specific article about layoffs... but when I click on the permalink, it just takes me to Billy's LinkedIn page! Bad Omen?

UPDATE 2: Billy fixed the link...

How Many Hits Does Your Site REALLY Get?

It's been two years since my inaugural blog post on April 29th, 2006: The Trouble With RSS. Over my site's second year, I wanted to do some long-term analysis on how different web analytics tools track hits, visits, and the like. As expected, they don't agree with each other:

  • SiteMeter: 89,800 visits (132,000 hits)
  • Google Analytics: 84,000 visits (140,000 hits)
  • Webalizer: 431,000 visits (3,660,000 hits)

Curious about why web site statistics differ based on the tool? SiteMeter uses an embedded image (at the bottom of this page), and tracks a hit every time somebody loads the image... so if you block banner ads, your visit might not be recorded. Google Analytics loads some JavaScript, which is useful for tracking more complete data... but if your browser blocks JavaScript (or cross-domain JavaScript), it won't register a hit. I found it odd that SiteMeter tracked more visits, but fewer hits than Google Analytics... curious.

In contrast with the other two, Webalizer uses raw Apache logs to determine hit count, so it tracks every single dang hit... Over 3 million hits in one year??? That's clearly too many... I'm not that interesting... but the visit count might be more accurate. Webalizer is the only analytics tool that tracks folks who view my site with RSS Readers, which may hit my site several times per day... thus the higher visit count. The hit count is hyper inflated because it counts search engine spiders, spammers, and hack attempts (some better than others).
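To make the hits-versus-visits gap concrete, here is a minimal sketch of how a log-based tool might count both from raw Apache logs. It is not Webalizer's actual code: the 30-minute visit window, the crude "bot" user-agent filter, and the function name `count_hits_and_visits` are all assumptions for illustration, and it expects Apache's Combined Log Format.

```python
import re
from datetime import datetime, timedelta

# Combined Log Format:
#   ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "[^"]*" \d{3} \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

# Assumptions for this sketch, not settings taken from any real tool.
BOT_HINTS = ("bot", "spider", "crawl")
VISIT_TIMEOUT = timedelta(minutes=30)


def count_hits_and_visits(lines, skip_bots=False):
    """Return (hits, visits) for an iterable of access-log lines.

    A hit is any parsed request. A visit starts when an IP address
    shows up for the first time, or after VISIT_TIMEOUT of silence.
    """
    hits = 0
    visits = 0
    last_seen = {}  # ip -> timestamp of that ip's previous request
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # ignore lines that are not in the expected format
        agent = match.group("agent").lower()
        if skip_bots and any(hint in agent for hint in BOT_HINTS):
            continue
        when = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        hits += 1
        ip = match.group("ip")
        previous = last_seen.get(ip)
        if previous is None or when - previous > VISIT_TIMEOUT:
            visits += 1
        last_seen[ip] = when
    return hits, visits
```

Every image, stylesheet, and feed request is a hit, which is why the hit count balloons, while a feed reader that polls once an hour opens a brand new "visit" each time. Filtering on the user agent only catches the spiders that identify themselves honestly.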

All told, if the majority of folks view my site with RSS, then Webalizer's count is more accurate. If most of them view it the old-fashioned way, then the other two are more accurate. I'm probably in the 100,000 - 200,000 visits per year range.

Unfortunately, none of these numbers include the folks who read my site through online RSS readers, like Google Reader or Bloglines. These sites hit my RSS feed once, then share it with dozens of folks who subscribe to the feed... To get a better estimate, I could pipe my RSS feed through something like Feedburner. Feedburner keeps track of how many subscribers you have on the online feed readers, and produces decent stats on it... however, once you move your feed to Feedburner, it's almost impossible to move it out... so I'm not happy with that option. Even so, that still wouldn't track those who view my content through RSS aggregators like Central Standard Tech, or Orana, or other sites that run Planet.

Well, what about the data from Alexa? That site ranks web pages based on those who surf the web with a toolbar that tracks their every move. Personally, I think people who surf with that toolbar are opening up a major security hole... so their viewing audience is probably restricted to folks who are kind of tech savvy, but don't take security precautions. In other words, newbie geeks. I've never broken into the top 100,000 sites ranked on Alexa, but I frequently break the top 100,000 sites ranked by Technorati... although Technorati only ranks blogs.

UPDATE: As Phil noted in the comments below, most people use Alexa just to boost their own page rank. For example, you could have your web team install and enable the Alexa toolbar, but only when browsing your own web page. That would make your Alexa rank huge without any actual hits from the greater internet...

Even if we could accurately count how many people hit the site, we're still at a loss to know who paid attention. Google Analytics tries to measure "time on the page"... other metrics include bounce rate, or even the number of comments.

Oh well... A reliable measure of relevance will always be elusive... but at least we have enough estimates to support a cottage industry of people analyzing those metrics to prove anything they are told to prove ;-).

Back to my anniversary... Lots of stuff has changed since my first anniversary post: I've traveled to South Africa, Brazil, and Argentina... I've remodeled my kitchen, I've nearly completed my second book on Oracle enterprise content management, I've given technology presentations at Oracle Open World, AIIM Minnesota, BarCamp Minnesota, and IOUG Collaborate in Denver. I've trained both salespeople and consultants on what Enterprise Content Management actually is, and I helped negotiate a settlement to an 18-month lawsuit against a local non-profit. Oh yeah... I implemented about a dozen ECM solutions as well...

Next year, I hope to have even more goin' on... and a few more web site visits.

Oracle Universal Online Archive: The "Killer App" for Oracle Secure Files

When I first heard about Oracle taking a new direction with their old content management product -- meaning the old Content DB, not the newly acquired Stellent stuff -- the first thing I thought was it's about time!

When Oracle claimed it had 2 content management systems, that really confused people... especially considering that Content DB was at best a set of tools to create a content management system, whereas Stellent was a full blown application plus framework. They really weren't like each other at all.

Universal Online Archive (UOA) is Content DB, but now focused on being an archiving platform. On Oracle 11g, it is an extension of the Secure Files feature of the database. If you haven't heard of Secure Files yet, it beats the Linux filesystem on both read and write performance. It also has compression, de-duplication (storing identical files only once), and encryption. The encryption is an extension of Oracle Transparent Data Encryption, plus support for encrypting entire tablespaces instead of just individual columns. Encrypting at the tablespace level means support for foreign keys, as well as indexes beyond the basic b-tree stuff...

Compression reduces the storage needs by 33% on average, according to Oracle. If you then use the statistics from IDC that there are 8 copies for every 1 content item, then de-duplication would bring the total storage down by 87.5%... all while maintaining better-than-filesystem performance, despite the added cost of encryption. See this whitepaper for some tuning statistics and tips.
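For the curious, here is the arithmetic behind that 87.5% figure, plus a rough guess at what happens when you stack compression on top. This is only a back-of-the-envelope sketch: it reads "8 copies" as 8 total copies of each item, and it assumes the two savings are independent and simply multiply, which neither Oracle nor IDC claims. The function name `remaining_fraction` is made up for the example.

```python
def remaining_fraction(compression_savings, copies_per_item):
    """Fraction of the original storage still needed.

    Assumes de-duplication keeps 1 of every `copies_per_item` copies,
    and that compression then shrinks what is left by
    `compression_savings` (0.33 means 33% smaller).
    """
    after_dedup = 1 / copies_per_item
    after_compression = 1 - compression_savings
    return after_dedup * after_compression


# De-duplication alone: keep 1 copy out of 8.
dedup_only = 1 - remaining_fraction(0.0, 8)

# De-duplication plus the 33% average compression (assumed independent).
combined = 1 - remaining_fraction(0.33, 8)

print(f"de-duplication only: {dedup_only:.1%}")   # 87.5%
print(f"with compression:    {combined:.1%}")     # about 91.6%
```

In other words, keeping 1 of 8 copies leaves 12.5% of the original storage, and compressing that by a third would leave a bit over 8%, if the averages hold for your content.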

Secure Files is the next generation of Large Objects for the database... and it's very cool... but what should you run on top of it? For the longest time, the folks at Stellent balked at using the database for file storage. Using the filesystem made much more sense because of performance reasons, which made up for the additional complexity of the architecture. However, if the user has 11g, there really is no better option than storing content items in the database.

NOTE: This rule-of-thumb does not apply for web content -- especially for small images and thumbnails. In those cases, a split approach where public web assets are stored locally would probably be faster. Luckily, a customized FileStoreProvider can help you achieve this.

Also, Oracle Universal Online Archive finally fits in with Oracle's broader strategy for content management. Even though it can store anything, the first release will have connectors to email servers to be a mail archive:

  • Microsoft Exchange
  • Lotus Notes
  • Generic SMTP Server

This fits right in with the Universal Records Management strategy, which is to embed a Records Management Agent in remote repositories, and control their life cycle from the Records Management system.

In other words, your email archiving policy is no longer dictated by IT. Your records managers can say when an item should be archived, and how long it should be retained based on events, instead of simply time and size constraints. For example, emails should be retained 2 years after a project completion, 6 months after employee termination, or 12 months after you lose a specific customer. That will reduce both your email space requirements, and your legal risk.
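As a rough illustration of event-based retention, the three example rules above could be sketched like this. None of it is the Universal Records Management API: the event names, the `RETENTION_MONTHS` table, and the helper functions are all hypothetical.

```python
import calendar
from datetime import date

# Hypothetical retention rules, taken from the examples in the text:
# months to keep an email after the triggering business event.
RETENTION_MONTHS = {
    "project_completed": 24,
    "employee_terminated": 6,
    "customer_lost": 12,
}


def add_months(start, months):
    """Return `start` shifted forward by `months`, clamping the day
    to the end of the target month (Aug 31 + 6 months -> Feb 28)."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def disposal_date(event, event_date):
    """Earliest date an item tied to `event` may be disposed of."""
    return add_months(event_date, RETENTION_MONTHS[event])


def can_dispose(event, event_date, today):
    """An item is kept indefinitely until its event has happened."""
    if event_date is None:
        return False
    return today >= disposal_date(event, event_date)
```

The point of the design is that the clock starts at a business event rather than at the email's own timestamp, so an email tied to a project that is still running has no disposal date at all.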

But it doesn't stop there... the next step is to make connectors to other content management systems, for example, Sharepoint. The idea is to archive content out of systems like Sharepoint, and replace them with a "stub". When a user downloads from Sharepoint, the "stub" is smart enough to redirect the download to the archive, and return it directly.

In other words, you could be using a secure, compressed, de-duplicated, encrypted archive without ever noticing. Throw in a Records Management Agent, and you'll also invisibly comply with dozens of regulations and laws... no matter where you store your information.

It's a good strategy, and some interesting technology... we'll see how it pans out.

UPDATE: The release was announced, but they don't have a date for when it will be available for download. Here's some more info about the release, and some places to watch for downloads:

IOUG Collaborate 08: Wrap-Up

As usual, the last day of a conference ends on a half day... so I imbibed some Chimay Red with lunch. I was able to get a few others in the crew to follow suit. The usual suspects, indeed...

Michelle won the cookoff to see who had the coolest ECM implementation... woot! The prize was one "silver" ladle, and a $100 gift certificate. Besides Folios, annotations, and the new Site Studio contributor, she showed off Kyle's PicLens integration with Stellent's RSS Feeds, which went over quite well... nice and flashy! The roadmap and ECM focus groups were good as well... although in the future I'd do the cookoff first, then the roadmap, and lastly the focus group. That way, people have their feature lists and questions fresh in their mind.

As usual, a conference this large left me feeling like I missed out on a lot. I networked with a lot of people, and discussed ECM a lot... but I wanted to learn more about identity management, performance tuning, and Hyperion. There were simply too many options, and the handful of non-ECM talks I attended were a tad too high-level for my taste. Maybe I'm too technical, but I don't feel like I learned that much.

Brian Dirking wanted some feedback, so I guess I'd make the following suggestions:

  • After people register (and pay) for Collaborate 09, give them access to the presentations from 08. Then we'd be better able to determine who is a good presenter, what topics are too technical, or which ones aren't technical enough.
  • Have some level of continuity between years... I've given the "50 ways to integrate with the content server" talk about 4 times, but it's always a bit different, and people continue to be surprised at how flexible Stellent is.
  • Have some kind of easy trends analysis to help people find "what's hot" in their industry. Ideally this would be community based, to avoid sales pitches and promotions. For example, send out a survey to ask people what their industry is, and what topics they are interested in... perhaps even which technologies or presentations they might find useful.

I'm used to more focused conferences, like the O'Reilly ones... so this many high-level presentations makes me sad. I personally would like a bit of community feedback to help everybody find which topics are most relevant to their background, goals, and needs.

Not an easy undertaking... but I'd wager a lot of conferences would appreciate something similar.

IOUG Collaborate 08: Day Three

I hung out at mostly Stellent sessions today. Vijay talked about the FileStoreProvider, Alan had a great presentation on metadata models, and Tom was up on a customer panel. One of the questions on the customer panel was about the strengths and weaknesses of the Stellent UCM product. There were happily few minuses, and everybody on the panel said that ease of use (deployment, management, customization) was the biggest plus.

Some folks from Oracle's "Beehive" project presented "the future of collaboration." The Beehive project is the latest iteration of Oracle Collaboration server, which is an email/calendar/task application more akin to Lotus Notes, and quite different from Stellent's document-centric Collaboration Manager. I missed the show, but everybody I talked to about it said it was quite memorable... however, they left out whether they meant that as a good thing...

Somebody noticed that IOUG had me down to give my "50 Ways" presentation twice... I was surprised, so I headed down early to scope the room out. Then I spotted the big sign that said it already took place and is therefore canceled. On the way back, I swung by the bookstore, and noticed that they seemed to be running low on my Stellent book. I later bumped into the 2 customers who bought the last 2 copies, so I autographed them.

It's a good feeling for my niche-technology book to sell out halfway through the conference... ;-)

I also spent about 2 hours in one-on-one sessions with customers. They were all concerned about how they were supposed to get started with a coherent enterprise-wide content management policy... those interactions drove the point home that there's a real need for the book Andy and I are currently writing.

And oh yeah... it snowed. Tuesday was 83 degrees, and on Wednesday it snowed. Strange... I thought that kind of stuff only happened in Minnesota.

One more half-day, then back home!

IOUG Collaborate 08: Day Two

I gave my presentation on 50 Ways to Integrate with Oracle Content Management today... it was similar to my one from Crescendo last year, but I updated it a bit with some of Oracle's new connectors (BI Publisher, Secure Enterprise Search, Records Management Agents, etc.).

After that, I had a book signing. On my way over, I realized that I didn't tell anybody I was doing a book signing... so attendance was kind of thin. Plus I was late. Chaffee showed up with Patrick and Rhonda, and I signed his book with something characteristically glib...

I had lunch with some customers -- finally attempting that business networking thing -- and promised to help a few folks out with their architecture.

In the afternoon, I helped out on Michelle's two-hour hands-on lab about Site Studio: Building an Enterprise Web Site From Scratch. Believe it or not, if you know what you're doing, you can get a pretty good handle on an enterprise-scalable web site in a few hours with Site Studio... Then it was dinner with some Stellent folks, and drinks while we watched the Wild lose.

Since I'm now done with my official obligations, I'll be spending day three going to sessions and networking...
