Articles specific to Oracle software products, including the former Stellent product line

Stellent vs. Documentum, Managing Users

James McGovern recently posted a complaint about how difficult it is to use Documentum with externalized user repositories (Stellent 1, Documentum 0). This time, one of James's minions couldn't find a Documentum API for adding users!

I could have sworn that Documentum had something like that... it appears that the DFC supports adding one user to one group at a time, as noted by Pie Guy, but the API won't let you add users to multiple groups in one shot... you need multiple API calls to do that.

My advice? If you're using Documentum a lot, learn Jython or JRuby. Documentum has a Java API, which is accessible from Jython (Java-based Python) or JRuby (Java-based Ruby). Then your administrators can whip out quick Python scripts for user and data management... which is sure faster than whipping out a full-blown-gonzo-whopper J2EE application every time...

To reiterate, stuff like this isn't really needed in Oracle/Stellent... they separate users from content, and encourage people to do all user management in LDAP or Active Directory. It works great.

If you choose not to use a separate user repository, you can use the built-in one. If you want to script user creation, no problem. Use the ADD_USER service through any of the SOA interfaces (.NET, Java, Ruby, Python, etc.), or the command-line IdcCommand tool.
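
As a rough sketch of the scripted route, the Python below writes a command file for IdcCommand with one ADD_USER block per user. The layout (an IdcService line, name=value parameters, and a <<EOD>> terminator) and the field names dName, dUserAuthType, dFullName, dPassword, and dEmail are assumptions from the Content Server documentation as I remember it, so check them against your version. The users and the function name are made up for illustration.

```python
def build_add_user_commands(users):
    """Return IdcCommand command-file text with one ADD_USER block per user."""
    blocks = []
    for user in users:
        lines = ["IdcService=ADD_USER"]
        for field, value in user.items():
            if "\n" in str(value):
                raise ValueError(f"{field} must fit on one line")
            lines.append(f"{field}={value}")
        lines.append("<<EOD>>")  # assumed end-of-block marker
        blocks.append("\n".join(lines))
    return "\n".join(blocks) + "\n"


# Hypothetical users; the field names are assumed ADD_USER parameters.
NEW_USERS = [
    {"dName": "jsmith", "dUserAuthType": "LOCAL", "dFullName": "Jane Smith",
     "dPassword": "changeme", "dEmail": "jsmith@example.com"},
    {"dName": "bjones", "dUserAuthType": "LOCAL", "dFullName": "Bob Jones",
     "dPassword": "changeme", "dEmail": "bjones@example.com"},
]

if __name__ == "__main__":
    print(build_add_user_commands(NEW_USERS), end="")
```

Save the output to a file and hand it to IdcCommand. How you point the tool at that file depends on your install, so I'm not going to guess at the flags here.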

James keeps a pretty good eye on how content management could better fit into the enterprise... I wonder what he would think of my theory that Enterprise 2.0 means the end of slow business process?

We can always hope...

Join Me at AIIM Control: Jan 14th in Minneapolis

If you're in the Minneapolis area, and you are interested in content management, you'll probably want to attend the AIIM Control event on January 14th, 2008. As an added bonus, I'll be giving the keynote presentation on Enterprise Mashups... I promised to not even utter the word "Oracle," so this will be a vendor-neutral presentation.

The cost is $100 for content management professionals who belong to AIIM... otherwise it's $125. You can register here:

http://aiimmncontrol2008.eventbrite.com/

The location is the Four Points Sheraton, 1330 Industrial Boulevard, Minneapolis, MN 55413 (Google map). Registration is at 7:30 AM, the show starts at 8:30 AM, my keynote is over lunch, and the event ends at 3:30 PM.

How To Find Stellent Content On Oracle Metalink

If you're having a hard time locating Stellent patches / components / samples on Oracle's Metalink site, you're not alone... Thankfully, some former Stellent folks put together a Metalink FAQ for Stellent customers. This will help folks who are used to the browsable Stellent site adjust to the query-heavy Metalink site.

Who would have guessed that a database company like Oracle would have query-heavy web pages? ;-)

Anyway, here's a quick list of questions, with links to their answers. You'll need to supply your Metalink password to view the FAQ answers in their entirety:

One of the biggest complaints I hear is the inability to find the samples. At present, Oracle is unsure what to do with these unsupported tutorials, and does not offer them on Metalink... until they post them elsewhere, I've made some of the most popular ones available on Bezzotech's Library Page, including the Blogs, Wikis, RSS Feeds, and the How To Components.

I'm also trying to keep a list of Oracle Content Management patches up to date on my blog...

Another Reason Why Oracle Is "Open"

Right before Open World 2007, the Oracle Apps Lab team put together a clever skunk works project, thanks to the advocacy of some senior Oracle execs:

http://mix.oracle.com

It's gotten quite a bit of press recently, because it's the only public Ruby On Rails application that runs inside Java. It uses JRuby, running in Oracle's application server, and was launched in less than 6 weeks... the app labs folks were quite pleased with Nutter's little baby.

So what's it do? Well, it's kind of like LinkedIn, but the purpose is to have an immediate feedback loop between developers and Oracle customers. Customers can submit ideas, or make requests, and vote on which changes they prefer. They can tag their ideas, and comment on other ideas. Several developers I know -- including several senior ones -- monitor the top rated feature requests.

It's still a bit preliminary, and it could use better RSS and tag support... plus, who knows if it can scale to satisfy the needs of the millions of Oracle customers. But I'm damn curious to see if it can! It might quell the furious debate about why Rails is rare in the enterprise.

[Rails] has its own particular strengths. People who don't value those strengths aren't going to get it. It's like trying to explain to a vegetarian why Kobe beef is so good.

That might make James McGovern a bit happier...

Oracle Open World: Final Thoughts, And Some Cool News

The last day of Open World was a bit like the others... I attended one more session, and had a few conversations with other techies. I saw the Stellent folks do a cook-off in the No Slide Zone. With 1500 sessions, Open World can get a little crazy... frequently there were five sessions that sounded exactly the same! Of course, they weren't the same... which is a shame. Open World would benefit from five repeats of the best session, instead of five similar sessions.

As with most conferences, I learned more through one-on-one interactions than by going to sessions.

James had an interesting jab... he hoped that I would figure out why Oracle had the word "Open" in their conference title. He he he... yeah, Oracle doesn't get as much credit as other folks -- like IBM -- when it comes to the Open Source movement. Oracle has been a member of the Eclipse foundation for quite a while, they've open-sourced pieces of their ADF framework, as well as TopLink: their Java database persistence layer. They also have a few Linux kernel developers on staff to support Oracle Enterprise Linux, as well as other pieces of the GNU/Linux project. I can't say if they do as much as IBM or Novell, but they do their fair share...

One thing that surprised me was the sheer number of applications Oracle sells. They have been acquiring so many companies so fast, many are wondering how they could get them to all work together... it's a challenge even for an all-Oracle shop.

Let's say your big company owns and loves a particular Oracle Customer Relationship Management (CRM) system... Suddenly you acquire another company, which uses a different Oracle CRM system. No problem, just force the smaller company to migrate their data... This costs millions, but what choice do you have?

Now suddenly you want a partnership with a third company, which requires you to share customer data. Newer Identity Standards can ensure a secure sharing of data... but how should you do it? To make matters worse, the 3rd company owns a different CRM system. It ain't Oracle, and there's no way they're changing for you. Consolidation isn't an option, and no standard exists that would satisfy your needs...

What to do?

I feel that people may be trying to solve this problem in the wrong way... Consolidation and standards are merely tactics: the strategic goal is interoperability. This applies for CRM, Enterprise Resource Planning (ERP), as well as Enterprise Content Management (ECM). Of course, that begs the question: how on earth do we achieve effective interoperability? SOA, BPEL, and ESBs seem to be Oracle's current plan... but other problems need to be addressed as well.

Hopefully my fans will be pleased to hear that Andy MacMillan and I will be working hard on the details of this problem... at least in the ECM realm. McGraw Hill has accepted our book proposal about a unified approach to information management! The title is still a work-in-progress.

With a bit of luck, by this time next year I'll be a two-time author, and you'll have the answers you need about unified ECM ;-)

Open World, Day 4

Today I finally made it to a session... this one was on SOA and Oracle's Enterprise Service Bus. I have several concerns with ESBs, and it was nice to get some straight talk about the pitfalls. BPEL isn't the best language to tie together business processes -- as Lonnke Dikmans says, it's tricky to deal with external events.

Plus, BPEL is a declarative XML language. I like declarative languages for simple things... but they get pretty complex for large processes. A scripting language is almost always superior for orchestrating complex tasks -- compare Ruby Rake to Apache Ant for example -- but a declarative language is easier to modify with GUI tools. A non-programmer can configure a complex application if it's in a declarative language... but such languages do seriously bind a developer's hands.

A scripting language with annotations would be the best of both worlds... the developer exposes what is "tweakable" with code annotations... and a non-programmer can modify the annotations. A workflow designer's primary use is to spit out boilerplate code... similar to how Scaffolding works with Ruby on Rails. The boilerplate jump-starts the development, and exposes standard annotations... Done and done.
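
To make the idea concrete, here's a minimal sketch in Python, using a decorator as the "annotation." Everything in it (the tweakable decorator, the TWEAKABLE registry, the review_step workflow step) is hypothetical and invented for illustration. It is not how BPEL or any Oracle tool works, just one way the split between developer code and non-programmer settings could look.

```python
import functools

# Registry a GUI tool or config editor could read and modify.
TWEAKABLE = {}


def tweakable(**defaults):
    """Mark keyword arguments of a workflow step as safe for non-programmers to change."""
    def decorate(func):
        TWEAKABLE[func.__name__] = dict(defaults)

        @functools.wraps(func)
        def wrapper(*args, **overrides):
            # Look up settings at call time so later edits to TWEAKABLE take effect.
            settings = {**TWEAKABLE[func.__name__], **overrides}
            return func(*args, **settings)
        return wrapper
    return decorate


@tweakable(approver="manager", escalate_after_days=5)
def review_step(document, approver, escalate_after_days):
    # Arbitrary script logic lives here; only the annotated values are exposed.
    return f"{document}: route to {approver}, escalate after {escalate_after_days} days"
```

A designer tool (or a person with a text editor) changes TWEAKABLE["review_step"]["approver"] to "legal", and the next call to review_step picks it up without anybody touching the code inside the step.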

Anyway, the next generation of Oracle's ESB will be based more on Coherence... which means better support for events, and it will no longer be one single logical bus: it will be cloud-based. That means more flexible, more robust, and faster... in theory. I'm anxious to play with it next year.

Next it was on to the keynote by Michael Dell. I'm really impressed with the green initiatives at this conference. Yesterday Intel was freaking out about power consumption, and today Michael Dell was presenting his latest line of low-power computers. They also started a world-wide free computer recycling program. These data center guys are really freaking out about power consumption. At present, about half of the Fortune 500 spend more to power their servers than they do on the computers themselves! Data centers consume 1.5% of the power in the US... and it's going to get a whole lot worse.

Let me remind you: data storage is growing faster than Moore's Law. Many computer folks have to think 10 years ahead... so they know that we're going to be in trouble soon. I think computer companies will soon be a huge force for finding cheap, clean, energy alternatives... otherwise, ten years from now, their business model of bigger-better-faster-more will be in jeopardy.

Between Dell's and Larry's keynotes, there was an interesting "crowd sourced marketing" stunt... They showed a big board to the crowd, asking what your thoughts were on the value of integrated IT. You could send a text message to Dell, and your message would be displayed on the big board. The messages varied, from "better ROI" and "job security," to "integrated cowbell."

Later, it was off to the customer appreciation night, with 30,000 other folks. The highlight was seeing Lenny Kravitz in concert, who played a great set... He joked, "Man, you people throw some big parties. Y'all must be doing well!"

heh... Larry certainly is ;-)

Open World, Day 3

Missed all the sessions again today. I went to a publisher's workshop, which was pretty good. A bunch of project managers showed us what was super cool in Oracle's latest products, trying to tempt us into writing a book on it. I probably learned more there than in the average session anyway...

In the keynote, the CEO of Intel showed off their latest chips. They are extremely low power... I believe they have a 70% increase in performance per watt compared to the previous generation. They also introduced the first lead-free CPU. The demos of the chips went OK... the power was low, but one of the computers crashed. However, it turned out to be a really good demo of the failover power of Oracle Coherence... which I'll talk about later.

Then Thomas Kurian showed off some nifty Fusion products, including the next generation of web content management: Open WCM. Wicked cool... it solves so many problems. It's been in the works for almost 3 years, and they will finally be releasing it some time next year. I'm trying to get a beta version...

I hung out in the demo-pods most of the day again, checking out how the Fusion Middleware apps could possibly work together... Service Oriented Architectures are good, but they can't do everything. To do everything, you need an overlay of an event-driven architecture over a service-oriented architecture... ideally all tied together with a Turing-complete scripting language. But that means you can't "orchestrate" apps by drawing boxes and lines. Plus, you can't have too many events, otherwise your code turns into a big giant mess.

Still looking for the answer...

Open World, Day 2

I didn't get a chance to go to any sessions today... I had a meeting with a publisher about a potential second book -- cross your fingers! I'll know this week if they think it has a market. Afterwards, I spent most of my time at the demo pods playing with different kinds of Oracle software. Then it was a quick drink with some Content Management customers, then back to the demo lab to help set up computers for presentations tomorrow.

Seriously... it's 11:30 pm, and I'm blogging from the hands-on lab. Mondays suck.

The number of Oracle products is a bit overwhelming... especially in the applications stack. There's a ton of overlap in the JD Edwards, PeopleSoft, and Siebel product lines... not to mention the dozen different ways to implement any project with Fusion Middleware. If your goal is to always use the most innovative solution, you're in for quite a long selection process... and by the time you're done, somebody will have released something better!

Software innovation happens so fast that your best option is to satisfice... you're better off with a good relationship with the people in a mediocre product line than a mediocre relationship with the people in a good product line.

Tomorrow I have a half-day meeting, then I'm speaking twice. I hope to catch at least one session, but I might get another goose egg like today :-(

Before closing... I feel that I should mention that a man was murdered outside Oracle Open World last night... shot five times outside the movie theater between two of the Open World buildings. I have no glib comment here... I just find it totally bizarre that this would happen, despite dozens of police within sight in every direction. I'm not frightened... just sad and confused...

Open World, Day 1

Sunday... registration, and a long meeting with the other Oracle ACE Directors in the community. I probably don't make this point very clear: I do not work for Oracle. My ACE title is honorary, because of my work in the developer community. Anyway, it was nice to finally meet Lonnke Dikmans, Jason Jones, Frans Thamura, and other blogless folk...

Anyway, Oracle was kind enough to give us the inside scoop on a lot of new technology features. I'm still unclear about what I'm allowed to talk about, so I'm gonna play it safe and wait for official announcements before I open my big yap. Overall, I'm really impressed with Coherence, cautiously optimistic about Web Center Suite, more realistic about BPEL Process Manager, and am slowly being won over by SOA Suite.

I'd also recommend that all Oracle attendees check out the Oracle Mix web site. This site was thrown together at the last second, but is a pretty cool social app for Open World conference goers...

Off To Oracle Open World!

Well, I'm hitting the road a bit early for the Oracle Open World conference... I'm a presenter for the following three sessions:

  • S291738: 50 Ways to Integrate with Oracle Universal Content Management, Tuesday 3:15-4:15
  • S292624: Hands-on Lab: Building an Enterprise Web Site from Scratch, Monday 12:30-1:30 and Tuesday 4:45-5:45
  • S292625: Hands-on Lab: Experience Oracle Universal Content Management, Wednesday 9:45-10:45

The 50 Ways talk will be similar to the Intro to Integration talk from Crescendo this year... but I have some different examples... For this talk I integrated Ruby On Rails with the Content Server, with a clever application of JRuby, Content Integration Suite (CIS), and the obligatory voodoo magic. If you're curious, be sure to attend ;-)

For the hands-on labs, I'm only going to show up if the speaker needs extra help... so no promises!

Anyway, there are over 1500 sessions this year, so it's a little hard to choose which event to attend... plus so many of them sound exactly the same. No big deal... I usually learn more by networking with fellow geeks than by reading PowerPoint slides...

Personally, I'm curious about where they are going with Identity Management, Business Intelligence, and SOA Suite. I'm going to sit in on a few of those talks, and pepper the presenters with trick questions. I'm mildly curious about Web Center Suite, BPEL, and Secure Enterprise Search... but I've already seen a lot about those.

It's a little tough to schedule my day. Usually conferences send you a brochure with all tracks neatly color-coded, and logically grouped by time slot. However, Open World 2007 is trying to be "paperless," and so they launched an online Open World schedule builder. Unfortunately, it's significantly clunkier to use than the dead-tree version...

Oh well, there's always glitches when you do something the first time. I'm pretty sure next year's scheduler will be much better. Otherwise, I might have to make a Greasemonkey script...

See you there!

Great InfoWorld Article on Content Server 10gr3

InfoWorld put together a very complimentary article about the latest version of Oracle's Enterprise Content Management suite: Oracle Universal Content Management Lives Up to the Name.

It covers a bunch of new features that the Oracle folks crammed into this version... improved SharePoint integration, better records management, the Site Manager for web content management, exposing the content refinery as a service-oriented architecture, and lots of Web 2.0 bells and whistles.

The review went decently in depth, but a lot of the truly cool features in 10gr3 are difficult to explain. File Store Provider, some new Schema goodies, better support for super high volume sites, better architecture for developing components... those take a while to appreciate.

Makes ya proud... even tho I don't work there anymore ;-)

Patches For Oracle Universal Content Management (UCM)

I've heard many a complaint from Stellent customers about Oracle's MetaLink site... It's a lot clunkier than Stellent's old support site, and it's pretty hard to find a specific patch, sample, or extra. Especially considering that the product names appear to be in flux, so often you don't know what to look for... ECM? UCM? WCM? URM? WTF?

Anyway, here's a list of links I've collected for finding the good stuff straight from updates.oracle.com:

Please note, you need to use your MetaLink password here... which means only Oracle customers and partners are allowed in!

Also... as of Nov 4th, there are no updates for anything besides UCM on updates.oracle.com... those last 3 links have no results. Hopefully this is where they will be placed, otherwise I'll need to tweak those URLs in the future. For now, they are a good starting point for finding the patches you need.

Content Folios: Released At Last!

Yes... the legends are true... Oracle finally released the long awaited product: Content Folios. For those who don't know, it's like Folders on steroids. Essentially, it's a way to group content items, folders, forms, images, web pages, anything into a group called a "folio."

Of course, you've always been able to link items together through metadata and tags... but now it's been formalized... and this bad boy has a very flashy interface.

It's not a separate product, nor its own download... it got snuck into patch 6602355... which is the Content Server 10.1.3.3.1 Update Bundle. It's only available via Oracle MetaLink, so only customers and partners can get it.

I'll be showing off more of its features next week... as will Billy Cripe, I'm sure.

UPDATE: as mentioned in the comments below, you should probably use patch 6907073, which is the Content Server update for 10.1.3.3.2.

Information Revolution

An "oldie" but goodie: Information Revolution, on how old assumptions about organizing physical data need to be rethought, in favor of what is possible in the digital world:

http://www.youtube.com/watch?v=-4CV05HyAbM

I'm not 100% in agreement... sure, multidimensional "tagonomies" are superior to rigid "taxonomies," but that's been known for ages. It's certainly not unique to the digital world. Ask any microbiologist if they are happy with the scientific classification of life forms into kingdom, phylum, class, order, family, genus, species. How is a microbiologist supposed to use such a rigid system? Some of the life forms they discover appear to be both a plant and an animal.

People have preferred flexible "tagonomies" to rigid taxonomies for decades... even these new-fangled "hyperlinks" are nothing new... unless everybody has forgotten the ancient art of cross-referencing and footnotes.

No... The digital revolution has added nothing magical or mystical when it comes to content management... It has helped findability, but a side effect has been more useless data to sift through... The primary benefit of the revolution is that it enables everyone to be "experts!" The key to propelling the digital revolution is in engaging more experts, and improving as many dialogs as possible.

The problem of "content management" is relatively the same; now begins the problem of "expert management"...

But that's merely my expert opinion ;-)

Search My Content Management Book With Google

A little known fact... you can finally use Google Book Search in order to search through my book on Stellent/Oracle Enterprise Content Management. You can restrict your search by starting from the Google page for my book, or just use the form below:


Content Server Book Search:

I placed a copy of this form on the official page for the Stellent Content Server book. The search results aren't as good as having a PDF version of the book (available from Apress), but it's ten times better than the index supplied in the "dead-tree" version...

Enjoy!

Angel Food Cake

Apparently, the JDeveloper crew over at Oracle thought this startup tip was relevant:

An angel food cake will slice neatly without crumbling if you freeze it first, then thaw it.

Cute... that's exactly what I want from a Java IDE... Apparently another tip was this:

Aluminum foil pierced by a fork on a bed of ball bearings is not microwave safe.

hoooookay... I need to close my eyes and reboot before my laptop transforms into Loki.

Five Questions From James McGovern

James posted five Enterprise Content Management (ECM) questions on his blog... hoping that either John Newton or myself would reply. This time he takes the interesting approach of calling me smart instead of stupid...;-)

So let's get started...

Why do some ECM systems have a tight coupling with user data? Stellent/Oracle does not do this, but others do. I can't say for sure, but I'd bet access control list (ACL) performance is a major issue. IMHO, compared to other ways of adding security, ACLs do not scale. That makes stuff like project-level collaboration software very tricky to do right. Making it flexible, secure, fast, and enterprise scalable is nearly impossible.

If you have N employees, you have about 2^N possible ACLs... 10 employees means 1024 possible ACLs, but 20 employees means over a million! For 1000 employees, you wouldn't be able to enumerate all possible combinations on any currently existing digital storage device...
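
The arithmetic here is plain subset counting: each employee is either on a given list or not, so N employees give 2 to the power N possible lists. A quick Python check of those numbers (the function names are mine, purely for illustration):

```python
def possible_acls(employees):
    """Number of distinct subsets of employees, i.e. possible access lists."""
    return 2 ** employees


def digits(number):
    """How many decimal digits it takes to write the number down."""
    return len(str(number))


for n in (10, 20, 1000):
    print(f"{n} employees -> a {digits(possible_acls(n))}-digit number of ACLs")
```

For 10 employees that's 1,024, for 20 it's 1,048,576, and for 1000 it's a number 302 digits long.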

Do you really want to allow average users to design project spaces with ACLs on the fly? For workgroup-level servers that's fine, but enterprise-wide??? Holy crap, no! You'll probably need an in-memory cache of lots of user data, and some form of pre-compiled ACL expression-matching objects, not to mention the problems with latency... this means a tight coupling between users and content is the logical sacrifice to make.

Decoupling is probably not a huge issue, but performance will suffer like the dickens...

SOAP versus REST? These days, I say nuts to them both. I'm more in favor of building service-oriented architectures with JSON web services. Ditch the entire SOAP stack, and go back to basics. However, not so basic as REST, and not so idiotic as XML-RPC. Mark Masterson suggested Atom and Atom Publishing Protocol... I think that those are great options for content delivery, but they're too narrow a pipe for content management. I don't want to tunnel context sensitive metadata display, or business process management, or several layers of customizations through AtomPub.

If I ever got behind an ECM standard, it would have to be a JSON-based web service. Simple, open, cross-platform, extensible, flexible, discoverable, mashup-ready. I would accept nothing less.

And speaking of standards, the next two questions were kind of the same... Why don't people talk about interoperability at AIIM conferences? That's a good question... as is the question why do ECM people seem ambivalent about standards?

Pie Guy has weighed in on this several times... as has Billy Cripe, and... um... I don't know. I can tell you why people rarely talk to me about standards: I have a seething hatred for 90% of software standards and I'm unfortunately gifted at intimidating people.

Don't get me wrong, I love the RFCs! Those built the internet. But ECMA, JSR, OASIS, and the W3C can collectively suck an egg for all I care. Those committees don't innovate: they merely bully and claim credit for those who do. They're accountants and lawyers, not creators. Gimme a well-documented API over clunky, overengineered, committee-driven, ill-conceived "standards" any day...

I have a longer anti-standards rant in the works... which coincidentally includes anti-rules engines rants, and anti-portal server rants... it was just too ungodly long to include here...

So, the last question: where the heck are the ECM patterns? Answer: in my noodle. I got 'em rattling around in my brain like a dried pea in an empty tuna can.

Maybe some day I'll let 'em out...

(apologies to Dave Barry)

Compliance Oriented Architectures

Both James Governor and James McGovern -- or is it the other way around? -- have been chatting about compliance-oriented architectures... or a way to add records management and retention management as a service to the enterprise as a whole. It should be an infrastructure component, and not require your records to be migrated to your monolithic records management system.

All I can say is, been there, done that, yawn.

I'd advise them both to check out Oracle Records Management Agents. Stellent's Records Management Team envisioned those three years ago, made them two years ago, and they are a big part of Oracle's Universal Records Management strategy.

You don't need to put your data into a records management repository to manage it like a record! You just need an "agent" that runs in your remote system -- email archiving server, file system, 3rd party CMS -- that "calls back" to the content server via SOA when specific events occur.

For example, Oracle's URM will block somebody from deleting an email from the archive if the retention policy won't allow it. Likewise, it will force a delete, if the retention policy enforces it.
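
Here's a minimal sketch of that callback pattern in Python. To be loud about it: none of these names (PolicyServer, ArchiveAgent, RetentionPolicy, Decision) come from Oracle URM or any real API. They're hypothetical stand-ins, and the day counts are invented. The point is only the shape: the policy lives in one place, and the agent in the remote system asks before it lets a delete through.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class Decision(Enum):
    BLOCK_DELETE = "block"
    ALLOW_DELETE = "allow"
    FORCE_DELETE = "force"


@dataclass(frozen=True)
class RetentionPolicy:
    keep_days: int                    # deletion is blocked before this age
    purge_days: Optional[int] = None  # deletion is forced at this age, if set


class PolicyServer:
    """Stand-in for the central server that owns every retention policy."""

    def __init__(self, policies):
        self._policies = policies

    def decide(self, category, created, today):
        policy = self._policies[category]
        age_days = (today - created).days
        if age_days < policy.keep_days:
            return Decision.BLOCK_DELETE
        if policy.purge_days is not None and age_days >= policy.purge_days:
            return Decision.FORCE_DELETE
        return Decision.ALLOW_DELETE


class ArchiveAgent:
    """Hypothetical agent living inside a remote system, such as an email archive."""

    def __init__(self, server):
        self._server = server

    def on_delete_request(self, category, created, today):
        # In a real deployment this would be a remote service call, not a method call.
        decision = self._server.decide(category, created, today)
        return decision is not Decision.BLOCK_DELETE
```

The email archive never stores a policy of its own. It just calls on_delete_request when a user hits delete, and a separate sweep could ask decide for anything old enough to come back as FORCE_DELETE.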

If you don't want to move all your content into one single repository, fine! But you do need a single-point for defining retention policies... especially for large organizations with multiple email archiving systems.

Of course, innovative architectures are nothing new for the Stellent crew... we had SOA about eight years before there was a name for it. I guess we likewise had "COA" at least three years before anybody else knew it was important... and then there's the stuff I'm not allowed to talk about.

But, big snaps to the Stellent Records Management team... your architecture is finally being dubbed as the standard for others to follow. The names of the developers I'll protect so they don't get spammed, but they know who they are ;-)

Firevox: Bringing Accessibility to Web 2.0

Let's say I have a web site... should I add new features, or follow the law? Usually this isn't an either/or proposition... typically people don't have to make a choice between adding cool features to their products, and following the laws... but when it comes to web sites, it ain't so simple.

On the one side is shiny new technology: AJAX, Mashups, Adobe AIR, Microsoft Silverlight, etc. As new innovative buzzwords come about, customers demand them.

On the other side is the law: accessibility standards for the visually and physically handicapped. This is vastly more important than people understand... The web has done more than the wheelchair to empower the handicapped. Shopping, research, finding friends, being a part of a community, helping others, even building a home in Second Life. Thus, finding a site that you can't use is vastly worse than a building without a ramp... it's more like a building with a disappearing ramp...

Unfortunately, a lot of these shiny new technologies break existing standards. Most of the standards about clear labels and navigation are easy to implement... but others aren't. Screen readers need you to refresh the web page in order to "trigger" the event that something has changed... however most of the new technology focuses on changing the page without a refresh. What to do?

Well, you all know how I feel about standards... screw standards! Focus on the goal -- empower the handicapped -- and innovate your way out of the problem.

Now... there has been plenty of work on finessing Web 2.0 technologies to barely follow the accessibility standards... AJAX Patterns has several such techniques. I prefer the more direct approach. New Web 2.0 technologies are not accessible, because screen reader technology is almost a decade old. The laws are written to conform to ridiculously outdated software... but people will stick to them until somebody has a better idea.

My solution? Firevox! It's a free, open source screen reader extension to Firefox. It is a native plug-in, and thus anything running on the page -- including Flash -- can be configured to "trigger" a redraw event in the screen reader. It's lacking a coherent set of patterns and standards for its usage, but that just takes time.

After this, Web 2.0 technologies will help the visually handicapped even more than average users! Most of us have never used a screen reader... so trust me on this one. It's unbelievably painful to have to refresh and reread a web page when only 10 words have changed. If done right, Web 2.0 could empower the blind to surf the web much faster than before.

If you're a big giant company with plenty of cash (i.e., Oracle), concerned about web sites and accessibility (i.e., Oracle), and have no love of Microsoft (i.e., everybody), then let your developers help out on the Firevox project. $100k, 6 months, problem solved.

Of course, this means that sites may need to say Best Viewed with Firevox on them... but I won't shed a tear.



How Many is Too Many Search Results?

I got a question from a friend who is helping a client manage a fairly large repository of content: about 10 million full-text items. Naturally, they're concerned about search performance. They don't think they'll even need to scan all 10 million items for an open-ended search... but they're interested in a worst-case scenario.

The client seems to be following most of the best-practices for this kind of system:

  • Do not allow open-ended searches for average users.
  • Use customized query pages that pre-fill metadata to pre-narrow the search.
  • Direct users to fill in additional metadata as needed
  • Allow open-ended searches only for special users (administrators, auditors, etc.)
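
Those rules boil down to a small gatekeeper in front of the search engine. Here's a minimal Python sketch, assuming made-up role names and a made-up query shape (build_query and PRIVILEGED_ROLES are illustrative, not part of any Content Server API):

```python
PRIVILEGED_ROLES = {"administrator", "auditor"}


def build_query(role, full_text, page_defaults=None, user_metadata=None):
    """Combine pre-filled page metadata with whatever the user added."""
    filters = {**(page_defaults or {}), **(user_metadata or {})}
    if not filters and role not in PRIVILEGED_ROLES:
        # No metadata at all means an open-ended search.
        raise PermissionError("open-ended searches are limited to special users")
    return {"full_text": full_text, "filters": filters}
```

A customized query page passes its pre-filled values as page_defaults, so an average user always starts from a narrowed search, while an auditor can still call it with no filters at all.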

But this got my friend thinking: how many search results is ideal? You get massive diminishing returns on search, even with Google. Wikipedia is replacing Google for many kinds of information research...

The rule of thumb I used is that if my full-text query returned more than 200 results, then I didn't construct a proper query. I learned this after years of searching for academic research papers before the web existed... You need to use statistically improbable word combinations to tease good results out of the chaos. Don't search for computer game, search for evil clown anchor catapult.

No, "evil clown anchor catapult" is not a computer game... but it should be.

Naturally, this means you need to have a very good idea of what you're looking for in order to find it. Such advice is worthless for window shopping... Metadata and tags help tremendously for general browsing... but then again, you're trusting somebody to use the same keywords you would use. If taggers aren't professionals, or at least passionate about findability, then you'll miss out on a great deal.

It's not uncommon for researchers and auditors to perform an exhaustive search, get thousands of responses, and then read everything they find... but the rest of us get bored quickly and need help.

That help usually takes the form of good information architecture, which requires a pretty good understanding of your content and your audience. The vast majority of the content in a 10 million item repository is of little interest to the average user. It's of great interest to the content creator, and the 3 or 4 people who need that specific report, but for everybody else it's noise.

Without information architects, email becomes the most used search engine for your site!

If people can't find things by browsing, they will phone, message, mail, or otherwise annoy others to get what they need. If you think you can't justify an information architect, count up how much money your organization wastes emailing links to people. I bet you'll be surprised...

I usually advise people to purchase Information Architecture for the World Wide Web, which is a good introduction to the field and a very good grounding in the subject. After that, I'd suggest Don't Make Me Think and Ambient Findability.

But, even if you're well grounded in the theory, it's still worthwhile to hire a specialist.
