Look Out Oracle: MapReduce Might Be Able To Do JOINs
June 4, 2008 - 12:15pm — bex
Apologies for the esoteric post, folks... but this is kind of important... Two folks from Yahoo, plus two folks from UCLA, have just published a paper through the ACM about a new kind of parallel algorithm: Map - Reduce - Merge.
If you don't know about MapReduce, it's the algorithm that makes most of Google possible. It's a simple algorithm that allows you to break a complex problem into hundreds of smaller problems, use hundreds of computers to solve them, then stitch the complete solution back together. Google says it's excellent for:
"distributed grep, distributed sort, web link-graph reversal, term-vector per host, web access log stats, inverted index construction, document clustering, machine learning, statistical machine translation..."
bla bla bla... but MapReduce can't do joins between relational data sets. In other words, it's great for making a search engine, but woefully impractical for virtually every business application known to man... although some MapReduce-based databases are trying anyway (CouchDB, Hadoop, etc.)
UPDATE: Some Hadoop fans mentioned in the comments that MapReduce can do joins in the Map step or the Reduce step... but it's highly restrictive in the Map step, and sometimes slow in the Reduce step... joins are possible, but sometimes impractical.
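For the curious, here's a rough sketch of what a join in the Reduce step looks like. This is plain Python standing in for a real cluster, not Hadoop's API and not the algorithm from the Map-Reduce-Merge paper; the two sample tables and every function name are made up for illustration. The idea: the Map step tags each record with the table it came from and emits it under the join key, the framework groups records by key, and the Reduce step pairs up the two sides.

```python
from collections import defaultdict
from itertools import product

# Hypothetical sample data: two "tables" that share a customer_id key.
customers = [(1, "Alice"), (2, "Bob")]
orders = [(101, 1, "widget"), (102, 1, "gadget"), (103, 2, "gizmo")]


def map_phase(customers, orders):
    """Emit (join_key, tagged_record) pairs so the reducer can tell the tables apart."""
    for customer_id, name in customers:
        yield customer_id, ("customer", name)
    for order_id, customer_id, item in orders:
        yield customer_id, ("order", (order_id, item))


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle/sort step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Pair every customer record with every order record that shares the key."""
    names = [v for tag, v in values if tag == "customer"]
    order_rows = [v for tag, v in values if tag == "order"]
    for name, (order_id, item) in product(names, order_rows):
        yield key, name, order_id, item


def reduce_side_join(customers, orders):
    """Run map, shuffle, and reduce in sequence and collect the joined rows."""
    groups = shuffle(map_phase(customers, orders))
    joined = []
    for key in sorted(groups):
        joined.extend(reduce_phase(key, groups[key]))
    return joined


if __name__ == "__main__":
    for row in reduce_side_join(customers, orders):
        print(row)
```

Notice that every record from both tables has to travel through the shuffle before anything gets joined. On a real cluster that means shipping both tables across the network, which is one plausible reason the Reduce step join can be slow.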
Well... this latest twist from the Yahoo folks fixed that: they claim MapReduceMerge now supports table JOINs. No proof as of yet, but there are a lot of folks staking their reputation on this... so it's a fair bet. The Hadoop folks seem to be experimenting with MapReduceMerge... so if they spit out some new insanely fast benchmarks, my guess is that this is for real...
What does this mean for relational databases like Oracle? Uncertain... but I did hear a juicy rumor about 15 months back: some guy from Yahoo sat down in a room with Oracle's math PhDs, and spent a day discussing an algorithm for super-fast multidimensional table joins... like sub-second performance on 14-table relational queries, with no upper limit. My sources told me the Oracle dudes were floored, and started making immediate plans to integrate some new stuff into their database. The Yahoo connection made me think this might be the MapReduceMerge concept...
Coincidence? Perhaps... but a juicy rumor nonetheless.
Drama On The Oracle Wiki
June 3, 2008 - 12:04pm — bex
Well, this is unfortunate... CMS Watch is reporting a rumor about an Oracle Wiki incident. An Oracle partner named Sten Vesterli posted some less than positive feedback about WebCenter on the Oracle Wiki... he was promptly flamed by an Oracle product manager, then had his postings removed:
I placed some of the description and the pro/con discussion from my upcoming paper comparing Oracle development tools on the Oracle Wiki. And just like when I posted something not unambiguously positive about Oracle WebCenter on the Wiki, I was immediately flamed by an Oracle product manager, and any trace of negativity edited out of one of my pages.
Oops... looks like a Web 2.0 malfunction.
Firstly, CMS Watch sees this as something of a cultural problem. Microsoft's wiki is locked down tight: all submissions need prior approval. How quaint, they downgraded Web 2.0! In contrast, Oracle is much more open... which means Oracle probably has a policy for dealing with criticism. They need to react to posts from the occasional partner who flames them with product criticism that might help Oracle competitors. However, since the BEA acquisition, Oracle now owns four portal products! I'd argue the only real competition to WebCenter is inside Oracle, which may explain why passions are so high... Still, it would be good if Oracle changed their wiki terms of use to mention that they have reasonable editorial control if posts give away competitive information.
Second, Sten Vesterli is an Oracle ACE Director, like me. That means we have multiple channels for criticism if we don't like the feature set of the product. We're expected to extend Oracle some level of professional courtesy when we give criticism. I occasionally point out the flaws in Oracle products, but I almost always offer a workaround, and I don't put them on places as high profile as the Oracle Wiki... Naturally, some folks at Oracle would feel Sten was being a tad rude...
But ultimately, a wiki is the wrong place for criticism. Criticism almost always contains judgment, which by definition violates the neutral point of view policy that is on all wikis -- even Wikipedia. As Justin Kestelyn says:
A wiki is not the place for opinion, because opinion does not invite editing, only response.
The wiki was probably the wrong forum for Sten. Want to rant about WebCenter? Then your text belongs on a blog. Oracle's policy should simply be that: criticism belongs on your blog, not on our wiki, or any wiki. Then they should monitor pages that are "hot topics," and delete anything that looks like a rant. Clean and simple.
Hopefully Oracle doesn't try to lock down access to the wiki because of this drama...
UPDATE: Justin got in touch with Sten to figure out what really happened. It didn't seem to involve WebCenter, and CMS Watch blew it all out of proportion... The wiki is thankfully back to business as usual.
High Stakes For EMC / Documentum
May 29, 2008 - 12:49pm — bex
CMS Watch has some interesting reflections on EMC World and Documentum... apparently, EMC still has decent Enterprise Content Management products, but there's a real lack of enthusiasm in EMC about the whole thing:
Under the covers there remains some good technology and some good technologists, but there just doesn't seem to be the enthusiasm in the rest of EMC to really get behind it. One way of classifying these two groups is that they consist of the remnant Documentum products (built and acquired) over the years. We see many elements of the collaborative DM that Documentum majored on in the past in today's Knowledge Worker division, alongside the updated eRoom offering. In the Interactive Media group we see the old Bulldog DAM products given a fresh coat of paint. Both looked fine in the demo, but in talking to broader EMC sales staff, there was little interest or knowledge of these areas.
The CMS Watch article is also an interesting intro to Content Management and Archiving (CMA)... which seems to be the path that a lot of Enterprise Content Management vendors are taking. Oracle's plan to achieve CMA is with a nice blend of Stellent and their Universal Online Archive... I'll go into more depth in my next book ;-)
As Billy noted with some statistics, archiving is a big deal for a complete ECM solution... It seems like some folks at Documentum "get it," but the jury is out whether the EMC folks will listen...
Empathy vs Sympathy
May 25, 2008 - 1:31pm — bex
A few weeks ago I gave a talk about Communication For Geeks at the Minneapolis MinneBar conference. I strongly believe that the majority of software failures are communication failures, and if geeks want to be a part of fun, successful projects, they had damn well better learn how to communicate... because most managers clearly can't.
It was a surprisingly popular talk: I had twice as many attendees as I had handouts...
Anyway, on Friday, I got an interesting call from one of the attendees, Kelly Coleman. He was excited to tell me about a situation where he used one of my tips to better communicate with one of his friends... Kelly took to heart one of the most important lessons geeks need to learn: use empathy before education! I was really happy to hear about it, so I thought I'd repeat the lesson here in case others might benefit:
Empathy is not Sympathy!
A lot of people confuse empathy and sympathy... I do it myself a lot. Sympathy is feeling what somebody else feels through you. When you are being sympathetic, you're not really helping much, because you're making the situation about you... In contrast, empathy is feeling what somebody else feels through them. You keep the focus on them, until you're certain they've expressed themselves fully.
To illustrate, the following would be sympathy:
Bob: I just got fired...
Joe: Wow, that sucks... but don't worry, you'll be fine! I got fired a few years back,
and there's always work available for talented guys like us, right?
Joe genuinely thinks he is being helpful... Joe is not being helpful! Joe isn't listening to Bob at all. Joe is rambling on about his own past, and about his theories of the job market. He's trying to connect with Bob, but he's using sympathy. Sympathy is dangerous, because it leaves Joe open for this:
Bob: What the hell do you know, Joe? That was years ago! You didn't have
a house! You didn't have a wife and a kid to support! The job market was completely
different back then! You have no clue about my problems! Get the hell away from me!
Joe: ...I was only trying to help...
Bob is clearly in a lot of pain. He's afraid of a lot of things, and his good buddy Joe clearly isn't listening. So Bob lashes out, and wisely tells Joe to get the hell away from him. Then Joe gets defensive, and says something even stupider. With luck, they'll be friends again in a few weeks... but you never know.
In contrast, empathy almost always is better... it would look something like this:
Bob: I just got fired...
Joe: Wow, that sucks... you must be feeling pretty scared right now, huh?
Ding! Ding! Ding! Ding! Give Joe a cookie!
See the difference? Joe didn't make it about himself... he kept his focus on Bob. He asked Bob how he was feeling, and after Bob answers, Joe should keep asking. He should let Bob vent about his situation: his wife, his kid, his house, the job market, whatever. Even if Joe knows a guy who might give Bob a job, Joe should shut the hell up until Bob's finished venting. This may only take five minutes, or it might take a whole hour. Either way, it's an important part of the process. Bob will not listen to what Joe has to say, unless Bob feels Joe fully understands his situation.
Empathy before education. Always.
How does Joe know when Bob's finished venting? He'll hear something different in Bob's voice: hope. When Bob is open for suggestions, he'll say something like, "what do you think I should do?" or "have you ever been in this situation before?" Only after Joe hears this, is Bob ready to listen to new ideas, new possibilities, and new ways of fixing this problem. Only after Joe hears hope, or a direct request for help, is Bob ready to hear what Joe wants to say. If Joe wants to help Bob, Joe needs patience.
Now... empathy is not easy, and it's extraordinarily difficult for engineers.
Most technical people have been brainwashed by years of "education" into believing that there's a "right way" to do everything, and that it's our job to fix it. When something is "wrong," we want to dive in and tell everybody how to make it "right" again. It's a trained compulsion. This is why engineers make lousy lovers, but excellent terrorists. In both cases, it's a lack of empathy that dooms us to this fantasy world of absolute right and wrong, making it impossible to see things from another perspective.
Sound like anybody you know?
As such, it will be difficult for software engineers to learn empathy... but they need lots of practice before they can move on to even more advanced forms of communication... which I'll be talking about at a later date ;-)
Should We Always Use Empathy?
That's a tricky question... empathy takes a lot of time, and sometimes you don't have the luxury. However, it is important to understand what empathy is, so when people "fall off the wagon" you won't take it personally...
For example, blogger James McGovern decided to practice empathy, which got some props from Billy... however, James' blogging style does not lend itself well to empathy... He's snarky, and enjoys inciting fights, so he can better understand who has a better position. If everything were puppies and rainbows, I'd probably stop reading his blog. No surprise that James then went back to his old self after about 23 hours...
And let's not forget Broc Samson's mystic journey in Venture Brothers... where in a dream he learns the value of empathy and feels great... but then is confronted by his former special ops trainer:
Broc Samson: What about uhhh, humanity and empathy and all that garbage?
Hunter: You're a tool, boy, a tool! Built for a single purpose by the United States
who shut your third god damned eye for a good f$%&ing reason! You can't
teach a hammer to love nails, son. That dog won't hunt!
Yep... an army of empaths sure would be cool... but in the meantime, we live in a world of conflict... so until everybody understands the power of empathy, it's probably best to know multiple ways to deal with conflict. In order, I prefer empathic communication, principled negotiation, then Broc Samson.
In the meantime... practice giving and getting empathy. It's far more powerful than you realize.
Free Enterprise 2.0 Training
May 21, 2008 - 2:49pm — bex
AIIM is offering free Enterprise 2.0 training, at least for a short while. They have a ten-part course on Enterprise 2.0, and AIIM is offering one of the ten sessions for free...
A lot of folks are confused, and justly ask what the heck is Enterprise 2.0 anyway? Jake opined a while ago that the Enterprise 2.0 label might be a little unnecessary... because it's all pretty much the Web 2.0 stuff anyway. I agree in part... a lot of the initial buzz about Enterprise 2.0 is pretty much just making blogs, wikis, and social software more "enterprisey."
However, it's also about streamlining business process management, data mining, and data visualization with freaky new tools... and a lot of cool new security offerings. Personally, I think that solving Enterprise 2.0 security problems is easier than solving Web 2.0 security issues... because cross-domain single sign on is a lot easier to do in the enterprise... and there's less spam ;-)
I'd also like to emphasize that real Enterprise 2.0 shouldn't be focused on the latest buzzwords... it should be about empowerment, simplicity, and evolution. Bill Gates recently reminded us of the dirty little secret of software: traditional enterprise apps are more about tracking and monitoring employees, rather than empowering them to do their jobs better. Enterprise 2.0, if it takes the lead from Web 2.0, should break that command-and-control mold to enable bigger, better, faster innovation. Otherwise, it shouldn't even be called Enterprise 2.0.
To paraphrase Clay Shirky, innovations aren't socially interesting until they are technically boring. Frankly, I'd argue that enterprise applications could certainly benefit from some technical boredom... instead of re-architecting your solution every 5 years with the latest and greatest ivory tower buzzwords -- J2EE, EJB, Portals, SOA, CEP, ESB, etc. -- just use the simplest thing that works.
To sum up... Keep it simple (stupid!), focus on usability, and your audience will love you... Or as Einstein would say:
Things should be made as simple as possible -- but no simpler.
Always good advice...
Inventor Of The Web Now Looking Into Metadata
May 15, 2008 - 8:18am — bex
Tim Berners-Lee, the inventor of the world wide web (shown right, Gore-ified), is complaining about how difficult it is to find what you want on the web. He brings up some points that ECM folks have known about for a long long time: metadata is essential for finding what you want. Timmay has been awarded a $350,000 grant to study the issue... like he needs the cash...
Anyway, I'm all over embedded metadata... but I can tell you right now it's nearly impossible to get everybody in a small company to agree to a complex metadata standard, so good luck accomplishing that on the whole frigging internet... also, it's even more impossible to get content creators to follow the metadata policies. People who work hard to create great content believe others should work hard to FIND their content.
Silly? Yes... but a very common attitude.
The ones who spend a lot of time optimizing their metadata and search results are usually those with lower quality content, or people who are obsessed with their own popularity. Such as myself... Or, like spam blogs, aka splogs...
That being said, I wish Timmay well... but you don't need $350,000 to come up with a solution for this. I'd recommend creating a Microformat for embedded metadata, to apply to the content in the containing element... You'd also need some kind of cryptographically strong key for "trusted" metadata suppliers, so people don't try to tag porn sites with every metadata keyword in the universe. Also, the microformat needs to be extensible, so anybody can embed custom namespaces and keywords in it.
The Microformats folks have a tag similar to what he wants already, the rel-tag microformat. It's overly simple, but that simplicity will encourage adoption -- much in the same way that the "inferiority" of HTML over all other markup formats ensured its success. It looks like this:
<a href="http://technorati.com/tag/tech" rel="tag">tech</a>
If this link is anywhere on the page, it signifies that the page has something to do with the metadata tag tech, and it gives Technorati as the home for the tag. I'd expand on this to allow for other tag homes, such as del.icio.us, or even any arbitrary social bookmarking site.
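To show how little machinery that takes, here's a minimal sketch that pulls rel-tag links out of a page using only Python's standard library. The class and function names are my own inventions, and treating the link's host name as the "tag home" is my reading of the idea above rather than anything the microformat spells out. One detail worth knowing: rel-tag takes the tag name from the last path segment of the href, not from the visible link text.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse, unquote


class RelTagParser(HTMLParser):
    """Collect (tag, tag_home) pairs from <a rel="tag"> links."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values, or be missing entirely.
        rel_values = (attributes.get("rel") or "").split()
        href = attributes.get("href")
        if "tag" not in rel_values or not href:
            return
        parsed = urlparse(href)
        # The tag name is the last non-empty path segment of the href.
        segments = [s for s in parsed.path.split("/") if s]
        if segments:
            self.tags.append((unquote(segments[-1]), parsed.netloc))


def extract_tags(html):
    """Return every (tag, tag_home) pair found in the given HTML string."""
    parser = RelTagParser()
    parser.feed(html)
    return parser.tags


if __name__ == "__main__":
    page = '<a href="http://technorati.com/tag/tech" rel="tag">tech</a>'
    print(extract_tags(page))
```

Because the parser only looks at the href, the same code works for any tag home, whether that's Technorati, del.icio.us, or some arbitrary social bookmarking site.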
Done and done.
Naturally, you'd want to be more precise... some news sites have dozens of articles on the main page, and you'd like to tag each section individually... so this tag should only apply to the content in the containing tag. Also, this kind of formatting is difficult to embed in a Microsoft Word document: I couldn't get behind any kind of distributed metadata model unless it easily worked in multiple formats. A CSS style might be a better choice than the rel attribute in the link...
Finally, it needs some kind of authentication model... perhaps you need an authentication microformat somewhere on the page, and all microformats on the page inherit the credentials. Of course, you'd need to have some kind of tag to force exceptions, so banner ads from remote servers would need their own auth tokens.
Whaddya say, Timmay? Is that due diligence worth $5k?
Where Are The Dang Stellent Announcements?
May 14, 2008 - 6:37pm — bex
If you're a Stellent customer, you've probably noticed that you no longer get those nice customer emails about ECM from Oracle... that's because Oracle -- wisely -- has a pretty strict anti-spam policy... so people who own one Oracle product aren't blasted with info and offers from other products.
So what should you do instead? Well, first of all you need to register for the quarterly content management newsletter. This isn't customer-only focused, but it's got a lot of good stuff... and the archives are online.
In addition, Oracle is now doing a quarterly customer webcast to keep folks up to date about the latest changes in the product line. The next one will be June 5, 2008 at 9:00 a.m. Pacific Time. If you'd like to attend, you need to register with Intercall. It's for customers and partners only... so be sure to use your company email address, or you'll get booted!
Another option is to configure Metalink to send you ECM updates. Follow these steps:
- Log in to Metalink.
- Click the Headlines link in the upper left of the page.
- Click the Edit Page button under Headlines on the right.
- In the drop-down list, select Certify and Availability, and click Add New.
- In the top right, select the Certify Settings button.
- In the drop-down list under Product Alerts, select Oracle Universal Product Management, and click Add New.
- Click the Overall Settings button.
- At the bottom of the page, click the Automatically email My Headlines to me checkbox.
- Click Store Settings.
- Play the waiting game, or hungry hungry hippos until your emails arrive.
You can also subscribe to bug reports and knowledge base articles, if you'd really like to keep up-to-date...
Of course, to be fully up to date, you'll need to do all this, plus subscribe to email notifications from the Stellent Yahoo Group and the Oracle ECM Forum... but that's only for information hogs like me ;-)
BEA Participate Beats Everybody in Swag vs. Swag
May 13, 2008 - 12:46pm — bex
I'm out here at the BEA Participate conference... or more correctly, I'm up in the hotel writing while everybody else is attending the conference... I'm just tagging along so I can chat with Andy about the book, and get the low-down on what the BEA acquisition means for Oracle.
I asked Victoria Lira over at the Oracle ACE Director program if she could get me a free ticket to the event, but there were none available. Even Oracle employees had to pay their own way! Highly unusual, especially considering that Oracle now owns BEA.
I was surprised at first, until Michelle revealed the free iPod Touch that all attendees got! Damn... I haven't seen swag like that since the dot com days...
I guess now that they are Oracle, they suddenly have money to burn...
Does Art Have A Process?
May 6, 2008 - 11:17am — bex
I recently came across the article We Don't Know How We Program. It was a discussion about the gaps between what developers and non-developers think about the process of writing code. It begins:
I was talking to a colleague from another part of the company a couple of weeks ago, and I mentioned the famous ten-to-one productivity variation between the best and worst programmers. He was surprised, so I sketched some graphs and added a few anecdotes. He then proposed a simple solution: "Obviously the programmers at the bottom end are using the wrong process, so send them on a course to teach them the right process." My immediate response, I freely admit, was to open and shut my mouth a couple of times while trying to think of response more diplomatic than "How could anyone be so dumb as to suggest that?"
hehehe... the central premise to the article is that programming is a creative endeavor, which doesn't lend itself well to process... The unfortunate developers subjected to process will only achieve mediocrity... additionally any process that stifles creativity will expunge or crush exceptional programmers, because they need creative space to be ten times as productive.
Does that mean that good programming cannot have a process? Of course not... although as others have noted, things like CMMi should be avoided like the plague. A process needs to be able to empower creativity, but also rein it in when necessary. Programmers -- like artists -- think big, and do wild things that are cool but don't satisfy the needs of the end users. The product doesn't sell, the users rebel, everything goes to hell... the developers know full well of the "failure," so to nurse their bruised egos, they blame the users for being dumb, or the specification for being incomplete. Then they curl up into a ball and call themselves misunderstood.
Yep. Just like Van Gogh.
To rein this in, you need a peer- and customer-driven process to help keep the project down-to-earth... however, it has to be done in such a way as to not bruise egos or go anywhere near arbitrary rules. The process needs to evolve with the code. You also need something that encourages developers to think of the code as a community project, to reduce a sense of ownership, and thus keep egos intact. Agile focuses a lot on those kinds of processes... although Agile needs some tweaking for very large projects.
In addition, you also need processes that get the creative juices flowing... this doesn't mean brainstorming sessions or hyper expensive collaboration tools. This usually means simple things like physical proximity. Some teams even had great success with an enforced MESSY DESK policy. That's right... clean desks are evil! Messy desks and physical proximity encourage the "drop in, say hi, notice notes strewn about, and comment on them" process... which more than anything else inspires collaboration and fresh ideas.
My gut feeling? Unless you have artists designing your code process, your organization will never create exceptional code.
So keep a close eye on that process weenie with the stopwatch... he's clearly up to no good.
Oracle Finally Acquires BEA
May 1, 2008 - 1:04pm — bex
It's even more official... the Oracle purchase of BEA is final.
Most of my thoughts on the subject are in an older post from when Oracle announced their initial offer for BEA.
Its effect on Oracle ECM technology will be minimal... Oracle ECM already integrates quite well with a large number of BEA products, and this doesn't alter the overall ECM strategy much. The Stellent alumni are pleased as punch... Although the price list for Oracle Middleware just got a lot more complex.
Speaking of which, the effects of the BEA purchase on Oracle ECM sales should be very positive... since Oracle sells the best content management app available, and it integrates nicely with lots of BEA goodies, it should be a pretty easy sell to existing BEA customers.
Of course, the devil is in the details... so stay tuned.
UPDATE: Billy Cripe has some info about potential layoffs in Oracle Fusion Middleware. I'd like to link directly to the specific article about layoffs... but when I click on the permalink, it just takes me to Billy's LinkedIn page! Bad Omen?
UPDATE 2: Billy fixed the link...
How Many Hits Does Your Site REALLY Get?
April 29, 2008 - 9:42am — bex
It's been two years since my inaugural blog post on April 29th, 2006: The Trouble With RSS. Over my site's second year, I wanted to do some long-term analysis on how different web analytics tools track hits, visits, and the like. As expected, they don't agree with each other:
- SiteMeter: 89,800 visits (132,000 hits)
- Google Analytics: 84,000 visits (140,000 hits)
- Webalizer: 431,000 visits (3,660,000 hits)
Curious about why web site statistics differ based on the tool? SiteMeter uses an embedded image (at the bottom of this page), and tracks a hit every time somebody loads the image... so if you block banner ads, your visit might not be recorded. Google Analytics loads some JavaScript, which is useful for tracking more complete data... but if your browser blocks JavaScript (or cross-domain JavaScript), it won't register a hit. I found it odd that SiteMeter tracked more visits, but fewer hits than Google Analytics... curious.
In contrast with the other two, Webalizer uses raw Apache logs to determine hit count, so it tracks every single dang hit... Over 3 million hits in one year??? That's clearly too many... I'm not that interesting... but the visit count might be more accurate. Webalizer is the only analytics tool that tracks folks who view my site with RSS Readers, which may hit my site several times per day... thus the higher visit count. The hit count is hyper inflated because it counts search engine spiders, spammers, and hack attempts (some better than others).
All told, if the majority of folks view my site with RSS, then Webalizer's count is more accurate. If most of them view it the old fashioned way, then the other two are more accurate. I'm probably in the 100,000 - 200,000 visits per year range.
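To make the hits-versus-visits gap concrete, here's a small sketch of how a log-based tool might turn raw requests into the two numbers. It is not Webalizer's actual code: the function name is made up, and the 30-minute inactivity window is an assumption (a common convention, but check your own tool's settings). Every request counts as a hit, while a visit only starts when an address shows up after being quiet for longer than the window.

```python
from datetime import datetime, timedelta

# Assumption: a "visit" ends after 30 minutes of inactivity from the same address.
VISIT_TIMEOUT = timedelta(minutes=30)


def count_hits_and_visits(requests):
    """Count hits and visits from an iterable of (ip_address, datetime) tuples."""
    hits = 0
    visits = 0
    last_seen = {}
    # Process requests in time order so the inactivity gap is measured correctly.
    for ip, when in sorted(requests, key=lambda r: r[1]):
        hits += 1
        previous = last_seen.get(ip)
        if previous is None or when - previous > VISIT_TIMEOUT:
            visits += 1
        last_seen[ip] = when
    return hits, visits


if __name__ == "__main__":
    # Hypothetical log: a feed reader polling once an hour, plus one human reader.
    sample = [
        ("10.0.0.1", datetime(2008, 4, 29, 9, 0)),
        ("10.0.0.1", datetime(2008, 4, 29, 10, 0)),
        ("10.0.0.1", datetime(2008, 4, 29, 11, 0)),
        ("10.0.0.2", datetime(2008, 4, 29, 9, 10)),
        ("10.0.0.2", datetime(2008, 4, 29, 9, 15)),
    ]
    print(count_hits_and_visits(sample))
```

Under that assumption, a feed reader that polls once an hour starts a brand new visit on every poll, which is one way a single subscriber can inflate the visit count far beyond what an image or JavaScript tracker would ever see.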
Unfortunately, none of these numbers include the folks who read my site through online RSS readers, like Google Reader or Bloglines. These sites hit my RSS feed once, then share it with dozens of folks who subscribe to the feed... To get a better estimate, I could pipe my RSS Feed through something like Feedburner. Feedburner keeps track of how many subscribers you have on the online feed readers, and produces decent stats on it... however, once you move your feed to Feedburner, it's almost impossible to move it out... so I'm not happy with that option. Even so, that still wouldn't track those who view my content through RSS aggregators like Central Standard Tech, or Orana, or other sites that run Planet.
Well, what about the data from Alexa? That site ranks web pages based on those who surf the web with a toolbar that tracks their every move. Personally, I think people who surf with that toolbar are opening up a major security hole... so their viewing audience is probably restricted to folks who are kind of tech savvy, but don't take security precautions. In other words, newbie geeks. I've never broken into the top 100,000 sites ranked on Alexa, but I frequently break the top 100,000 sites ranked by Technorati... although Technorati only ranks blogs.
UPDATE: As Phil noted in the comments below, most people use Alexa just to boost their own page rank. For example, you could have your web team install and enable the Alexa toolbar, but only when browsing your own web page. That would make your Alexa rank huge without any actual hits from the greater internet...
Even if we could accurately count how many people hit the site, we're still at a loss to know who paid attention. Google Analytics tries to measure "time on the page"; other metrics include bounce rate, or even the number of comments.
Oh well... A reliable measure of relevance will always be elusive... but at least we have enough estimates to support a cottage industry of people analyzing those metrics to prove anything they are told to prove ;-).
Back to my anniversary... Lots of stuff has changed since my first anniversary post: I've traveled to South Africa, Brazil, and Argentina... I've remodeled my kitchen, I've nearly completed my second book on Oracle enterprise content management, I've given technology presentations at Oracle Open World, AIIM Minnesota, BarCamp Minnesota, and IOUG Collaborate in Denver. I've trained both salespeople and consultants on what Enterprise Content Management actually is, and I helped negotiate a settlement to an 18-month lawsuit against a local non-profit. Oh yeah... I implemented about a dozen ECM solutions as well...
Next year, I hope to have even more goin' on... and a few more web site visits.
Oracle Universal Online Archive: The "Killer App" for Oracle Secure Files
April 23, 2008 - 1:39pm — bex
When I first heard about Oracle taking a new direction with their old content management product -- meaning the old Content DB, not the newly acquired Stellent stuff -- the first thing I thought was it's about time!
When Oracle claimed it had 2 content management systems, that really confused people... especially considering that Content DB was at best a set of tools to create a content management system, whereas Stellent was a full blown application plus framework. They really weren't like each other at all.
Universal Online Archive (UOA) is Content DB, but now focused on being an archiving platform. On Oracle 11g, it is an extension on the Secure Files feature of the database. If you haven't heard of Secure Files yet, it beats the Linux filesystem on both read and write performance. It also has compression, de-duplication (only storing duplicate files once), and encryption. The encryption is an extension of Oracle Transparent Data Encryption, plus support for encrypting entire tablespaces instead of just individual columns. This means support for foreign keys, as well as indexes beyond the basic b-tree stuff...
Compression reduces the storage needs by 33% on average, according to Oracle. If you then use the statistics from IDC that there are 8 copies for every 1 content item, then de-duplication would bring the total storage down by 87.5%... all while maintaining better-than-filesystem performance, despite the added cost of encryption. See this whitepaper for some tuning statistics and tips.
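If you want to check the math, here's a quick back-of-the-envelope sketch. The function name is mine, and stacking the two savings multiplicatively is my assumption (it presumes compression works just as well on the single copy that survives de-duplication); the 33% and 8-copies figures are the ones quoted above.

```python
def remaining_fraction(copies_per_item, compression_savings=0.0):
    """Fraction of the original storage still needed after de-duplication
    and (optionally) compression, assuming the two savings multiply."""
    after_dedup = 1 / copies_per_item  # keep one copy out of N
    return after_dedup * (1 - compression_savings)


if __name__ == "__main__":
    dedup_only = 1 - remaining_fraction(8)
    combined = 1 - remaining_fraction(8, compression_savings=0.33)
    print(f"De-duplication alone saves {dedup_only:.1%}")
    print(f"De-duplication plus compression saves {combined:.1%}")
```

Keeping one copy out of eight leaves 12.5% of the original storage, which is where the 87.5% figure comes from. If compression then takes a third off what's left, the combined savings land a bit above 91%, under that multiplicative assumption.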
Secure Files is the next generation of Large Objects for the database... and it's very cool... but what should you run on top of it? For the longest time, the folks at Stellent balked at using the database for file storage. Using the filesystem made much more sense because of performance reasons, which made up for the additional complexity of the architecture. However, if the user has 11g, there really is no better option than storing content items in the database.
NOTE: This rule-of-thumb does not apply for web content -- especially for small images and thumbnails. In those cases, a split approach where public web assets are stored locally would probably be faster. Luckily, a customized FileStoreProvider can help you achieve this.
Also, Oracle Universal Online Archive finally fits in with Oracle's broader strategy for content management. Even though it can store anything, the first release will have connectors to email servers to be a mail archive:
- Microsoft Exchange
- Lotus Notes
- Generic SMTP Server
This fits right in with the Universal Records Management strategy, which is to embed a Records Management Agent in remote repositories, and control their life cycle from the Records Management system.
In other words, your email archiving policy is no longer dictated by IT. Your records managers can say when an item should be archived, and how long it should be retained based on events, instead of simply time and size constraints. For example, emails should be retained 2 years after a project completion, 6 months after employee termination, or 12 months after you lose a specific customer. That will reduce both your email space requirements, and your legal risk.
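To make the event-based idea concrete, here is a minimal sketch of the three example rules above. None of this is Oracle's API: the rule names, the `disposal_date` helper, and the month-clamping behavior are all assumptions for illustration.

```python
import calendar
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class RetentionRule:
    trigger_event: str   # hypothetical event name
    months: int          # how long to retain after the event


# The three examples from the post, expressed as event-driven rules.
RULES = {
    "project_completed": RetentionRule("project_completed", 24),
    "employee_terminated": RetentionRule("employee_terminated", 6),
    "customer_lost": RetentionRule("customer_lost", 12),
}


def add_months(start: date, months: int) -> date:
    """Shift a date forward by whole months, clamping to the month's last day."""
    index = start.year * 12 + (start.month - 1) + months
    year, month_zero_based = divmod(index, 12)
    month = month_zero_based + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def disposal_date(event: str, event_date: Optional[date]) -> Optional[date]:
    """Return when an item may be disposed of, or None if the event has not happened yet."""
    if event_date is None:
        return None  # the clock has not started; keep the item
    return add_months(event_date, RULES[event].months)
```

The point of the design is that the retention clock starts at the event, not at the date the email was sent, so an item with no triggering event yet simply has no disposal date.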
But it doesn't stop there... the next step is to make connectors to other content management systems, for example, Sharepoint. The idea is to archive content out of systems like Sharepoint, and replace them with a "stub". When a user downloads from Sharepoint, the "stub" is smart enough to redirect the download to the archive, and return it directly.
In other words, you could be using a secure, compressed, de-duplicated, encrypted archive without ever noticing. Throw in a Records Management Agent, and you'll also invisibly comply with dozens of regulations and laws... no matter where you store your information.
It's a good strategy, and some interesting technology... we'll see how it pans out.
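The "stub" idea is easy to picture in code. This is only a toy sketch with in-memory dictionaries: `Archive`, `SourceRepository`, and `Stub` are hypothetical stand-ins, not the actual connector design, which has not been published.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Stub:
    """Placeholder left behind in the source system; points at the archive."""
    archive_id: str


class Archive:
    """Hypothetical stand-in for the archive back end."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def store(self, archive_id: str, data: bytes) -> None:
        self._blobs[archive_id] = data

    def fetch(self, archive_id: str) -> bytes:
        return self._blobs[archive_id]


class SourceRepository:
    """Hypothetical stand-in for a system such as Sharepoint."""

    def __init__(self, archive: Archive) -> None:
        self._archive = archive
        self._items: Dict[str, Union[bytes, Stub]] = {}

    def upload(self, name: str, data: bytes) -> None:
        self._items[name] = data

    def archive_item(self, name: str) -> None:
        """Move the content to the archive and leave a stub in its place."""
        item = self._items[name]
        if isinstance(item, Stub):
            return  # already archived
        self._archive.store(name, item)
        self._items[name] = Stub(archive_id=name)

    def is_stub(self, name: str) -> bool:
        return isinstance(self._items[name], Stub)

    def download(self, name: str) -> bytes:
        """Return the content; the caller cannot tell whether it was archived."""
        item = self._items[name]
        if isinstance(item, Stub):
            return self._archive.fetch(item.archive_id)
        return item
```

The user calls `download` either way, which is the whole point: the redirect to the archive happens behind the same interface.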
UPDATE: The release was announced, but they don't have a date for when it will be available for download. Here's some more info about the release, and some places to watch for downloads:
IOUG Collaborate 08: Wrap-Up
April 17, 2008 - 4:55pm — bex
As usual, the last day of a conference ends on a half day... so I imbibed some Chimay Red with lunch. I was able to get a few others in the crew to follow suit. The usual suspects, indeed...
Michelle won the cookoff to see who had the coolest ECM implementation... woot! The prize was one "silver" ladle, and a $100 gift certificate. Besides Folios, annotations, and the new Site Studio contributor, she showed off Kyle's PicLens integration with Stellent's RSS Feeds, which went over quite well... nice and flashy! The roadmap and ECM focus groups were good as well... although in the future I'd do the cookoff first, then the roadmap, and lastly the focus group. That way, people have their feature lists and questions fresh in their mind.
As usual, a conference this large left me feeling like I missed out on a lot. I networked with a lot of people, and discussed ECM a lot... but I wanted to learn more about identity management, performance tuning, and Hyperion. There were simply too many options, and the handful of non-ECM talks I attended were a tad too high-level for my taste. Maybe I'm too technical, but I don't feel like I learned that much.
Brian Dirking wanted some feedback, so I guess I'd make the following suggestions:
- After people register (and pay) for Collaborate 09, give them access to the presentations from 08. Then we'd better be able to determine who is a good presenter, what topics are too technical, or which ones aren't technical enough.
- Have some level of continuity between years... I've given the "50 ways to integrate with the content server" talk about 4 times, but it's always a bit different, and people continue to be surprised at how flexible Stellent is.
- Have some kind of easy trends analysis to help people find "what's hot" in their industry. Ideally this would be community based, to avoid sales pitches and promotions. For example, send out a survey to ask people what their industry is, and what topics they are interested in... perhaps even which technologies or presentations that they might find useful.
I'm used to more focused conferences, like the O'Reilly ones... so this many high-level presentations make me sad. I personally would like a bit of community feedback to help everybody find which topics are most relevant to their background, goals, and needs.
Not an easy undertaking... but I'd wager a lot of conferences would appreciate something similar.
IOUG Collaborate 08: Day Three
April 17, 2008 - 8:23am — bex
I hung out at mostly Stellent sessions today. Vijay talked about the FileStoreProvider, Alan had a great presentation on metadata models, and Tom was up on a customer panel. One of the questions on the customer panel was about the strengths and weaknesses of the Stellent UCM product. There were happily few minuses, and everybody on the panel said that ease of use (deployment, management, customization) was the biggest plus.
Some folks from Oracle's "Beehive" project presented "the future of collaboration." The Beehive project is the latest iteration of Oracle Collaboration server, which is an email/calendar/task application more akin to Lotus Notes, and quite different from Stellent's document-centric Collaboration Manager. I missed the show, but everybody I talked to about it said it was quite memorable... however, they left out whether they meant that as a good thing...
Somebody noticed that IOUG had me down to give my "50 Ways" presentation twice... I was surprised, so I headed down early to scope the room out. Then I spotted the big sign that said it already took place and is therefore canceled. On the way back, I swung by the bookstore, and noticed that they seemed to be running low on my Stellent book. I later bumped into the 2 customers who bought the last 2 copies, so I autographed them.
It's a good feeling for my niche-technology book to sell out halfway through the conference... ;-)
I also spent about 2 hours in one-on-one sessions with customers. They were all concerned about how they were supposed to get started with a coherent enterprise-wide content management policy... those interactions drove the point home that there's a real need for the book Andy and I are currently writing.
And oh yeah... it snowed. Tuesday was 83 degrees, and on Wednesday it snowed. Strange... I thought that kind of stuff only happened in Minnesota.
One more half-day, then back home!
IOUG Collaborate 08: Day Two
April 16, 2008 - 10:30am — bex
I gave my presentation on 50 Ways to Integrate with Oracle Content Management today... it was similar to my one from Crescendo last year, but I updated it a bit with some of Oracle's new connectors (BI Publisher, Secure Enterprise Search, Records Management Agents, etc.).
After that, I had a book signing. On my way over, I realized that I didn't tell anybody I was doing a book signing.... so attendance was kind of thin. Plus I was late. Chaffee showed up with Patrick and Rhonda, and I signed his book with something characteristically glib...
I had lunch with some customers -- finally attempting that business networking thing -- and promised to help a few folks out with their architecture.
In the afternoon, I helped out on Michelle's two hour hands-on lab about Site Studio: Building an Enterprise Web Site From Scratch. Believe it or not, if you know what you're doing, you can get a pretty good handle on an enterprise scalable web site in a few hours with Site Studio... Then it was dinner with some Stellent folks, and drinks while we watched the Wild lose.
Since I'm now done with my official obligations, I'll be spending day three going to sessions and networking...
IOUG Collaborate 08: Day One
April 14, 2008 - 10:38am — bex
I suppose I should start with day zero, and not day one...
Michelle and I landed, but the hotel didn't have our reservations on file. Great... and on the one day we decided to not print out the confirmation letter. Michelle scoured her web-email using the computers behind the reservation desk... in the meantime a few Oracle employees came in and were initially confused as to why she was working behind the counter... Anyway, the clerk looked through their list of who was checking in that day, just to see if our names were spelled incorrectly.
We were there of course: as Brian and Michelle Hugg. Lovely. Yeah. We'll live that down.
Later I had drinks with some folks I hadn't seen in a while (like Dan Norris and Matt Topper), as well as folks I heard of but never met (like Jake Kuramoto and Paul Pedrazzi). The Oracle ACE Director dinner was good. I love finding out what other ACEs are up to, and what technologies they are interested in. The buzz these days seems to be all about Hyperion... just when I started learning about BI Publisher and Real-Time-Decisions!
Keeping up on enterprise technology is a constant struggle...
The first day of IOUG Collaborate 2008 was pretty good... I hung out at the Enterprise Content Management conference-within-a-conference a lot to chat with other ECM folks. I gave a well-received talk about why ECM projects fail, which was essentially an extension of the AIIM list from last year. It wasn't just a rant, it had some practical advice of what typically goes wrong, and what you can do about it. Cliff Cate and Tom Tonkin presented their war stories and advice as well.
Here's a tip: very few enterprise software failures have much to do with bad software... it's almost always poor communication.
I wasn't able to attend many sessions after that... not the exhibit hall, not even the keynotes. I did check out the hands-on lab about Oracle Text, hoping for a deep dive... but it was pretty basic. Attending a conference is more fun when you're not a presenter. I had to go to my hotel early to put the finishing touches on my Tuesday presentation... so I skipped all the festivities.
I have another session on day 2, after which I'll be able to relax, attend more sessions, and network more.
Oracle Unbreakable ECM?
April 9, 2008 - 2:52pm — bex
After my security posts last week (here, here, and here), I got an interesting email from an Oracle partner out west (David Roe from Ironworks)... one of his customers put Stellent through a battery of automated security tests, and got some surprising results:
Incidentally one of our clients ran through a couple rounds of automated security testing on their UCM instance. They sort of surprised us with it actually, but when they were done sent back some great feedback about how strong the system was and how it passed every check (apparently an uncommon occurrence). I personally don't put a lot of faith in any automated testing, but it's nice to know Stellent will pass one :)
Like the author, I don't put that much faith in automated tests... but many of these security testing companies are batting 1000: some of these firms brag that they always find security holes, but this time they came up empty. Even on an unannounced, surprise, security audit.
Naturally, neither David nor I will reveal the name of the customer... because bragging about an unbreakable system is the surest way to attract the wrong attention... but if a legitimate analyst or existing Oracle customer would like to chat with these folks, Dave could facilitate a connection.
What Should ECM Apps Do About Security?
April 2, 2008 - 10:39am — bex
James responds... to my latest security rant, with a lot of good points. I think this point here is the best:
Have you ever noodled that as data flows from one system to another within an SOA, but the security model doesn't, that this is another attack vector? For example, what if I have access to data in a policy administration system such that I can figure out if you are insuring an auto that your wife doesn't know about but couldn't do the same in a claims administration system? I bet you can envision scenarios when you integrate a BPM engine with an ECM engine that security becomes weaker.
Absolutely... unfortunately, this is an amazingly difficult problem. It's not really the realm of ECM or BPM to solve it... rather, the best thing that we can do is not get in the way. Let the experts solve that one, and then integrate as well as possible with global policy management systems.
My suggestion is this:
- Implement a policy-based security model in your application (ECM/BPM).
- Loosely couple your application with an identity management system, so you can access a global security policy.
- Place extra hooks -- and allow people to "inject" new hooks -- that allow additional security callbacks to arbitrarily re-validate both credentials and access rights.
- Optionally, map the global policy and list of access rights to a policy more relevant to your application, and allow access by "local" users not in the global repository.
Most applications in the Oracle ECM stack follow this methodology... but I can't vouch for all Oracle applications. I like it, because it's flexible enough to 'slave' yourself to an identity management system, and yet still have some local control over access rights if you want to 'boost' somebody's credentials.
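The four suggestions above can be sketched in a few lines. This is a hedged illustration only, not how Oracle ECM implements it: `PolicyEngine`, `AccessRequest`, and the dictionary that plays the role of the identity management system are all invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class AccessRequest:
    user: str
    action: str
    resource: str


Permission = Tuple[str, str]             # (action, resource)
Hook = Callable[[AccessRequest], bool]   # returns False to veto a request


class PolicyEngine:
    def __init__(self,
                 global_directory: Dict[str, Set[str]],
                 role_permissions: Dict[str, Set[Permission]]) -> None:
        # Suggestion 2: roles come from an external identity system (a dict here).
        self._global_directory = global_directory
        # Suggestions 1 and 4: a policy that maps global roles to application-level rights.
        self._role_permissions = role_permissions
        self._local_users: Dict[str, Set[str]] = {}
        self._hooks: List[Hook] = []

    def add_local_user(self, user: str, roles: Iterable[str]) -> None:
        """Suggestion 4: allow users who are not in the global repository."""
        self._local_users[user] = set(roles)

    def add_hook(self, hook: Hook) -> None:
        """Suggestion 3: inject an extra callback that can re-validate any request."""
        self._hooks.append(hook)

    def roles_for(self, user: str) -> Set[str]:
        return self._global_directory.get(user, set()) | self._local_users.get(user, set())

    def is_allowed(self, request: AccessRequest) -> bool:
        wanted = (request.action, request.resource)
        granted = any(wanted in self._role_permissions.get(role, set())
                      for role in self.roles_for(request.user))
        # Every hook must approve, even when the policy itself says yes.
        return granted and all(hook(request) for hook in self._hooks)
```

A contractor added with `add_local_user` gets rights without ever touching the global repository, which is exactly the convenience (and the risk) discussed below for #4.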
I think it would be great if Oracle chose to augment this model to add support for a policy auditing standard... but I have no idea if anybody is asking for one, and if so, which one? I'm positive James has an opinion... I'm a fan of just using Business Intelligence to do the reporting, since (again) you can "sneak-in" better security along with the latest buzzword ;-)
Sub-optimal? Of course... but anything that makes security look less like a cost-center is good...
I also like the concept of Oracle's magic black box for identity services. That would make it easier for developers to create policy-based security models, that (in theory) would work with old, new, and emerging standards alike (XACML, CardSpace, OpenID, etc.). It's not that I don't like XACML, it's simply that there are other horses in this race... and developers do not have the power to dictate architecture. We can suggest what works best, but in the end, the most sellable product will support them all.
I fully agree that #4 is a possible attack vector, which is why good access auditing and rights auditing tools are important... However, users frequently insist on local control of security rights, because there are many legitimate business cases where it isn't feasible to place all users in a global repository with the proper rights. Sometimes -- especially during mergers and acquisitions -- you want to keep the identities and access rights of these folks as secret as possible. Or, if your IT department has a 3-week waiting period for new users, but you need a contractor NOW for a 2 week project, guess what will happen?
I especially like how Oracle ECM implements #3... some of the more interesting aspects of the future of security involve multiple challenges for access. For example, assume a user has access to both mundane and highly restricted content, but her daily work is usually with the mundane. Now, at 7pm, she's suddenly accessing a ton of highly restricted content. Red flag! Even if her security tokens have not yet expired, a good security system would notice that this behavior is strange, and demand further authentication credentials... maybe the name of her first pet, or the manual-override PIN.
Anyway, Oracle ECM doesn't do any integrations like that as of yet, but it has the flexibility to do it... several identity management systems support that approach, and ECM is being positioned more and more as "infrastructure..." so I'd wager it's only a matter of time.
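For the curious, the 7pm red-flag scenario could be sketched as a callback like the one below. Everything here is an assumption for illustration: the `AccessEvent` record, the 8am-to-6pm "usual hours", and the 20% threshold are made-up values, and a real system would use far better behavioral signals.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence


@dataclass(frozen=True)
class AccessEvent:
    user: str
    hour: int          # 0-23, local time of the access
    restricted: bool   # True for highly restricted content


def needs_step_up(history: Sequence[AccessEvent],
                  current: AccessEvent,
                  usual_hours: Iterable[int] = range(8, 18),
                  baseline_threshold: float = 0.2) -> bool:
    """Return True if the user should be challenged for extra credentials.

    Hypothetical rule: challenge when restricted content is requested at an
    unusual hour by someone whose past activity was mostly mundane.
    """
    if not current.restricted:
        return False
    past = [event for event in history if event.user == current.user]
    if not past:
        return True  # no baseline for this user, so err on the side of caution
    restricted_share = sum(event.restricted for event in past) / len(past)
    unusual_time = current.hour not in usual_hours
    unusual_content = restricted_share < baseline_threshold
    return unusual_time and unusual_content
```

Note that the check runs even though the user's session token is still valid; it only decides whether to demand something extra, such as the pet's name or the override PIN.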
Want Secure Software? Then Pick Your Battles
April 1, 2008 - 10:53pm — bex
I don't mind when James throws daggers at me about security... because
- his aim sucks (just ask his hunting buddies), and
- hey, free cutlery!
Seriously, I believe we agree on several points... we just have different perspectives.
My point was that creating secure software is extremely difficult... even if you educate your developers about the OWASP top ten (which ain't all that great anyway), and even if you religiously use tools like Ounce Labs or Coverity, you'll always have problems. Those tricks are good checks against developers making brain dead stupid decisions, but they'll never catch the subtler security problems.
The issue is one of complexity... the vast majority of security holes occur in the interfaces between applications and/or concerns. This doesn't just mean cross-site scripting vulnerabilities on the web interface, nor just the SQL injection attacks on the back end... it also includes any time you connect two code bases together in new and novel ways. The very nature of service-oriented architectures and modular code bases exponentially increases the number of things that can go wrong. Even a security-savvy developer that runs Coverity would never have enough time to test every possible permutation... nobody is willing to wait that long for the test cycle to complete, nor would anybody be willing to pay for it.
Thus, some problems will never be noticed until they are "in the wild."
You can yell and scream all you want... but this doesn't change the basic math. Again, don't just listen to me... check out security guru Bruce Schneier and his essay on why insecure products will always win in the marketplace. It's basic economics, called the "market for lemons," which I've covered before.
James seems to need some kind of evidence that the code is at least reasonably safe before putting it into production. Fair enough, but his suggestions suck. I can't think of one single certification that I would be personally willing to trust... penetration tests are OK but flawed. Developer certification courses only teach the basics, and are generally useless. Stamps of approval by "security experts" are nice, but as I've mentioned before, I've found problems that these self proclaimed "experts" missed.
In short, all of James's proposed solutions are false senses of security. Rely on them at your peril. If he's got a new one I've missed, I'm all ears.
You will always need to patch your applications. Accept it. You will never have a "100% secure" system. Accept it. The best you can hope for is something that gets more and more "defensible" as it matures. Accept it. Patches are a necessary evil. Accept it.
Instead of fighting the security battle -- which you will never win -- pick a battle that will both make your life easier, and have better security as a byproduct. Demand that your vendor:
- minimizes the number of required security patches, either through bundling or by educating their developers about security,
- thoroughly tests those patches to minimize the side-effects, and
- has excellent tools to help deploy, test, and roll-back patches if needed
That's probably the best you can hope for...
UPDATE: James responds, and I continue the dialog.
Take The Oracle Security Survey
March 28, 2008 - 8:28am — bex
Do you use Stellent, or any Oracle technology? Then you should probably take the IOUG Oracle Security Survey:
Select the OSSA Security Survey, and let 'er rip! It's sponsored by Oracle and the Independent Oracle Users Group. The goal is to gather information about your security practices including general processes for vulnerability and patch management, Critical Patch Updates, and the like. IOUG will analyze the results, and issue recommendations to Oracle at Oracle's next Security Customer Advisory Council. IOUG has released a security podcast to explain more about the survey.
I was shocked to discover that fewer than 20% of Oracle customers admit to applying the rolling security patches that Oracle releases... yikes. Back when I was a developer, I always found it extremely frustrating that customers rarely applied patches to known security holes... CERT often says that 99% of security breaches are due to users not applying patches. In other words, 80% of Oracle customers choose to make themselves vulnerable to 99% of the attacks.
WHY???
Unlike James McGovern, I don't believe security problems are entirely due to bad software or clueless developers... I'd argue most security problems are due to improperly configured and improperly maintained software. However, I also believe that blaming the implementation team is a cop-out. Instead, developers need to realize that security is a process, not a product (hat tip Schneier).
Thus, the best thing a developer can do for security is focus on software that can effortlessly evolve to meet tomorrow's security challenges. If you want secure applications, first demand software that is effortless to patch and maintain. This includes software that can easily roll-back patches in case the security fix broke something important... Then fewer people would fear installing the patches, more would use the existing patches, and there would be significantly fewer breaches.
If software were easy to configure and maintain, then security would get better and better the longer you owned it... not to mention you'd have fewer bugs, and generally better software. Stable products are always more secure. Why? If the product is rock solid, with few bugs, then people are less risk-averse to applying critical patches. Better documentation helps as well, as do better patch tools...
With easy patching, easy maintainability, stable software, and a vigilant community, security is a natural by-product. Also, this helps security become less of a cost-center... easy patching and configuration is great for ROI, no matter what.
It Just Makes Sense©, so don't expect too many people to press for it any time soon...
Although relatively speaking, I'm pretty impressed with Oracle's patch technology. The new 11g database watches for errors, and can notify you about patches that might fix the problem. Likewise, the Content Management team has a pretty good patch process... unfortunately, it takes forever to get anything out to Metalink, so your best bet is to always contact support for the latest patches.