Issues relating to computer security

Security

These articles are about technology and its effects on security and privacy. Sometimes I post exploits that I find here; other times I warn of issues and possible exploits. I'm not malicious! Merely curious...

Deep Dive: Oracle WebCenter Tips and Traps!

I'm currently at IOUG Collaborate 2014 in Las Vegas, and I recently finished my 2-hour deep dive into WebCenter. I collected a bunch of tips & tricks in 5 different areas: metadata, contribution, consumption, security, and integrations:


As usual, a lot of good presentations this year, but the Collaborate Mobile App makes it a bit tough to find them...

Bezzotech will be at booth 1350, right by Oracle. Be sure to swing by and register for a free iPad, or even a free consulting engagement!

Mashup Standards Part 3: JSONP versus CORS

In part 1 of this series, I covered the JSON-P "standard" for mashups. Not so much a standard per se, but a sneaky way to share JSON data between servers by wrapping it in a 'callback' function... For example, if we have our raw JSON data at this URL:

http://example.com/data.js

A direct access would return the raw data dump in JSON format:

{ foo: "FOO", bar: "BAR" }

However, a JSON-P call would return a JavaScript file that calls a 'callback' function with the raw data:

callback({ foo: "FOO", bar: "BAR" });

Since this is pure JavaScript, we can use it to bypass the "Same-Origin Policy" for AJAX... A typical AJAX call uses the XmlHttpRequest object, which only allows calls back to the originating server... which, of course, means true mashups are impossible. JSON-P is one of the (many) ways around this limitation.
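As a minimal sketch of how a page could consume that feed -- using the example URL above, and assuming the server always wraps its response in callback(...) -- the whole trick is just a dynamically injected script tag:

  // Define the callback, then inject a script tag pointing at the remote data.
  // When data.js loads, it simply calls callback({...}) with the raw JSON.
  function callback(data) {
    console.log(data.foo, data.bar);  // "FOO", "BAR"
  }
  var script = document.createElement("script");
  script.src = "http://example.com/data.js";
  document.getElementsByTagName("head")[0].appendChild(script);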

Since JSON-P is something of a hack, many developers started looking for a more secure standard for sharing JSON and XML resources between web sites. They came up with Cross-Origin Resource Sharing, or CORS for short. Enabling CORS is as simple as passing this HTTP header in your XML/JSON resources:

Access-Control-Allow-Origin: *

Then, any website on the planet would be able to access your XML/JSON resources using the standard XmlHttpRequest object for AJAX. Although I like where CORS is going, and see it as the future, I just can't recommend it at this point.

Security

Since CORS is built on top of the XmlHttpRequest object, it has much nicer error handling. If the server is down, you can recover from the error and display a message to the user immediately. If you use JSON-P, you can't access the HTTP error code... so you have to roll your own error handling. Also, since CORS is a standard, it's pretty easy to just put an HTTP header in all your responses to enable it.

My big problem with CORS comes from the fact that it just doesn't seem that well supported yet... Only modern browsers understand it, and cross-domain authentication seems to be a bit broken everywhere. If you want to get secure or personalized JSON on a mashup, your back-end applications will also need to set this HTTP header:

Access-Control-Allow-Credentials: true

And, in theory, the AJAX request will pass along your credentials, and get back personalized data. The jQuery 1.7 plug-ins work well with JSON-P and authentication, but choke badly on CORS. Also, keep in mind that authenticated CORS is a royal pain in Internet Explorer. Your end users will have to lower their security settings for the entire mashup application in order to make authenticated requests.
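For reference, here's a rough sketch of what the client side of an authenticated CORS call looks like (the app1 URL is made up, and the server still has to send both Access-Control headers shown above):

  // Hypothetical authenticated CORS request to "app1"
  var xhr = new XMLHttpRequest();
  xhr.open("GET", "http://app1.example.com/data.json", true);
  xhr.withCredentials = true;  // ask the browser to include cookies/auth
  xhr.onreadystatechange = function () {
    if (xhr.readyState !== 4) return;
    if (xhr.status === 200) {
      var data = JSON.parse(xhr.responseText);
      // ... blend the personalized data into the mashup ...
    } else {
      // unlike JSON-P, we can actually see the HTTP error code here
      alert("Request failed: " + xhr.status);
    }
  };
  xhr.send();

One more gotcha: once credentials are involved, the browser will reject a wildcard Access-Control-Allow-Origin: * response, so the server has to echo back the specific origin of the mashup.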

Now, JSON-P isn't great with security, either. Whereas CORS is too restrictive, JSON-P is too permissive. If you enable JSON-P, then you pass auth credentials to the back-end server with every request. This may not be a concern for public content, but if an evil web site can trick you into going to their mashup instead of your normal mashup, they can steal information with your credentials. This is called Cross-Site Request Forgery, and it is a general security problem with Web 2.0 applications... and JSON-P is one more way to take advantage of any security holes you may have.

Performance

In addition, the whole CORS process seems a bit 'chatty.' Whereas JSON-P requires one HTTP request to get secure data, CORS requires three requests. For example, assume we had two CORS enabled applications (app1 and app2) and we'd like to blend the data together on a mashup. Here's the process for connecting to app1 via CORS and AJAX:

  1. Pre-Flight Request: round-trip from client browser to app1 as an HTTP 'OPTIONS' request, to see if CORS is enabled between the mashup and app1
  2. Request: if CORS is enabled, the browser then sends a request to app1, which sends back an 'access denied' response.
  3. Authenticated Request: if cross-origin authentication is enabled, the request is sent a third time, along with the proper auth headers, and hopefully a real response comes back!

That's three HTTP requests for CORS compared to one by JSON-P. Also, there's a lot of magic in step 3: will it send back all the auth headers? What about cookies? There are ways to speed up the process, including a whole ton of good ideas for CORS extensions, but these appear to be currently unpopular.
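To make those round-trips concrete, when the browser decides a pre-flight is needed (custom headers, unusual methods, and so on), the first exchange looks roughly like this -- the origin and path are hypothetical:

  OPTIONS /data.json HTTP/1.1
  Origin: http://mashup.example.com
  Access-Control-Request-Method: GET

  HTTP/1.1 200 OK
  Access-Control-Allow-Origin: http://mashup.example.com
  Access-Control-Allow-Methods: GET
  Access-Control-Allow-Credentials: true

Only after that does the browser bother sending the real request.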

Conclusion: Use JSON-P With Seatbelts

If all you care about is public content, then CORS will work fine. Also, it's a 5-minute configuration setting on your web server... so it's a breeze to turn on and let your users create mashups at their leisure. If you don't create the mashups yourself, this is sufficient.

However... if you wish to do anything remotely interesting or complex, JSON-P has much more power, and fewer restrictions. But, for security reasons, on the server side I'd recommend a few safety features:

  • Validate the HTTP_REFERER: only allow JSON-P requests from trusted mashup servers, to minimize request forgery (see the sketch after this list).
  • Make JSON-P requests read-only: don't allow create/modify/delete through JSON-P.
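Here's a rough sketch of both checks, in Node.js-flavored JavaScript (the trusted host list, port, and data are all hypothetical; a real endpoint should also whitelist the callback name):

  var http = require("http");
  var url = require("url");
  var TRUSTED = ["mashup.example.com", "intranet.example.com"];

  http.createServer(function (req, res) {
    var refHost = url.parse(req.headers["referer"] || "").hostname || "";

    // 1. Only answer JSON-P requests from trusted mashup servers
    if (TRUSTED.indexOf(refHost) === -1) {
      res.writeHead(403); res.end("Forbidden"); return;
    }
    // 2. Keep JSON-P strictly read-only: no create/modify/delete
    if (req.method !== "GET") {
      res.writeHead(405); res.end("Read-only"); return;
    }

    var query = url.parse(req.url, true).query;
    var payload = JSON.stringify({ foo: "FOO", bar: "BAR" });
    res.writeHead(200, { "Content-Type": "application/javascript" });
    res.end(query.callback + "(" + payload + ");");
  }).listen(8080);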

But wait, isn't it easy to spoof the HTTP referrer? Yes, an evil client can spoof the value of the referrer, but not an evil server. In order for an evil mashup to spoof the referrer, it would have to trick the innocent user into downloading and running a signed applet, or something similar. That's a typical trojan horse attack, and if you fall for it, you've got bigger problems than fancy AJAX attack vectors... DNS rebinding is much more dangerous, and is possible with any AJAX application, regardless of JSON-P or CORS support.

Links and Free Downloads

For those of you interested in Oracle WebCenter, I created a CrossDomainJson component that enables both CORS and JSON-P, and it includes some sample code and documentation for how to use it. It currently works with WebCenter Content, but I might expand it to include WebCenter Spaces, if I see any interest.

Presentations From Collaborate 2011

In case you weren't able to make it to IOUG Collaborate last month, you can feel free to peruse my presentations in the privacy of your own home ;-)

My first one was on UCM implementation patterns... or in general, what customizations/integrations are common for UCM, and how do we do them? That was pretty well attended:

I also presented on the Top 10 Security Vulnerabilities in web applications. This is my own take on the popular OWASP Top Ten presentations on the same subject. Many thanks to the OWASP people for compiling the top ten, and getting the word out about security:

In addition to these two, I gave presentations on managing a multi-language web site, and a fourth one on the next generation of Oracle Collaboration Tools, also known as Oracle Fusion UX Applications. Oracle was kind enough to give me a sneak peek at Fusion UX, and I was quite impressed, and volunteered to help spread the word.

Enjoy!

Oracle UCM Security: Challenges and Best Practices

I recently gave a security talk at the Minnesota Stellent User's Group... Stellent of course being the old name for Oracle Universal Content Management. I uploaded it to Slideshare, and embedded it below:

This talk is a variation on a talk I gave at Crescendo a few years back... it covers the security risks and vulnerabilities inside Oracle UCM, and countermeasures to prevent break-ins. This talk is not a how-to for integrating LDAP, Active Directory or Single Sign On... rather it's intended to be an introduction to cross site scripting, SQL injection, and other common web application attack vectors. It's a bit scary for a while, but then it tells you how to prevent attacks.

Enjoy! And don't be evil...

The Deep, Dark, Secret Origin Of Oracle UCM's Security Model

On a recent blog post about Oracle UCM -- Should Oracle Be On Your Web Content Management Short List? -- CMS Watch analyst Kas Thomas commented that he thought Oracle's security model was a bit spooky. He admitted that this may be because he didn't know enough about it: his concern stemmed from an overly stern warning in Oracle's documentation.

Alan Baer from Oracle soothed his fears and said that the documentation needed a bit of work... The documentation mentioned that changing the security model might cause data loss, which is in no way true. It should say that changing the security model might cause the perception of data loss, when in fact the repository is perfectly fine... the problem is that when you make some kinds of changes to the security model, you'll need to update the security settings of all your users so they can access their content.

Nevertheless, I thought it might be a good idea to explain why Oracle UCM's security model is how it is...

Back in the mid 1990s when UCM was first designed, it had a very basic security model. It was the first web-based content management system, so we were initially happy just to get stuff online! But immediately after that first milestone, the team had to make a tough decision on how to design the security model. We needed to get it right, because we would probably be stuck with it for a long time.

  1. Should it be a clone of other content management systems, which had access-control lists?
  2. Should it be a clone of the unix file permissions, with directory and file based ownership?
  3. Or, should it be something completely different?

As with many things, the dev team went with door number 3...

Unix file permissions were simply not flexible enough to manage documents that were "owned" by multiple people and teams. The directory model was compelling, but we needed something more.

Access Control Lists (ACLs) are certainly powerful and flexible, because you store who (Bob, Joe) gets what rights (read, delete) to which documents. The ACLs are set by the content contributors when they submit content. However, ACLs are horribly slow and impossible to administer. For example, I as an administrator have very little control over how you as a user set up your access control lists. Let's say some kinds of content are so important that I want Bob to always have access, but Joe never gets access. If Bob gets to set the ACLs on check-in, then there's a risk he gives Joe access. It's tough to solve this problem in any real way without a bazillion rules and exceptions that are impossible to maintain or audit.

Instead, the team decided to design their security model with seven primary parts:

  • SECURITY GROUPS are like a classification of a piece of content. Think: restricted, classified, secret, top secret, etc. As Jay mentioned in the comments, these are groups of content items, and not groups of users.
  • ACCOUNTS are like the directory location of where a content item resides in a security hierarchy. Think: HR, R&D, London offices, London HR, etc. These are typically department-oriented, but it's also easy to make cross-departmental task-specific accounts for special projects.
  • DOCUMENTS are required to have one and only one security group. Accounts are optional. This information is stored with the metadata of the document (title, author, creation date, etc.) in the database.
  • PERMISSIONS are rules about what kind of access is available to a document. You could have read-access-to-Top-Secret-documents, or delete-access-to-HR-documents. If the document is in an account, then the user's access is the intersection of account and group permissions (see the sketch after this list). For example, if you had read access to the Top Secret group, and read access to only the HR account, you'd be able to read Top-Secret-HR content. However, you would not see Top-Secret-R&D content.
  • ROLES are collections of security group permissions, so that they are easier to administer. For example, a contributor role would probably have read and write access to certain kinds of documents, whereas the admin role would have total control over all documents. Change the role, and you change the rights of all users with that role.
  • USERS are given roles, which grants them different kinds of access to different kinds of documents. They can also be granted account access.
  • SERVICES are defined with a hard-coded access level. So a "search" service would require "read" access to a document, otherwise it won't be displayed to the user. A "contribution" service would require that the user have "write" access to the specific group and account, otherwise you will get an access denied error.
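To make the group/account interaction concrete, here's a toy sketch of the check in JavaScript (the names, bit flags, and structures are mine for illustration, not UCM's actual internals):

  var READ = 1, WRITE = 2, DELETE = 4;   // permission bits

  var user = {
    groupRights:   { "Public": READ, "TopSecret": READ },  // granted via roles
    accountRights: { "HR": READ | WRITE }                  // granted via accounts
  };

  function hasAccess(user, doc, wanted) {
    var groupPerm = user.groupRights[doc.securityGroup] || 0;
    if (!doc.account) {
      return (groupPerm & wanted) === wanted;   // no account: group alone decides
    }
    var acctPerm = user.accountRights[doc.account] || 0;
    // with an account, access is the INTERSECTION of group and account rights
    return ((groupPerm & acctPerm) & wanted) === wanted;
  }

  hasAccess(user, { securityGroup: "TopSecret", account: "HR"  }, READ);  // true
  hasAccess(user, { securityGroup: "TopSecret", account: "RnD" }, READ);  // false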

This kind of security model has many advantages... firstly, it is easy to maintain. Just give a user a collection of roles, and say what department they are in, and then they should have access to all the content needed to do their job. It works very well with how LDAP and Active Directory grant "permissions" to users. That's why it is usually a minimal amount of effort to integrate Oracle UCM with existing identity management solutions.

Secondly, this model scales very well. It is very, very fast to determine if a user has rights to perform a specific action, even if you need to do a security check on thousands of content items. For example, when somebody searches for "documents with 'foo' in the title," all the content server needs to do is append a security clause to the query. For a "guest" user, the query becomes "documents with 'foo' in the title AND in the security group 'Public'." Simple, scalable, and fast.
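In pseudo-JavaScript, the idea is no more complicated than this (the field names approximate UCM's metadata columns, and a real implementation would use bind variables rather than string concatenation):

  // Append a security clause to whatever the user searched for
  function buildSearchQuery(titleTerm, allowedGroups) {
    var base   = "dDocTitle LIKE '%" + titleTerm + "%'";
    var clause = " AND dSecurityGroup IN ('" + allowedGroups.join("','") + "')";
    return base + clause;   // the database does the security check for free
  }

  buildSearchQuery("foo", ["Public"]);
  // => "dDocTitle LIKE '%foo%' AND dSecurityGroup IN ('Public')"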

There are, of course, dozens of ways to enhance this model with add-on components... The optional "Collaboration Server" add-on includes ACLs, along with the obligatory documentation on how ACLs don't scale as well as the standard security model... The optional "Need To Know" component opens up the security a bit to let people see some parts of a content item, but not all. For example, they could see the title and date of the "Hydrogen Bomb Blueprints" document, but they would not be able to download the document. The "Records Management" component adds a whole bunch of new permissions, such as "create record" and "freeze record." I've written some even weirder customizations before... they aren't much effort, and are very solid.

I asked Sam White if he could do it all over again, would he do it the same? For the most part, he said yes. Although he'd probably change the terminology a bit -- "classification" instead of "role," "directory" instead of "account." In other words, he'd make it follow the LDAP terminology and conventions as closely as possible... so it would be even easier to administer.

I do think it is a testament to the skills of the UCM team that the security model so closely mirrors how LDAP security is organized... considering LDAP was designed over many years by an international team of highly experienced security nerds. I'm also happy when it gets the "thumbs-up" from very smart, very paranoid, federal government agencies...

Oracle Buys Sun: Insert your own "Java Garbage Collector" Pun

In case you haven't heard, Oracle bought Sun... after Sun was teased by IBM and watched its stock price plummet, Oracle began talks with them last Thursday about a possible acquisition...

If you were surprised, don't feel bad... Neither IBM nor Microsoft had a clue this was going to happen.

First thoughts... holy crap! Oracle sure saved Sun from becoming a part of the IBM beast... and now Oracle (more or less) owns Java, and has access to all those developers who maintain it. This is win-win for them both, in my opinion. Sun gets most of their revenue from hardware, a business Oracle avoided for decades, so overall there's not much overlap in product offerings -- unlike last year's BEA acquisition.

The hardware-software blend is a compelling story... Imagine getting all your Oracle applications and databases pre-installed on a hardware appliance! Not bad... You could even get one of them data centers in a box, slap a bunch of Coherence nodes on each, and have a plug-and-play "cloud computer" of your very own.

Second thoughts... how the heck is the software integration plan going to work? Sun helps direct a lot of open source projects... including JRuby, Open Office, and the MySQL database... not to mention the OpenSSO identity management solution, and the GlassFish portal/enterprise service bus/web stack. The last two are award winning open-source competitors to existing Oracle Fusion Middleware products. Oracle now owns at least 5 portals, and at least 4 identity management solutions... unlike past acquisitions, existing Oracle product lines are going to have to justify themselves against free competitors. I can foresee a lot of uneasy conversations along the lines of:

So, Product Manager Bob... I notice that your team costs the company a lot of money, but your product line isn't even as profitable as the stuff we give away for free... Can you help me out with the logic here?

There are a lot of open source developers shaking in their boots over this... but I'm cautiously optimistic. Oracle can't "kill" MySQL: there are too many "forked" versions of MySQL already, and any one of them could thrive if Oracle tried to cripple the major player. More likely, they will simply try to profit from those who choose to use a bargain brand database. Case in point: Oracle could sell them their InnoDB product, which allows MySQL to actually perform transactions.

Middleware is the big question mark... but with a huge injection of open source developers, products, and ideas, I'm again cautiously optimistic that -- after an inevitable shake-up -- the Middleware offerings would improve tremendously.

And Open World 2009 is going to be a lot more crowded...

Popularity of the Web Considered Harmful

In a recent TED Talk, Tim Berners Lee laid out his next vision for the world wide web... something he likes to call "Linked Data." Instead of putting fairly unstructured documents on the web, we should also put highly structured raw data on the web. This data would have relationships with other pieces of data on the web, and these relationships would be described by having data files "link" to each other with URLs.

This sounds similar to his previous vision, which he called the "semantic web," but the "linked data" web sounds a bit more practical. This change of focus is good, because as I covered before, a "true" semantic web is at best impractical, and at worst impossible. However, just as before, I really don't think he's thought this one through...

The talk is up on the TED conference page if you'd like to see it. As is typical of all his speeches, the first 5 minutes is him tooting his own horn...

  • Ever heard of the web? Yeah, I did that.
  • I I I.
  • Me me me.
  • The grass roots movement helped, but let's keep talking about me.
  • I also invented shoes.

I'll address his idea of Linked Data next week -- preview: I don't think it will work -- but I first need to get this off my chest. No one single person toiled in obscurity and "invented the web." I really wish he would stop making this claim, and stop fostering this "web worship" about how the entire internet should be the web... because it's actually hurting innovation.

Let's be clear: Tim took one guy's invention (hypertext) and combined it with another guy's invention (file sharing) by extending another guy's invention (network protocols). Most of the cool ideas in hypertext -- two-way links, and managing broken links -- were too hard to do over a network, so he just punted and said 404! In addition, the entire system would have languished in obscurity without yet another guy's invention (the web browser). There are many people more important than Tim who laid the groundwork for what we now call "the web," and he just makes himself look foolish and petty for not giving them credit. Tim's HTTP protocol was just a natural extension of other people's inventions that were truly innovative.

Now, Tim did invent the URL -- which is cool, but again, hardly innovative. Anybody who has seen an email address would be familiar with the utility of a "uniform resource identifier." And as I noted before, URLs are kind of backwards, so it's not like he totally nailed the problem.

As Ken says... anybody who claims to have "invented the web" is delusional. It would be exactly like a guy 2000 years ago asking, "wouldn't it be great if we could get lots of water from the lake to the center of the town?" and then claiming to have invented the aqueduct.

As Alec says... the early 90s was an amazing time for software. There was so much computing power in the hands of so many people, all of whom understood the importance of making data transfer easier for the average person... Every data transfer protocol was slightly better than the last, and more kept coming every day. It was only a matter of time until some minor improvement on existing protocols was simple enough to create a critical mass of adoption. The web was one such system... along with email and instant messaging.

Case in point: any geek graduate of the University of Minnesota would know that the Gopher hyperlinking protocol pre-dated HTTP by several years. It was based on FTP, and the Gopher client had easily clickable links to other Gopher documents. It failed to gain popularity because it imposed a rigid file format and folder structure... plus Minnesota shot themselves in the foot by demanding royalty fees from other Universities just when HTTP became available. So HTTP exploded in popularity, while Gopher stagnated and never improved.

But the popularity of the web is a double-edged sword. Sure, it helps people collaborate and communicate, enabling faster innovation in business. But ironically, the popularity of the web is hurting new innovation on the internet itself. Too much attention is paid to it, and better protocols get little attention... and the process for modifying HTTP is so damn political, good luck making it better.

For example... most companies love to firewall everything they can, so people can't run interesting file sharing applications. It wasn't always like this... because data transfer was less common, network guys used to run all kinds of things that synced data and transferred files. But as the web became much more popular, threats became more common, and network security was overwhelmed. They started blocking applications with firewalls, and emails with ZIP attachments, just to lessen their workload... But they couldn't possibly block the web! So they left it open.

This is a false sense of security, because people will figure out ways around it. It's standard hacker handbook stuff: just channel all your data through port 80, and limp along with the limitations. These are the folks who can tunnel NFS through DNS... they'll find their way through the web to get at your data.

What else could possibly explain the existence of WebDAV, CalDAV, RSS, SOAP, and REST? They sure as hell aren't the best way for two machines to communicate... not by a long shot. And they certainly open up new attack vectors... but people use them because port 80 is never blocked by the firewall, and they are making the best of the situation. As Bruce Schneier said, "SOAP is designed as a firewall friendly protocol, which is like a skull friendly bullet." If it weren't for the popularity of the web, maybe people would think harder about solving the secure server-to-server communication problem... but now we're stuck.

All this "web worship" is nothing more than the fallacy of assuming something is good just because it's popular. Yes, the web is good... but not because of the technology; it's good because of how people use it to share information... and frankly, if Tim never invented the web, no big loss; we'd probably be using something much better instead... but now we're stuck. We can call it Web 2.0 to make you feel better, but it's nowhere near the overhaul of web protocols that is so badly needed... It's a bundle of work-arounds that Microsoft and Netscape and open source developers bolted on to Web 1.0 to make it suck less... and now it too is reaching a critical mass. Lucky us: we'll be stuck with that as well.

What would this "better protocol" be like? Well... it would probably be able to transfer large files reliably. Imagine that! It would also be able to transfer lots and lots of little files without round-trip latency issues. It would also support streaming media. It would have built-in distributed identity management. It would also support some kind of messaging, so instead of "pulling" a site's RSS feed a million times per day, you'd get "pushed" an alert when something changes. Maybe it would have some "quality of service" options. Most importantly, it would allow bandwidth sharing for small sites with popular content, to improve the reach of large niche data.

All these technologies already exist in popular protocols... but they are not in "the web." All of these technologies are likewise critical for anything like Tim's "Linked Data" vision to be even remotely practical. All things being equal, the web is almost certainly the WORST way to achieve a giant system of linked data. Just because you can do it over the web, that doesn't mean you should. But again... we're stuck with the web... so we'll probably have to limp along, as always. Developers are accustomed to legacy systems... we'll make it work somehow.

Now that I've gotten that out of my system, I'll be able to do a more objective analysis of "Linked Data" next week.

How Bad Web Security Makes It Easy To Rick-Roll

You've probably heard about the technique of Rick Rolling... it's basically the web version of the oh-so-mature "made you look" game. You tell people that a link goes to some interesting info, when in fact the link goes to a YouTube video of Rick Astley singing "Never Gonna Give You Up." It's also led to the trend of live Rick Rolling, where you trick somebody into looking at the lyrics of the song... like what happened during the 2008 Vice Presidential Debates.

Well, now people are so suspicious of YouTube links, they won't click on them anymore. So the answer is to raise the bar a little. My technique is to use open redirects from legitimate websites to hide links to YouTube!

For example... see the link below to Yelp.com? Where do you think it goes? Cut and paste it into your browser's address bar to see where it actually goes:

http://www.yelp.com/redir?storeId=&url=%68%74%74%70%3a%2f%2f%77%77%77%2e%79%6f%75%74%75%62%65%2e%63%6f%6d%2f%77%61%74%63%68%3f%76%3d%59%75%5f%6d%6f%69%61%2d%6f%56%49

It looks like a link to Yelp.com, which is a restaurant review site... but with a little URL magic, you can force Yelp to annoy people. Naturally, once Yelp catches wind of this, they will shut down the open redirect pretty fast, so you have to keep looking for more. The technique is pretty simple:

  • Find a large/important site that links frequently to small/unimportant sites... such sites usually have open redirects.
  • Poke around and see if you can spot any URLs that look like they might be redirects... the URLs might have parameters like url=http://example.com, redirect=example.com, or something similar.
  • Copy one of these redirect URLs into your address bar
  • In the site URL, replace the redirect URL parameter with a Rick Rolling URL -- such as http://www.youtube.com/watch?v=Yu_moia-oVI -- and see if the site redirects to YouTube.
  • For advanced Rick-Rolling, you might want to disguise the link to YouTube by URL-encoding it. A short script like the one below will obfuscate a URL parameter.
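A tiny bit of plain JavaScript does the trick. Note that encodeURIComponent alone leaves letters readable, so this encodes every single character, like the Yelp example above:

  function obfuscateParam(value) {
    var out = "";
    for (var i = 0; i < value.length; i++) {
      out += "%" + value.charCodeAt(i).toString(16);
    }
    return out;
  }

  obfuscateParam("http://www.youtube.com/watch?v=Yu_moia-oVI");
  // => "%68%74%74%70%3a%2f%2f%77%77%77..." (paste this in as the url= parameter)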

You may now Rick-Roll with impunity...

Why do these open redirects exist? Simple: to prevent SPAM blogs. This problem was big on Amazon.com, because at first they allowed people to submit links in comments. However, that meant that folks could link back to SPAM sites from Amazon.com. This is bad enough, but when Google noticed that Amazon linked to a site, its page rank and "relevance" would increase... meaning those awful SPAM sites would have a higher rank in Google search results. There were many proposals to combat this problem... but the only one that completely solves it is to do a redirect from Amazon.com itself.

This does help the battle against SPAM, but unless you do it right, it's a major security hole... people would see a link that goes to Amazon.com, then click on it, but then get hijacked to an evil site. The URLs look completely legit, and they bypass most SPAM/SCAM filters. These are particularly useful for people who use phishing techniques to steal bank account numbers, credit card numbers, and the like. Back in 2006 I found these security holes on Google, Amazon, MSN, and AOL. I alerted them all to the bug; some of them fixed it... however, more sites make this same error every day. I'm hoping that broadcasting this technique to Rick Rollers might do some good... that way, Rick Rollers will find these security holes on new sites before hackers, crackers, and phishers do.

Basically, I'm betting that the annoying outnumber the evil... Let's hope I'm right...

ECM Standards War: Bye Bye JSR170, Hello CMIS!

I've been wanting to talk about this for a while... but I was under an NDA... but now I can FINALLY tap-dance on the grave of that awful JSR170 standard... and mere months after Oracle released their adapter. Sorry Dave...

CMS Watch is now reporting on yet another ECM standard, this one named the Content Management Interoperability Services specification, or CMIS for short. Oracle is a member of the community that is helping design this spec, and I have high hopes for it... unlike the prior ones.

I never liked the JCR specs (JSR 170 and JSR 283). Firstly, the world doesn't need a Java-based content management spec. That's just plain stupid. Any spec that by design omits SharePoint will be a non-starter. Also, the whole JCR stuff is an API spec. We don't need an API spec: we need a protocol spec.

That's where CMIS comes in. It's a RESTful protocol for getting at your data, and changing individual resources... but it also comes with JSON and SOAP fused in there a bit... cuz frankly, we need the extra oomph.

At its heart is the Atom Publishing Protocol. Now... I have some issues with this, because I feel APP isn't robust enough for large scale syndication. There is simply no guarantee of quality of service when you're using "feeds", and polling-based architectures don't scale to thousands of enterprise applications. That's the dirty little secret that REST fanboys don't want you to find out...

Many folks in the open source community have already noticed this, and advocate using the instant messaging protocol XMPP (aka Jabber) to "wrap" restful web services in a bundle. That's the best of both worlds: a simple protocol that's easy to understand, but with a wrapper that can guarantee your published document actually got to where it was supposed to go... Others advocate that enterprises should use a more proven general-purpose messaging protocol like Apache ActiveMQ, instead of the IM-centric Jabber one.

In any case, the CMIS standard is only at version 0.5. Early adopters will notice issues, and the spec will have to evolve to meet the performance and quality concerns... CMIS alone would be great for making lightweight mashups... but I anticipate the ActiveMQ wrapper will become a standard best-practice for heavy duty publishing and syndication.

I promise to have more on this stuff later...

Who Wants to be @OWASP on Twitter?

So I've been following the Open Web Application Security Project (OWASP) for some time... I was just reading about the next few Minneapolis OWASP meetings, and was kind of shocked to see Richard Stallman on the list of speakers for the October event...

Anyway, as I wandered over to Twitter, I noticed that nobody had claimed the OWASP account yet! So I swiped it... even though I'm not really a hard core OWASP guy. I don't run a chapter, nor is web app security my job... it's more like a hobby ;-)

So basically, I was wondering who "deserves" to have this Twitter account? I have no clue what to do with it. Meeting announcements? Full disclosure tweets? Open mockery of hacked applications?

(hat tip: Sam)

How Many Hits Does Your Site REALLY Get?

It's been two years since my inaugural blog post on April 29th, 2006: The Trouble With RSS. Over my site's second year, I wanted to do some long-term analysis on how different web analytics tools track hits, visits, and the like. As expected, they don't agree with each other:

  • SiteMeter: 89,800 visits (132,000 hits)
  • Google Analytics: 84,000 visits (140,000 hits)
  • Webalizer: 431,000 visits (3,660,000 hits)

Curious about why web site statistics differ based on the tool? SiteMeter uses an embedded image (at the bottom of this page), and tracks a hit every time somebody loads the image... so if you block banner ads, your visit might not be recorded. Google Analytics loads some JavaScript, which is useful for tracking more complete data... but if your browser blocks JavaScript (or cross-domain JavaScript), it won't register a hit. I found it odd that SiteMeter tracked more visits, but fewer hits, than Google Analytics... curious.

In contrast with the other two, Webalizer uses raw Apache logs to determine hit count, so it tracks every single dang hit... Over 3 million hits in one year??? That's clearly too many... I'm not that interesting... but the visit count might be more accurate. Webalizer is the only analytics tool that tracks folks who view my site with RSS Readers, which may hit my site several times per day... thus the higher visit count. The hit count is hyper inflated because it counts search engine spiders, spammers, and hack attempts (some better than others).

All told, if the majority of folks view my site with RSS, then Webalizer's count is more accurate. If most of them view it the old fashioned way, then the other two are more accurate. I'm probably in the 100,000 - 200,000 visits per year range.

Unfortunately, none of these numbers include the folks who read my site through an online RSS reader, like Google Reader or Bloglines. These sites hit my RSS feed once, then share it with dozens of folks who subscribe to the feed... To get a better estimate, I could pipe my RSS feed through something like Feedburner. Feedburner keeps track of how many subscribers you have on the online feed readers, and produces decent stats on it... however, once you move your feed to Feedburner, it's almost impossible to move it out... so I'm not happy with that option. Even so, that still wouldn't track those who view my content through RSS aggregators like Central Standard Tech, or Orana, or other sites that run Planet.

Well, what about the data from Alexa? That site ranks web pages based on those who surf the web with a toolbar that tracks their every move. Personally, I think people who surf with that toolbar are opening up a major security hole... so their viewing audience is probably restricted to folks who are kind of tech savvy, but don't take security precautions. In other words, newbie geeks. I've never broken into the top 100,000 sites ranked on Alexa, but I frequently break the top 100,000 sites ranked by Technorati... although Technorati only ranks blogs.

UPDATE: As Phil noted in the comments below, most people use Alexa just to boost their own page rank. For example, you could have your web team install and enable the Alexa toolbar, but only when browsing your own web page. That would make your Alexa rank huge without any actual hits from the greater internet...

Even if we could accurately count how many people hit the site, we're still at a loss to know who paid attention. Google Analytics tries to measure "time on the page"; other metrics include bounce rate, or even the number of comments.

Oh well... A reliable measure of relevance will always be elusive... but at least we have enough estimates to support a cottage industry of people analyzing those metrics to prove anything they are told to prove ;-).

Back to my anniversary... Lots of stuff has changed since my first anniversary post: I've traveled to South Africa, Brazil, and Argentina... I've remodeled my kitchen, I've nearly completed my second book on Oracle enterprise content management, I've given technology presentations at Oracle Open World, AIIM Minnesota, BarCamp Minnesota, and IOUG Collaborate in Denver. I've trained both salespeople and consultants on what Enterprise Content Management actually is, and I helped negotiate a settlement to an 18-month lawsuit against a local non-profit. Oh yeah... I implemented about a dozen ECM solutions as well...

Next year, I hope to have even more goin' on... and a few more web site visits.

Oracle Universal Online Archive: The "Killer App" for Oracle Secure Files

When I first heard about Oracle taking a new direction with their old content management product -- meaning the old Content DB, not the newly acquired Stellent stuff -- the first thing I thought was it's about time!

When Oracle claimed it had 2 content management systems, that really confused people... especially considering that Content DB was at best a set of tools to create a content management system, whereas Stellent was a full blown application plus framework. They really weren't like each other at all.

Universal Online Archive (UOA) is Content DB, but now focused on being an archiving platform. On Oracle 11g, it is an extension on the Secure Files feature of the database. If you haven't heard of Secure Files yet, it beats the Linux filesystem on both read and write performance. It also has compression, de-duplication (only storing duplicate files once), and encryption. The encryption is an extension of Oracle Transparent Data Encryption, plus support for encrypting entire tablespaces instead of just individual columns. This means support for foreign keys, as well as indexes beyond the basic b-tree stuff...

Compression reduces the storage needs by 33% on average, according to Oracle. If you then use the statistics from IDC that there are 8 copies for every 1 content item, then de-duplication alone would bring the total storage down by 87.5%... all while maintaining better-than-filesystem performance, despite the added cost of encryption. See this whitepaper for some tuning statistics and tips.

Secure Files is the next generation of Large Objects for the database... and it's very cool... but what should you run on top of it? For the longest time, the folks at Stellent balked at using the database for file storage. Using the filesystem made much more sense because of performance reasons, which made up for the additional complexity of the architecture. However, if the user has 11g, there really is no better option than storing content items in the database.

NOTE: This rule-of-thumb does not apply for web content -- especially for small images and thumbnails. In those cases, a split approach where public web assets are stored locally would probably be faster. Luckily, a customized FileStoreProvider can help you achieve this.

Also, Oracle Universal Online Archive finally fits in with Oracle's broader strategy for content management. Even though it can store anything, the first release will have connectors to email servers to be a mail archive:

  • Microsoft Exchange
  • Lotus Notes
  • Generic SMTP Server

This fits right in with the Universal Records Management strategy, which is to embed a Records Management Agent in remote repositories, and control their life cycle from the Records Management system.

In other words, your email archiving policy is no longer dictated by IT. Your records managers can say when an item should be archived, and how long it should be retained based on events, instead of simply time and size constraints. For example, emails should be retained 2 years after a project completion, 6 months after employee termination, or 12 months after you lose a specific customer. That will reduce both your email space requirements, and your legal risk.

But it doesn't stop there... the next step is to make connectors to other content management systems, for example, Sharepoint. The idea is to archive content out of systems like Sharepoint, and replace them with a "stub". When a user downloads from Sharepoint, the "stub" is smart enough to redirect the download to the archive, and return it directly.

In other words, you could be using a secure, compressed, de-duplicated, encrypted, archive without ever noticing. Throw in a Records Management Agent, and you'll also invisibly comply with dozens of regulation and laws... no matter where you store your information.

It's a good strategy, and some interesting technology... we'll see how it pans out.

UPDATE: The release was announced, but they don't have a date for when it will be available for download. Here's some more info about the release, and some places to watch for downloads:

Search Google for Terrible Developers

Hat tip Reddit:

http://www.google.com/search?q=inurl:SELECT+inurl:FROM+inurl:WHERE+intitle:phpmyadmin

Almost 1000 hits... yikes. Trust me, it's funny (and sad) if you know SQL injection...

Oracle Unbreakable ECM?

After my security posts last week (here, here, and here), I got an interesting email from an Oracle partner out west (David Roe from Ironworks)... one of his customers put Stellent through a battery of automated security tests, and got some surprising results:

Incidentally one of our clients ran through a couple rounds of automated security testing on their UCM instance. They sort of surprised us with it actually, but when they were done sent back some great feedback about how strong the system was and how it passed every check (apparently an uncommon occurrence). I personally don't put a lot of faith in any automated testing, but it's nice to know Stellent will pass one :)

Like the author, I don't put that much faith in automated tests... but many of these security testing companies are batting 1000: some of these firms brag that they always find security holes, but this time they came up empty. Even on an unannounced, surprise, security audit.

Naturally, neither David nor I will reveal the name of the customer... because bragging about an unbreakable system is the surest way to attract the wrong attention... but if a legitimate analyst or existing Oracle customer would like to chat with these folks, Dave could facilitate a connection.

Google Gears Plus Greasemonkey: I Got Your Web 3.0 Right Here, Pal...

On April 1st, Google announced that their Google Docs application now works offline.

This is kind of the direction that people have been taking for a while... being able to use Rich Internet Application technology like Adobe AIR to work on web forms, but take them offline for later viewing. However, Google decided to take an oddly different approach.

They decided to use Google Gears, which is a combination of a browser plug-in, a mini web server, and a SQL database. You don't need to use Java or Flash in order to save data to the database, you just use standard JavaScript calls.

It's like AJAX on crack. And if done right, it could break down even more walled gardens than Web 2.0 did.

Currently, Google Gears is only in its 0.2 release: very very very beta. Not like GMail beta, or Google Docs beta... but so beta that maybe they should call it alpha or something. What I found interesting was the possible effect this strategy will have on the rest of Google's applications. Take Spreadsheets offline? How about my Analytics data? Why not GMail? The process would be this:

  • Connect to your Google online app.
  • Use Gears to synchronize your local database with Google data.
  • Take your application offline.
  • Run everything you need by connecting to the Gears web server, and getting back chunks of HTML/XML.
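The Gears piece of that flow is plain JavaScript against a local SQLite database. Working from memory of the 0.2-era API (so treat this as an approximate sketch; the table and fields are made up, and you need gears_init.js plus the plug-in installed):

  var db = google.gears.factory.create("beta.database");
  db.open("offline-docs");
  db.execute("CREATE TABLE IF NOT EXISTS Docs (id TEXT, body TEXT)");
  db.execute("INSERT INTO Docs VALUES (?, ?)", ["doc-1", "cached content"]);

  // Later, while offline, read the cached copy back out
  var rs = db.execute("SELECT body FROM Docs WHERE id = ?", ["doc-1"]);
  while (rs.isValidRow()) {
    console.log(rs.field(0));
    rs.next();
  }
  rs.close();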

Now... What happens when you add Greasemonkey to the mix?

Greasemonkey is a popular little application that allows you to inject custom HTML and JavaScript into other people's web sites. Do you want an extra link on the home page to take you directly to the latest news? No problem. Don't like the way GMail organizes its buttons? Re-arrange them. Hate the look and feel of a site? Use a custom stylesheet.
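For those who haven't seen one, a Greasemonkey user script is just JavaScript with a metadata comment block on top. A toy example (the site and link are made up):

  // ==UserScript==
  // @name        Add a "latest news" shortcut
  // @include     http://www.example.com/*
  // ==/UserScript==

  // Inject an extra link at the top of the page
  var link = document.createElement("a");
  link.href = "http://www.example.com/latest-news";
  link.textContent = "Latest news";
  document.body.insertBefore(link, document.body.firstChild);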

Don't like how GMail organizes its back-end data store? Well, too bad, you can't use Greasemonkey to force GMail to store or retrieve your data differently... that is, unless Gmail uses Gears!

If so, I could inject custom code to not only synchronize with my online database, but store it however I want. Previously, Greasemonkey could only access existing content -- provided it was available through AJAX or Remote Scripting. But when combined with Gears, Greasemonkey scripts can perform radical analysis of web content, and store the processed information locally! It can also synchronize back to the main site, for proper online storage...

In effect, Greasemonkey allows end users to inject customized code for web page display... but Greasemonkey plus Gears allows you to inject a whole custom web application! So what??? Well, imagine being able to do this:

  • Use GMail to store up all the email questions and answers on a community group. Use Greasemonkey to keep a running count of who helps answer questions (gurus), versus who just demands answers (leeches)... then avoid helping the leeches.
  • Use a Greasemonkey script to run custom reports based on Google Analytics data, and present it right in the browser.
  • Create an offline Google Spreadsheet with Gears. Then, go to any one of the popular online polling apps (Surveymonkey) or web form designers (Wufoo). Use a Greasemonkey script to access the raw data from the reports, process it, and inject it into a Google Spreadsheet. Sync the offline spreadsheet with Google, and now the report is online for all to see!
  • Transfer information from one site -- say Facebook -- into any other site -- say LinkedIn -- without having to use their proprietary APIs, or let the sites know the password for the other site! Just use a Greasemonkey spider to grab the information, store it locally, and upload it when appropriate.

Naturally... the security risks are profound... If Gears ever got popular, a little JavaScript on an evil site could read much more than just your cookies... So it's important to lock down the ability for one site to read another site's database. However, we should probably relax access for things like cross-site Greasemonkey, otherwise we'll miss out on most of the value of Gears.

Will it bring about the next gen of the web? Web 3.0? Web 4.5? Maybe web candle plus monkey? We'll see what happens in Gears 0.3...

UPDATE: Jake had the suggestion that it might be more useful to use Mozilla Prism with Greasemonkey, as opposed to Google Gears. Lifehacker recently profiled Prism. That depends on how this plays out... Prism would work great for Firefox-based rich internet apps... whereas Adobe AIR and Google Gears would be more cross-platform. If you want iPhone support, you'll need Safari. Although at present Prism is more feature complete than Gears.

Overall, I think Google Gears is going in a better direction than AIR or Prism, because they are following the maxim don't break the web!... but time will tell if they can actually deliver.

What Should ECM Apps Do About Security?

James responds to my latest security rant with a lot of good points. I think this point here is the best:

Have you ever noodled that as data flows from one system to another within an SOA, but the security model doesn't, that this is another attack vector? For example, what if I have access to data in a policy administration system such that I can figure out if you are insuring an auto that your wife doesn't know about but couldn't do the same in a claims administration system? I bet you can envision scenarios when you integrate a BPM engine with an ECM engine that security becomes weaker.

Absolutely... unfortunately, this is an amazingly difficult problem. It's not really the realm of ECM or BPM to solve it... rather, the best thing that we can do is not get in the way. Let the experts solve that one, and then integrate as well as possible with global policy management systems.

My suggestion is this:

  1. Implement a policy-based security model in your application (ECM/BPM).
  2. Loosely couple your application with an identity management system, so you can access a global security policy.
  3. Place extra hooks -- and allow people to "inject" new hooks -- that allow additional security callbacks to arbitrarily re-validate both credentials and access rights (see the sketch after this list).
  4. Optionally, map the global policy and list of access rights to a policy more relevant to your application, and allow access by "local" users not in the global repository.
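To illustrate point 3, here's a rough, vendor-neutral sketch of what "injectable" security callbacks might look like (this is my own JavaScript pseudocode, not Oracle's actual API):

  var securityChecks = [];

  function registerSecurityCheck(check) {
    securityChecks.push(check);
  }

  function isAllowed(user, action, resource) {
    // every registered callback gets a chance to veto the request
    return securityChecks.every(function (check) {
      return check(user, action, resource);
    });
  }

  // base check: rights mapped in from the global identity/policy store
  registerSecurityCheck(function (user, action, resource) {
    return (user.rights[resource.type] || []).indexOf(action) !== -1;
  });

  // an "injected" extra check: restricted content after hours demands re-auth
  registerSecurityCheck(function (user, action, resource) {
    var hour = new Date().getHours();
    return !(resource.restricted && hour >= 19) || user.reauthenticated;
  });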

Most applications in the Oracle ECM stack follow this methodology... but I can't vouch for all Oracle applications. I like it, because it's flexible enough to 'slave' yourself to an identity management system, and yet still have some local control over access rights if you want to 'boost' somebody's credentials.

I think it would be great if Oracle chose to augment this model to add support for a policy auditing standard... but I have no idea if anybody is asking for one, and if so, which one? I'm positive James has an opinion... I'm a fan of just using Business Intelligence to do the reporting, since (again) you can "sneak-in" better security along with the latest buzzword ;-)

Sub-optimal? Of course... but anything that makes security look less like a cost-center is good...

I also like the concept of Oracle's magic black box for identity services. That would make it easier for developers to create policy-based security models that (in theory) would work with old, new, and emerging standards alike (XACML, CardSpace, OpenID, etc.). It's not that I don't like XACML, it's simply that there are other horses in this race... and developers do not have the power to dictate architecture. We can suggest what works best, but in the end, the most sellable product will support them all.

I fully agree that #4 is a possible attack vector, which is why good access auditing and rights auditing tools are important... However, users frequently insist on local control of security rights, because there are many legitimate business cases where it isn't feasible to place all users in a global repository with the proper rights. Sometimes -- especially during mergers and acquisitions -- you want to keep the identities and access rights of these folks as secret as possible. Or, if your IT department has a 3-week waiting period for new users, but you need a contractor NOW for a 2 week project, guess what will happen?

I especially like how Oracle ECM implements #3... some of the more interesting aspects of the future of security involve multiple challenges for access. For example, assume a user has access to both mundane and highly restricted content, but her daily work is usually with the mundane. Now, at 7pm, she's suddenly accessing a ton of highly restricted content. Red flag! Even if her security tokens have not yet expired, a good security system would notice that this behavior is strange, and demand further authentication credentials... maybe the name of her first pet, or the manual-override PIN.

Anyway, Oracle ECM doesn't do any integrations like that as of yet, but it has the flexibility to do it... several identity management systems support that approach, and ECM is being positioned more and more as "infrastructure"... so I'd wager it's only a matter of time.

Want Secure Software? Then Pick Your Battles

I don't mind when James throws daggers at me about security... because

  • his aim sucks (just ask his hunting buddies), and
  • hey, free cutlery!

Seriously, I believe we agree on several points... we just have different perspectives.

My point was that creating secure software is extremely difficult... even if you educate your developers about the OWASP top ten (which ain't all that great anyway), and even if you religiously use tools like Ounce Labs or Coverity, you'll always have problems. Those tricks are good checks against developers making brain dead stupid decisions, but they'll never catch the subtler security problems.

The issue is one of complexity... the vast majority of security holes occur in the interfaces between applications and/or concerns. This doesn't just mean cross-site scripting vulnerabilities on the web interface, nor just the sql-injection attacks on the back end... it also includes any time you connect two code bases together in new and novel ways. The very nature of service-oriented architectures and modular code bases exponentially increases the number of things that can go wrong. Even a security-savvy developer that runs Coverity would never have enough time to test every possible permutation... nobody is willing to wait that long for the test cycle to complete, nor would anybody be willing to pay for it.

Thus, some problems will never be noticed until they are "in the wild."

You can yell and scream all you want... but this doesn't change the basic math. Again, don't just listen to me... check out security guru Bruce Schneier and his essay on why insecure products will always win in the marketplace. It's basic economics, called the "market for lemons," which I've covered before.

James seems to need some kind of evidence that the code is at least reasonably safe before putting it into production. Fair enough, but his suggestions suck. I can't think of one single certification that I would be personally willing to trust... penetration tests are OK but flawed. Developer certification courses only teach the basics, and are generally useless. Stamps of approval by "security experts" are nice, but as I've mentioned before, I've found problems that these self proclaimed "experts" missed.

In short, all of James's proposed solutions are false senses of security. Rely on them at your peril. If he's got a new one I've missed, I'm all ears.

You will always need to patch your applications. Accept it. You will never have a "100% secure" system. Accept it. The best you can hope for is something that gets more and more "defensible" as it matures. Accept it. Patches are a necessary evil. Accept it.

Instead of fighting the security battle -- which you will never win -- pick a battle that will both make your life easier, and have better security as a byproduct. Demand that your vendor:

  1. minimizes the number of required security patches, either through bundling or by educating their developers about security,
  2. thoroughly tests those patches to minimize the side-effects, and
  3. has excellent tools to help deploy, test, and roll-back patches if needed

That's probably the best you can hope for...

UPDATE: James responds, and I continue the dialog.

Take The Oracle Security Survey

Do you use Stellent, or any Oracle technology? Then you should probably take the IOUG Oracle Security Survey:

http://survey.ioug.org/

Select the OSSA Security Survey, and let 'er rip! It's sponsored by Oracle and the Independent Oracle Users Group. The goal is to gather information about your security practices, including general processes for vulnerability and patch management, Critical Patch Updates, and the like. IOUG will analyze the results, and issue recommendations to Oracle at Oracle's next Security Customer Advisory Council. IOUG has released a security podcast to explain more about the survey.

I was shocked to discover that fewer than 20% of Oracle customers admit to applying the rolling security patches that Oracle releases... yikes. Back when I was a developer, I always found it extremely frustrating that customers rarely applied patches to known security holes... CERT often says that 99% of security breaches are due to users not applying patches. In other words, 80% of Oracle customers choose to make themselves vulnerable to 99% of the attacks.

WHY???

Unlike James McGovern, I don't believe security problems are entirely due to bad software or clueless developers... I'd argue most security problems are due to improperly configured and improperly maintained software. However, I also believe that blaming the implementation team is a cop-out. Instead, developers need to realize that security is a process, not a product (hat tip Schneier).

Thus, the best thing a developer can do for security is focus on software that can effortlessly evolve to meet tomorrow's security challenges. If you want secure applications, first demand software that is effortless to patch and maintain. This includes software that can easily roll-back patches in case the security fix broke something important... Then fewer people would fear installing the patches, more would use the existing patches, and there would be significantly fewer breaches.

If software were easy to configure and maintain, then security would get better and better the longer you owned it... not to mention you'd have fewer bugs, and generally better software. Stable products are always more secure. Why? If the product is rock solid, with few bugs, then people are less hesitant to apply critical patches. Better documentation helps as well, as do better patch tools...

With easy patching, easy maintainability, stable software, and a vigilant community, security is a natural by-product. Also, this helps security become less of a cost-center... easy patching and configuration are great for ROI, no matter what.

It Just Makes Sense©, so don't expect too many people to press for it any time soon...

Although relatively speaking, I'm pretty impressed with Oracle's patch technology. The new 11g database watches for errors, and can notify you about patches that might fix the problem. Likewise, the Content Management team has a pretty good patch process... unfortunately, it takes forever to get anything out to Metalink, so your best bet is to always contact support for the latest patches.

The Transportation Safety Administration (TSA) Has Started A Blog!

It's true... why not leave a comment?

http://www.tsa.gov/blog

Holy cow... you should see the rants and raves against the TSA's liquid ban! Pure. White. Hatred.

I don't know what their angle is... maybe they know that most of their processes are silly, and they'd like to prove that there's a strong desire for change. Maybe they'd like to lift the liquid ban, but still be able to cover their ass, as Bruce Schneier would say... or maybe they're looking to expand their "no-fly list".

I understand people's frustration with the TSA: being a frequent traveler, I'm often a victim of their Kafka-esque rules... but some of these comments are unbelievably hostile. Cut the TSA some slack, give some constructive criticism, and maybe they'll stand up for common sense security... which, after the past seven years, would be a breath of fresh air.

(Hat Tip Al Kamen)
