13 Rules For High-Speed HTML

The author of High Performance Web Sites just put together a list of 13 simple rules for high-speed HTML at Yahoo. Briefly, the tips are:

  1. Make Fewer HTTP Requests
  2. Use a Content Delivery Network
  3. Add an Expires Header
  4. Gzip Components
  5. Put CSS at the Top
  6. Move Scripts to the Bottom
  7. Avoid CSS Expressions
  8. Make JavaScript and CSS External
  9. Reduce DNS Lookups
  10. Minify JavaScript
  11. Avoid Redirects
  12. Remove Duplicate Scripts
  13. Configure ETags

Some of these are obvious... like using GZIP, reducing DNS lookups and HTTP requests, and leveraging the Expires HTTP header. Some are really innovative, such as CSS Sprites -- which sound really cool, but seem difficult to maintain.

Others seem like bad ideas to me, like using the data: URL scheme to embed base64-encoded images directly in the HTML. I'd only do that as a last resort to squeeze out every last drop of performance... it's much cheaper and easier to buy new hardware or become an Akamai client...

In all, it's a good checklist to run through on every page you have, in order to make it load as quickly as possible.

Freakonomics

Interesting material, well researched, but very shallow.

The second edition of this book contains the original New York Times Magazine article by Dubner about Levitt. Save your time: read the article online instead of this book. It's 5% the size, yet contains 80% of the same material.

There is a bit more info about Sumo wrestlers throwing games... and a good overview of cheating teachers. The book also contains info -- of questionable validity -- about Stetson Kennedy and the KKK.

However, what's missing is a good grounding in regression analysis, or an in-depth analysis of any of the subjects. Cheating, crime, incentives, information asymmetry... any of these would make a great book on its own. But the ADD style of this book always left me feeling that something big was missing, so I couldn't trust that all the arguments were presented.

The section on information asymmetry was so shallow that they didn't even mention The Market for Lemons by Akerlof. The coverage of cheating real-estate agents was so shallow that they didn't even consider that their book may create a self-defeating prophecy. Many sellers I know use the threat of firing the agent, and thus create the negative incentive of zero payment for a lazy realtor.

I was also shocked that nowhere in the book did he cover statistical significance or margin of error... He runs a few numbers, spits out a percentage, and we're expected to swoon. So what if his data says that realtors sell their own homes for 2% more than their clients' homes? What's the frigging margin of error?

Throughout the book the authors joke about there not being an overriding theme to the book. Quite true: it did ramble on about disjointed things and left out a great deal of detail... perhaps that's a bad thing, and not something to laugh about.

This could have been a much better book... but it wasn't.

Friday Bizarre: My Hands Are Bananas

From the Euro-dork department:

My Hands Are Bananas

Yes... beware the milky pirate...

Glitch In The Matrix...

Drupal hit a minor snag... the permissions data got a bit whacked on my blog yesterday, and anonymous users were getting an access denied message. So I reset the permissions, and things should be back to normal now. That must have happened during the last automated database backup...

Thanks to Jason Stortz, Derrick Shields, and Billy Cripe for alerting me.

The Power Of Enterprise Mashups

I believe that mashups are the most powerful and underrated piece of the Web 2.0 puzzle. Mashups are lightweight applications, made mostly with JavaScript, that combine data from multiple sources to create something new and innovative.

Blogs and wikis get coverage because they turn everybody into a web contributor. RSS feeds get coverage because they turn everyone into a massive consumer of web content. But mashups? Not as much coverage.

Why not? Don't people understand that mashups allow everyone to become enterprise-level web programmers?!?

Allow me to elaborate...

Mashup Editors

WebWare recently did a good overview of the three (sort of) available mashup editors.

After Yahoo released Pipes, which allows people with little programming experience to create web mashup applications, Microsoft and Google had to follow suit... with Popfly and Google Mashup Editor, respectively.

Neither is ready for prime time as of yet. You need an invite, and they are highly stingy with them. However, early reviews lean towards Popfly... which I believe may be the first killer app built with Silverlight.

I personally believe that enterprise mashups will be huge business in the near term... not just for cool web widgets, but also for enterprise apps. Naturally, they need to be backed with SOA and some kind of distributed single sign-on system, but those are good ideas regardless of whether enterprise mashups take the world by storm...

XPath Injection, And JSR170

In case you use XPath as a query language to your repository (instead of SQL or something else), hopefully you are aware of a little problem called XPath Injection Attacks.

Anybody who knows web apps and security knows of the dangers of SQL injection attacks... many web apps are vulnerable to this. If you have a web form, and generate a SQL query with the data on that form -- without validating the data -- then you're open to attacks. People can inject whatever SQL they want into the web form, and trick your application into running their SQL instead of your SQL. This could cause data leaks, or even data deletion.

You can fix this problem simply by escaping all quotes in the data from the web form... as well as type-checking dates and numbers. You may also need to count the parentheses and remove SQL comments, depending on your application...
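
To make that concrete, here's a minimal JDBC sketch (the table and column names are made up): binding the form data as a parameter means the driver never treats it as SQL, so quotes in the input can't rewrite the query.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    public class UserLookup {
        // Vulnerable approach: concatenating form data straight into the SQL string,
        //   "SELECT id FROM users WHERE name = '" + name + "'"
        // lets an attacker submit something like: x' OR '1'='1

        // Safer: bind the form data as a parameter, so the driver escapes it for you.
        public static void findUser(Connection conn, String name) throws SQLException {
            String sql = "SELECT id FROM users WHERE name = ?";
            try (PreparedStatement stmt = conn.prepareStatement(sql)) {
                stmt.setString(1, name);
                try (ResultSet rs = stmt.executeQuery()) {
                    while (rs.next()) {
                        System.out.println(rs.getLong("id"));
                    }
                }
            }
        }
    }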

XPath injection works in an analogous way to SQL injection... the only difference is that the injected attack has a different syntax. If your repository allows XPath query syntax, you'll need to do a lot more data validation to protect yourself...
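
For instance, here's a hedged Java sketch (the document structure and variable name are invented): rather than splicing user input into the XPath string, you can bind it through an XPathVariableResolver, so quotes in the input can't change the query.

    import java.io.StringReader;
    import javax.xml.namespace.QName;
    import javax.xml.xpath.XPath;
    import javax.xml.xpath.XPathConstants;
    import javax.xml.xpath.XPathExpression;
    import javax.xml.xpath.XPathFactory;
    import org.w3c.dom.NodeList;
    import org.xml.sax.InputSource;

    public class SafeXPathQuery {
        // Vulnerable: "//user[@name='" + name + "']" -- input like  ' or '1'='1  changes the query.
        // Safer: bind the input as an XPath variable instead of concatenating it.
        public static NodeList findUser(String xml, final String name) throws Exception {
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setXPathVariableResolver((QName var) ->
                    "name".equals(var.getLocalPart()) ? name : null);
            XPathExpression expr = xpath.compile("//user[@name = $name]");
            return (NodeList) expr.evaluate(
                    new InputSource(new StringReader(xml)), XPathConstants.NODESET);
        }
    }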

Now, XPath is typically used to query single XML files. Very few people use it as a full query language to a database or content repository. In my humble opinion, that's for a damn good reason: XPath syntax is awkward and weird. It's totally a step backwards in both usability and performance... however, because of trendy new XML standards, XPath injection may be a bigger problem than you realize. You might be using XPath all over, or your repositories might allow XPath query syntax even if you don't use it.

Case in point, I'd highly recommend that anybody who uses that rotten JSR 170 protocol for content management PLEASE look long and hard at how secure your system actually is... You know who you are... I'd start by reading the XPath Injection Attacks article from IBM.

My New Title: Oracle ACE Director

About a month ago, I was awarded the title Oracle Fusion Middleware Regional Director. And I know what you're thinking... what the heck is an Oracle Fusion Middleware Regional Director? Let me explain:

  • I don't work for Oracle,
  • I'm not directing anybody, and
  • I don't have a region.

Clear?

Apparently, this name was confusing to many people, so they decided to merge it with the Oracle ACE Program, and promoted me to Oracle ACE Director... which sounds even cooler. Although my profile looks a bit funny now...

Seriously... I'm kind of excited about this. Partially because I got the title, but also because Oracle thinks that people who do what I do deserve official recognition. Oracle started this program for people who are something of a developer's advocate: somebody who helps out the Oracle community with tips, tricks, articles, or by working closely with local user groups. There are about 40 worldwide, probably growing to 60 eventually.

I also get the chance to chat with people on the product team to hopefully steer product direction based on developer need. So that means whenever I go out drinking with Alec, Andy, or my wife, it's a business expense. Ha!

Anyway... Oracle is -- like Stellent was -- primarily a software company. Sure, they do consulting and training, but that's not their main focus. Therefore, a strong developer community is essential for the success of their business. Developers hate paying for consulting and training... so a strong community always means giving away great information for free. That's the only way to convince excellent developers to love the product. Trust me. When it comes to paying for software or training, the smarter the developer, the stingier the developer.

Oracle decided that it made lots of sense to bring me on board, since I already do the kinds of things a director should do: share tips and tricks, write articles, and work closely with local user groups.

Done.

I sure hope I can also swing a free pass to Oracle Open World out of this... maybe even a T-shirt.

ECM Security Standards, Continued...

After my first two rants about ECM standards (ECM and SSO, and ECMs Store Content Not Users), I think we've established where James McGovern and I disagree.

The main disagreement is about SAML. I didn't see its value, and detailed Oracle/Stellent's architecture to explain why. James mostly agreed, except for one interesting use case:

If ECM vendors simply leveraged Active Directory not solely for authentication but also as a user store and mapped to it at runtime then the need for SAML disappears within most scenarios within the enterprise. It still ignores a potential scenario where your users aren't stored in any repository that the enterprise owns.

Bingo... the one situation where something like SAML comes in handy. Somebody has totally valid credentials to access the repository. However, the authentication and authorization of that user must be done by connecting to a server that is not owned by the enterprise. Stellent/Oracle can handle multiple user repositories, but typically only if they're within the enterprise.

For example, assume the person trying to access your ECM system is a business partner, prospect, or customer... They already have passwords and credentials stored behind their organization's firewall, but if you can't access it, you need to duplicate all that info, and make them log in again. Until fairly recently, you were forced to do it this way: you could have SSO across an enterprise, but not easily between enterprises. Things like SXIP and SAML fix this, so you can have federated (or distributed) single sign on.

Imagine: one password to connect to the entire internet... The developers at Stellent knew a while back that something like this was the ultimate endpoint; the question was which protocol was going to win out. SSL certificates are a management nightmare... Should we follow SAML/XACML because it's a standard, or OpenID/SXIP because they are (fairly) open source, simple, and usable right now?

Which is better? Without a clear contender, or any specific market demand, it's very risky to take the lead... the safe bet is to be knowledgeable and reactive. If somebody asks for SAML, it's no problem to add it to Oracle. However, at present my money is against SAML/XACML for the long term.

I've never deployed either enterprise wide, so I cannot speak about the maintenance problems... perhaps SAML is easy to maintain, but given its complexity, I'd find that surprising.

I'm also very nervous about SAML because it is endorsed by Microsoft, whose first attempt to solve this problem was the god-awful Microsoft Passport. Also, Microsoft has a long history of ruining open standards that threaten them. Active Directory is huge money, as is the enterprise search market, not to mention Sharepoint. I don't expect Microsoft to play nice for long...

Don't think so? Remember their proprietary Kerberos extensions? Or how about how they ruined SOAP with the ungodly complex WS-* stack? If Google tries to press harder into the ECM space -- and not just enterprise search -- then the other shoe will certainly drop, and decent SAML implementations without Active Directory may be impossible.

I sense danger...

And now I'm also nervous that SAML might be catching on in the ECM zeitgeist... one recent proposal included the terrible, rotten, just plain awful idea of integrating XACML, internet search, and ECM together. I challenge Guy Huntington to put his money where his mouth is, and implement something like that himself. I defy him to get his pet project to scale well or perform without millions in hardware for every ECM on the planet.

Adobe Flash/Flex/Apollo/AIR, coming to your town!

I was initially a bit bummed out by Adobe... from their announcement email, it looked like I would miss the Adobe On Air bus tour. However, after checking their site, it appears they will be in Minneapolis on Sept 27th instead. Looks like I will get to hack rich internet apps (RIAs) with the experts after all.

Anyway, in honor of that, here are some other links I dredged up about Adobe Flash/Flex/AIR:

I first got interested in RIAs at an O'Reilly Emerging Technology conference four years back... I had three initial reservations about writing apps in Flash: stability, performance, and accessibility for the visually impaired. I browbeat one of the main developers about it in a highly attended general session, and later felt that I was perhaps a tad harsh... Some people do that. Anyway, thus far they have really worked on the performance and stability problems. I'm sad to see they've made little headway on the accessibility problems...

I'm sure they feel that blind people probably don't need flashy interfaces, and would instead greatly benefit from a non-Flash based interface... like what I talked about in The Future Of Accessibility. However, this misses an important point:

If blind people cannot access your Flash content, then neither can Google! All that great data embedded in your flashy interface is locked up... either in JavaScript, ActionScript, or whatever... and Google can't spider it with the GoogleBot.

Microsoft's Silverlight claims to be Googleable... plus there are other technologies (like Faust, from the Minneapolis-based Flash hackers at Space150) that work around this gap in Flash... But Adobe really needs to get on the ball about this one if they really expect to take on both AJAX and Microsoft.

Business Networking Tip: Eat Me

From the only in Japan department...

A new product, named Taberu Me, is taking business cards to an extreme. Instead of printing on something as mundane as paper, this one opts for printing on peanuts. Or beans. Or cashews. Why be ordinary when you can go for the gusto and be incredibly weird?

Some analysis from Treehugger

Pink Tentacle says "Taberu means “eat” and Me could either be an abbreviation of meishi (“business card”) or “me” in English, in which case Taberu Me would be saying “Eat me” — a message you probably don’t want to convey to your new business partner at the first meeting."

Innovative idea, and it will certainly make you memorable... but in Japan -- where trading business cards borders upon a sacred ritual -- I don't see this catching on.

Seriously, that name is like Bite The Wax Tadpole weird...

Congress Needs A Version Control System...

Karl Fogel from the Subversion project had an interesting idea about government... perhaps it's about time that we mandate our laws be created inside a version control system. This could be an ECM system, or a source control system like Subversion, or something specific to government. In any case, it would help us track who made which change to which law, and when.

What I love most about Subversion is the blame feature... when something crazy goes awry, you bring up the source code and run blame. This will show you which user (or lawmaker) is responsible for which changes to the code (or law). If somebody snuck in some untested code (or a $100 million kickback to a lobbyist), they won't be able to hide it too well...

In the screenshot of blame in action, we see that there are four revisions of the file (8, 12, 13, 14), and the user "padma" is responsible for every change. This is similar to Microsoft's Track Changes feature... but since laws are high in content and low in formatting and photos, a plain-text-specific tool might be a better match.

Of course, them Congressmen are tricksy... you'll need specific rules: only a Congressman gets access to the system to create a bill, commits must happen daily, and all amendments must be branches off of a bill that get merged in when voted on. I believe it will also be essential to have these be accessible over the web, so everybody can see not only who made what change, but what changes are being considered.

It would also be good to mandate a minimum time delay between when it was written, and when it can be voted on... not to mention syndication feeds and subscriptions so watchdog groups can instantly monitor proposed changes and proposed bills.

Heck, if Congress is making Wall Street follow Sarbanes-Oxley, then it's only fair to have some level of accountability in Washington as well...

Windows Live Local: Now With Flair!

Perhaps inspired by the Google Maps Flight Simulator, the latest app from Microsoft -- Windows Live Local -- lets you navigate through a street map in a simulated sports car.

It's unbelievably clunky and awful at the moment... but perhaps soon you can race through the real streets of San Francisco like Steve McQueen. Or perhaps it will always be clunky and awful.

I'm rather curious why they didn't do this in Flash... perhaps they couldn't get clearance, since Microsoft makes a competing product that nobody uses yet: Silverlight. But that raises the question: why didn't they use Silverlight? And where are the dang mashup APIs, dudes?

Oh well... another lackluster offering from Microsoft.

Anybody Want A Pownce Invite?

I just got my invitation to Pownce... no clue from whom or why, but I set up my Pownce profile anyway. They don't allow 3-letter user names, so I used my old standby, bexmex.

It seems interesting, kind of a fusion between Twitter, file sharing, and Google Calendar. Yet another shiny object on the social network technology heap...

Anybody want an invite? Leave a comment. Some lame-o people put theirs up on eBay, and I'm glad to see they're not selling well.

How Much Is Your Life Worth?

If you had to put a dollar value on your life, what would it be?

How much are you worth?

Before doing that, you should also take the Am I Dumb? test:

How smart are you?

Completely arbitrary cross-promotional nonsense... but still a fun way to waste a few minutes. ;-)

101 Ways To Know Your Project Is Doomed

Code Squeeze had a pretty funny post about 101 ways to know your project is doomed... some of my favorites:

  • All of your requirements are written on a used cocktail napkin.
  • Developers still use Notepad as an IDE.
  • You bring beer to the office during your 2nd shift.
  • Your manager substitutes professional consultant advice for a Magic 8 Ball.
  • The boss does not find the humor in Dilbert.

An ECM Should Store Content, Not Users

OK, I think I've figured out the disconnect between me and James McGovern regarding SAML... When he asked if Oracle's ECM supported SAML, I was about as puzzled as if he had asked if it supported client connections via JDBC. Well... I suppose you could make that happen, but why not just connect directly to the database? It just made no sense...

Here's why: James has apparently never used Oracle's ECM solution, and is commenting on the poor architecture of other enterprise applications. I believe if he took a peek at chapter 2 of my book, he'd recognize that SAML support is unnecessary in this case... (psst, bug Billy for a free one ;-)

Here's the deal... back in version 3 of the product (we're now at version 10), the dev team saw the emergence of LDAP and Active Directory. We knew it made no sense for an ECM product to be both a user repository and a content repository. That just made things overly complicated. Plus, we could never keep up with the feature requirements. Instead, we recommend integrations that "slave" the content server to an existing user repository.

Put your users in a user repository, put your content in a content repository. It just makes sense.

Here's how a basic request operates: first, the content server asks the external system to authenticate the user's password (or token), and also return a "blob" of info about the user. Every user repository has a different API, but this "blob" usually contains group memberships and attributes. The next step is to map the user data to content-server-specific security groups and security accounts. This mapping can be done in many, many ways, from zero configuration to a few dozen lines of custom Java (or C++). Again, it depends on the system. Finally, a security check determines if this user is allowed to execute the specific service (like GET_FILE) against the specific document, based on the security groups of the document, the security level of the service, and the user's roles and accounts.
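
Here's a toy sketch of that flow. To be clear, none of these class names, group names, or mapping rules come from the actual product; they're only placeholders to show the authenticate, map, then authorize steps.

    import java.util.HashSet;
    import java.util.Set;

    public class AccessCheckSketch {

        // Step 1 happens in the external repository: it validates the password and
        // returns a "blob" of groups and attributes, modeled here as a plain record.
        record ExternalUser(String name, Set<String> repoGroups) {}
        record MappedUser(Set<String> securityGroups, Set<String> roles) {}

        // Step 2: map repository groups onto content-server security groups.
        static MappedUser map(ExternalUser u) {
            Set<String> groups = new HashSet<>();
            for (String g : u.repoGroups()) {
                if (g.startsWith("CS_")) groups.add(g.substring(3)); // e.g. "CS_Public" -> "Public"
            }
            return new MappedUser(groups, Set.of("contributor"));
        }

        // Step 3: may this user run this service against this document?
        static boolean allowed(MappedUser u, String service, String docSecurityGroup) {
            boolean inGroup = u.securityGroups().contains(docSecurityGroup);
            boolean serviceOk = "GET_FILE".equals(service) || u.roles().contains("admin");
            return inGroup && serviceOk;
        }

        public static void main(String[] args) {
            ExternalUser bob = new ExternalUser("bob", Set.of("CS_Public", "Sales"));
            System.out.println(allowed(map(bob), "GET_FILE", "Public")); // true under these toy rules
        }
    }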

It can get a little more complex with ACLs, personalization, and workflows, but you get the picture.

This happens on the fly: no authorization data is replicated, it's only cached for a few minutes for performance reasons. Thus, all user management is where it should be: in the user repository. The content server does a mapping to a content-specific security model, no more.

This is called an External user. People also set up Local users, which are just stored in the database. Local users are discouraged in production systems, thus they are typically only used for testing and superusers. A small handful of customers use exclusively Local users, but they typically don't need, have, or want an enterprise user repository... thus, the only people who could possibly benefit from a SAML interface to Oracle's ECM would never use it.

But what if the Active Directory domain controller is on the other side of the planet, and performance sucks? It appears that some ECM systems make the interesting choice of replicating the user repository... but we'd suggest instead using a product that is explicitly designed to replicate a user repository, and "slave" the content server to that... such as Active Directory Application Mode (ADAM). Some customers went so far as to create home-brewed LDAP spiders to cache data, and then integrate all their apps with the cache.

I feel that making every application on the planet support SAML is a silly duplication of effort... I think it's better that applications allow for loose slave-like integrations with dedicated user repositories. Use the right tool for the right job.

Now... SXIP and OpenID? Those are genuinely interesting... I'd bet that people will be willing to pay for an integration with them before they'll pay for SAML. Plenty of clients use SalesForce.com, and might be interested in a cleaner integration between content and customers.

Hopefully this clears things up...

Meet REST: The #1 SOA Anti-Pattern!

I hate code patterns... they're a one-way street towards mediocre and uninspired software. But I love anti-patterns! Sites like Worse Than Failure love to chronicle the bad bad decisions that developers make, and the comments are a gold mine of how to do it right.

Anyway, I found an old article on MSDN about Service Oriented Architecture (SOA) Patterns and Anti-Patterns, linked to from a newer blog at IT Toolbox.

They started off by discussing the CRUD model for writing database applications -- short for Create, Read, Update, and Delete. This model is used very often by REST fans, and it always gave me the willies. I'm happy to see that these guys dubbed it their #1 anti-pattern for good SOA design. The article made many of the points that I argued long ago on my blog and in my book... but it has them in concise list format (emphasis mine):

  • The interface design encourages RPC-like behavior, calling Create, MoveNext, and so on, instead of sending a well-defined message that dictates the action to be taken. This is a violation of the first (Well Defined Boundaries) and third (Share only Schema) tenets.
  • Interface is likely to be overly chatty, since consumers may need to call two or three methods to accomplish their work.
  • Using a Sub for Create means that the consumer will have no idea if the operation succeeds or fails. When designing a service always keep the consumer's expectation in mind -- what does the consumer need to know?
  • CRUD operations are the wrong level of factoring for a Web service. CRUD operations may be implemented within or across services, but should not be exposed to consumers in such a fashion. This is an example of a service that allowed internal (private) capabilities to bleed into the service's public interface.
  • The interface implies stateful interactions such as enumeration (see the MoveNext and Current functions).
  • Abstract types (such as the Object returned by the Current function) result in a weak contract. This is another example of violating the third tenet (Share only Schema).
  • This is a very dangerous service since it could leave the underlying data in an inconsistent state. What would happen if a consumer added a new Contact (or updated an existing Contact) and never called the CommitChanges function? As stated earlier, service providers cannot trust consumers to "do the right thing."

yep... that pretty much sums it up...

I still prefer the model Stellent used for their SOA: a service should be a sequence of code to execute, or a CRUD operation against the database. Some code gets executed for all services -- like security or page rendering -- whereas other code is limited to specific types of services, or specific services. The end user doesn't need to know anything about the back-end database schema, so it's kind of silly to expose it.
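
As a rough illustration (hypothetical interfaces, not the actual Stellent/Oracle API), the contrast looks something like this: the CRUD-style contract leaks persistence details and forces a chatty, stateful conversation, while the service-style contract exposes one named action and returns an explicit result.

    import java.util.Map;

    // Anti-pattern: a thin, chatty wrapper over the database.
    interface ContactCrudService {
        void create(Contact c);   // void return: the caller can't tell success from failure
        Contact read(String id);
        void update(Contact c);
        void delete(String id);
        void commitChanges();     // stateful: nothing sticks unless the caller remembers this
    }

    // Closer to the model described above: one well-defined request, one explicit response.
    interface ContactService {
        ContactResponse execute(ContactRequest request); // e.g. action = "GET_CONTACT"
    }

    record Contact(String id, String name) {}
    record ContactRequest(String action, Map<String, String> params) {}
    record ContactResponse(boolean success, Contact payload) {}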

I Should Make Myself Clearer...

I'm going to have to be more clear in my rants... my anti-ECM-standards rant is getting some people so hopped up they can't see straight. The latest is from Craig Randall:

Bex Huff left a comment on Mark’s post, which referenced his reply-via-post. Bex makes several good points, but at the same time what I perceive is that if an ECM standard isn’t reasonably or capably an end-all-be-all standard for the domain, then why bother. (Bex, if I misunderstood your post, please leave a comment to set me straight.)

huh... I actually said almost exactly the opposite.

In previous posts on my blog, I said that an end-all-be-all ECM standard is impossible. ECM is a marketing term, not a technical term; thus over 100 apps can claim to support "ECM", but deliver whatever the hell they want. Good luck creating a standard interface to a marketing buzzword.

If you want some modest ECM standards, and a simple interface, fine. There are 4 such standards already, just pick a damn horse. Stellent/Oracle supports 3, and will probably support all 4 soon... just in time for the 5th to be finalized. Joy.

Not that it matters... nobody uses the standards that already exist, yet they keep asking for more. I understand why: every ECM standard is far, far too simple to be useful. Why should somebody shell out thousands of dollars for an ECM system, and access it with a "standard API" that hides 90% of what they paid for? At the same time, an enterprise can have several ECM systems at once... and it would be nice if a middleware layer could have a single API to access them all. Nice, but not nice enough that they will willingly sacrifice important features...

I'm tired of wasting cycles on the pipe dream of a useful ECM standard, until the market changes enough for one to be feasible. That will happen after Microsoft fixes SharePoint, more consolidation happens in the market, and the vendors who merely claim to have ECM either shut up or go away. Like I said, probably not before 2009.

ECM people hate SSO? Not quite...

James McGovern kindly linked to my screed against REST, but I think he misunderstood me when I talked about SAML. No problem... if I read as many blogs per day as he does, I'd do the same.

His quote was this:

Bex Huff provides an interesting perspective on REST within the ECM domain. His comment: you could "punt" and rely on wacky SAML, but that just seems to complicate things beyond necessity... seems as if folks in the ECM domain don't believe in the notion of SSO and would rather force complexity in other ways such as making folks log into different systems of course using different passwords, making enterprise administrators duplicate identity stores instead of leveraging an existing one such as Active Directory and so on.

Now... everybody I know in the ECM space cares about Single Sign On (SSO). In fact, Stellent/Oracle supports Active Directory and LDAP out of the box, a few minor tweaks gets you SSL certificates, plus we've made dozens of customizations for SiteMinder and other custom/exotic SSO systems. I even made an ANT script that could build a custom security integration with just about anything, given a few lines of C++.

Trust me, we all know and love SSO.

The problem I have is more specific to SAML. I just don't like it. In fact, I hate SAML. Nothing personal, I just start out hating all technology. I have to. Otherwise, I find it difficult to discover its flaws. If I don't know the flaws, I can't effectively recommend when to use it. There is no silver bullet, and after working with computers for 20 years I've learned to distrust almost everything.

So, I started out hating SAML four or five years ago, when I first heard of it. Guess what? Thus far I've encountered no reason whatsoever to reduce my dislike.

Most of the cool stuff in identity management seems to be happening with OpenID and SXIP. SAML has been around forever, and who is using it? It's not saying "here's some useful technology," it's saying "here's how things should be done." It feels like something from the peaks of the XML ivory tower that claims (yet again) that the entire world would magically be better if we took all information and put <angle brackets> around it... Where's the evidence? Where's the proof?

I get why people are hot about Active Directory, SXIP, and OpenID... I just don't believe SAML has proven it deserves any hype. It might make somebody's job easier, but at what cost? I'm totally open to the possibility that I'm wrong, or that SAML 2.0 is a million times better... but I'll believe that when I see it.
