More On REST

After a glib commenter suggested I do some more reading about REST before I bash it so much, I took his advice. I read, and read, and read... Did my opinion change?

Not so much.

A lot of the debate between the REST and SOAP camps seems a little odd to me... the REST folks complain about SOAP being complex and weird... but that's not SOAP's fault. That's the fault of Microsoft and the W3C for trying to complicate matters beyond necessity. If you don't like it, don't use it.

Then there are the Talmudic debates about using POST versus PUT, which HTTP error code to use, or even whether to encourage Remote Procedure Calls (RPC)... those are mostly silly.

So, I'm going to add three more big reasons why REST gives me the willies:


Resources Versus Services

A web API should be somewhat analogous to a graphical interface: event driven. I click a button, and it does a thing. What does it do? I don't care... as long as it's exactly what I want, I'm happy.

Users are so fun!

The problem is that step 1 of using a REST API is finding the resources you wish to modify; only then can you call the service. What if I don't know which resource I want to modify? What if I don't care whether the resource is a uk-customer or a european-prospect, and I just want to record some data about a dude I met in London?

Remote APIs should focus on the action, not the back-end implementation. Otherwise you'll be tempted to use RPC, which is the exact opposite of good SOA.

If you force the user of the remote API to know what resource it is, then you lose some flexibility. What if you want the back-end system to decide what the resource is? Maybe it should be different based on the context of the request, and the user's credentials? What if you upgrade your system, and change the back-end implementation to use different resource names?

To avoid problems, you must force yourself to push more and more abstract resource names to the front end... which means you lose a lot of the value of named resources. The ultimate endpoint is when you have one resource named resource that takes an action parameter... at which point you're pretty much just doing SOAP.
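To make that endpoint concrete, here is a minimal Python sketch of a single `/resource` URL that does all of its work through an `action` parameter. The handlers (`record_contact`, `forget_contact`) and the action names are hypothetical; the point is only that once the URL stops naming anything, the server is just a dispatcher on the action, much like a SOAP operation name.

```python
from typing import Callable, Dict
from urllib.parse import parse_qsl, urlsplit


# Hypothetical handlers; a real system would pick a repository here.
def record_contact(params: Dict[str, str]) -> str:
    return f"recorded {params.get('name', 'someone')}"


def forget_contact(params: Dict[str, str]) -> str:
    return f"forgot {params.get('name', 'someone')}"


# The "action" parameter, not the URL, decides what happens.
ACTIONS: Dict[str, Callable[[Dict[str, str]], str]] = {
    "recordContact": record_contact,
    "forgetContact": forget_contact,
}


def handle(url: str) -> str:
    """Dispatch a request for the single '/resource' endpoint by its action parameter."""
    parts = urlsplit(url)
    if parts.path != "/resource":
        raise LookupError(f"no such resource: {parts.path}")
    params = dict(parse_qsl(parts.query))
    action = params.pop("action", "")
    handler = ACTIONS.get(action)
    if handler is None:
        raise ValueError(f"unknown action: {action!r}")
    return handler(params)
```

For example, `handle("/resource?action=recordContact&name=Dude")` returns `"recorded Dude"`.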


Implied Resource Hierarchy/Taxonomy

If you have a small system, using named resources may work just fine. And in that case, REST makes it easy to know what services work for what resources. I myself am a fan of human-readable documentation, but to each his own.

However... this sort of implied hierarchy also gives me the willies.

The web is most definitely not hierarchical. Every attempt I've seen to create a taxonomy for it has failed. Even a single taxonomy for one intranet is unworkable. Look at Amazon.com, for example: every book is in multiple taxonomies, to match the different ways people choose to browse.

Extending this to REST, what if you want one service to update 10 resources simultaneously? What if the name and type of these resources change on the fly? What if you're only allowed to choose 8 of the resources, with the other 2 determined for you by the back-end system? What is the clear and unambiguous REST URL for that?

If it's service-oriented, you'd call a service named FunkyActionOnTheseResources, then pass in ten (or eight) resource IDs as XML nodes. Done and done.
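A rough sketch of that call, using Python's standard `xml.etree.ElementTree`. The SOAP 1.1 envelope namespace is real; the `urn:example:services` namespace, the `resourceId` element name and the `choose_rest` callback (standing in for the back end's own choice of the remaining resources) are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SVC = "urn:example:services"  # hypothetical service namespace
ET.register_namespace("SOAP-ENV", SOAP_ENV)
ET.register_namespace("svc", SVC)


def build_request(resource_ids: List[str]) -> str:
    """Client side: one service call carrying any number of resource IDs."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SVC}}}FunkyActionOnTheseResources")
    for rid in resource_ids:
        ET.SubElement(call, f"{{{SVC}}}resourceId").text = rid
    return ET.tostring(envelope, encoding="unicode")


def handle_request(
    xml_text: str, choose_rest: Callable[[int], List[str]], total: int = 10
) -> List[str]:
    """Server side: read the caller's IDs, then let the back end pick the remainder."""
    root = ET.fromstring(xml_text)
    chosen = [el.text or "" for el in root.iter(f"{{{SVC}}}resourceId")]
    return chosen + choose_rest(max(0, total - len(chosen)))
```

No URL has to encode which resources are involved; the caller sends eight IDs and the back end fills in the other two.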

No taxonomy, no worries. But of course, you'll need to document them better... but that's always helpful.


HTTP Is Kind Of Broken

The REST people ask one interesting question: why does SOAP tunnel through HTTP, when HTTP is fine as it is? Well, because HTTP is not fine as it is. It is broken in lots of little ways which may seem minor, but can cause huge problems. You do lose something with tunneling, such as caching in browsers or reverse-proxies... but you gain control. And obviously, there is more than one way to implement a cache.

My personal big gripe is that there is an HTTP header for the encoding of the content, but not for the encoding of the headers themselves! If I have a Japanese user name in an HTTP header, how do I know whether it's encoded with utf8, utf16 or shift_jis? I don't! You literally have to cross your fingers and pray. And when it comes to logins, being helpful by guessing the encoding is a security hole.
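A small Python illustration of the problem, assuming a client that sends a user name as raw Shift_JIS bytes. Nothing in the bytes says which encoding they are, so several candidate decodings succeed and yield different strings; only one of them is the real name. The candidate list is an assumption, with latin-1 included because some stacks (Python's WSGI, per PEP 3333) treat header bytes that way.

```python
from typing import Dict

# Encodings a server might plausibly assume; the header bytes carry no charset label.
CANDIDATES = ("utf-8", "utf-16", "shift_jis", "latin-1")


def possible_readings(raw: bytes) -> Dict[str, str]:
    """Return every candidate encoding under which the header bytes decode without error."""
    readings: Dict[str, str] = {}
    for encoding in CANDIDATES:
        try:
            readings[encoding] = raw.decode(encoding)
        except UnicodeDecodeError:
            pass  # these bytes are not valid in this encoding
    return readings


# A Japanese user name, sent by a client that happens to use Shift_JIS.
header_bytes = "山田".encode("shift_jis")
readings = possible_readings(header_bytes)
```

Every entry in `readings` is a "valid" user name, and all of them differ, so a server that guesses can log in the wrong user or reject the right one.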

And what happens if your browser gets a 404 error message? Does it display the actual response from the web site, or one of those 'friendly error pages' from Microsoft? What if your intranet has a helpful policy that sends back an internal error page in the event of a 404? And who knows what fun will be had in the future...

I'm sorry, you just can't trust HTTP to do what you want. Even if it's in the spec, that doesn't mean it's used by the applications you rely on.

For maximum compatibility in the largest number of environments, some data must be placed outside of HTTP headers, period. If you build a system that relies on people always following the spec, you will soon know misery.


Not The End

There are good ideas about simplicity in the REST camp... I feel that those ideas should be brought into SOAP, instead of abandoning it.

We should abandon WSDLs, data binding, and strong typing in SOAP. The only reason to have those is if you are trying to tunnel RPC through SOAP... RPC is already a bad idea, don't make it worse by forcing people to use XML to describe a binary data object!

Allowing SOAP responses through GET is a good idea, and it's one Stellent implemented 3 years ago. It wasn't in the spec, but it Just Made Sense®, so we did it.

Who knows what will happen to SOAP... but if you have technology that can adapt faster than everybody else, you will win.

comments

I read with interest your

I read with interest your first post about REST v. SOAP. I've actually got a partial response mouldering in a browser window @ work...

2 thoughts:
1) REST probably isn't right for all problems or needs. I would say that it is undeniably effective in the read-mostly environment of the web. I suspect that REST is also a very good way to expose an API to a fairly simple set of resources. REST can be addressed fairly easily by an extremely wide range of tools/languages employed by developers of varying skill. In such an environment, the documentation would be fairly easy to create and human-friendly URLs -could- guide developers to the data they want.
You might want to look over the APIs created for the 37Signals apps: Basecamp, Backpack, etc. I took a peek at those APIs just now and I can see that they're certainly not purist (high-?) REST. Still I like most of what I see.

2) I'm a bit puzzled by a couple of your objections - both here and in the original post. You say:
And what happens if your browser gets a 404 error message?
I'm not sure why you think the browser will be the application receiving the error. The client might be an AJAX app running in the browser but it would need to handle the HTTP responses properly in Javascript. Otherwise, the client could be just about anything and would need to handle the errors (or not) as the developer saw fit. The calling code might just be a script used to extract or load data on a one-time basis.

I'm not sure that REST would work well for the types of systems you seem to be imagining in your examples. I've seen concerns about transactions (like your update multiple objects example) and other complex processes that REST might not support well. I think that a REST API would need to be considered carefully or it could end up more inscrutable than RESTafarians claim WS-* can be.

It is an interesting exercise to analyze a system and try to imagine implementing it in a REST-ful way. If I were building a web-based system and wanted it to be open for easy extension by a wide audience of users and developers, I would sure try to make it REST-ful.

Please do keep thinking and writing about these issues - your posts have been informative and a useful addition to the debate.

kcmarshall

more good questions...

Thanks for the comment... regarding your points:

REST probably isn't right for all problems or needs.

True... But the problem/need I'm talking about is a remote-text-and-HTTP-based-API. I feel that a little bit of REST philosophy injected into SOAP would make SOAP the 90-95% solution for all such problems. I like REST-ful APIs for their simplicity, but I think they are overly simple, and only get 60-80% of the problem space.

That's fine for small or medium sized sites... blogs, forums, or pre-packaged open source apps like Drupal... but with the push towards SOA throughout the enterprise, I just don't think I could recommend it for an infrastructure API.

I'm open to the possibility that I'm an ignorant fool, but I'd need a bit more data to concede that ;)

Regarding the next question about 404s and error messages:

I'm not sure why you think the browser will be the application receiving the [404] error

It doesn't have to be, but it might be. Here's my reasoning:

  1. Some REST purists insist on using HTTP status codes (like 404) to signify an error.
  2. There are very few HTTP status codes, compared to the possible number of things that can go wrong in the world.
  3. In many situations, if your HTTP status code is not 200, some other application (browser, intrusion detection system, single sign on app) may hijack or alter the contents of the response.
  4. Therefore, it's difficult to pass back a recoverable error code with a meaningful error message, unless the status code is always 200.

For example, if the response is an HTTP 500 error message, should I try again? Should I wait 20 seconds and try again? Should I fail? If I fail, what message should I log? What message should I show to the user? Are they the same or different?
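One way to see the difference a fault code makes is a small decision function. This is a sketch under assumptions: the fault codes (`Server.Busy`, `Server.Timeout`) and their wait times are invented for illustration, and parsing the code out of the response body is left out. With only a bare status code, the client can do no better than guess.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Decision:
    action: str          # "ok", "retry", "fail", or "guess"
    wait_seconds: float  # how long to wait before retrying


# Hypothetical fault codes a service might define; not taken from any spec.
RETRY_AFTER = {"Server.Busy": 20.0, "Server.Timeout": 5.0}


def decide(status: int, fault_code: Optional[str]) -> Decision:
    """Pick a recovery strategy from the status plus an optional fault code from the body."""
    if fault_code is None:
        if status == 200:
            return Decision("ok", 0.0)
        # A bare status (or a body rewritten by a proxy) gives no hint at all.
        return Decision("guess", 0.0)
    if fault_code in RETRY_AFTER:
        return Decision("retry", RETRY_AFTER[fault_code])
    return Decision("fail", 0.0)
```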

I'm not talking about distributed transactions or anything that weird... I'm simply talking about being able to always send back meaningful error messages.

Which, IMHO, is one of the most important things a remote API should get right.

Of course, you could do what the Basecamp API does, and tunnel the actual error through the HTTP status message... but that's a lot more magic than a simple SOAP-ENV:Fault node, with a faultcode, faultstring and detail. SOAP is much more extensible...
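For comparison, a minimal sketch of building and reading that Fault with Python's `xml.etree.ElementTree`. The envelope namespace and the `faultcode`/`faultstring`/`detail` elements follow SOAP 1.1; the `Server.Busy` code and the `message` element inside `detail` are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Tuple

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
ET.register_namespace("SOAP-ENV", SOAP_ENV)


def build_fault(code: str, message: str, detail_text: str) -> str:
    """Serialize a SOAP 1.1 envelope whose Body holds a single Fault."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    fault = ET.SubElement(body, f"{{{SOAP_ENV}}}Fault")
    # In SOAP 1.1 the Fault's child elements are unqualified.
    ET.SubElement(fault, "faultcode").text = f"SOAP-ENV:{code}"
    ET.SubElement(fault, "faultstring").text = message
    detail = ET.SubElement(fault, "detail")
    ET.SubElement(detail, "message").text = detail_text  # hypothetical detail entry
    return ET.tostring(envelope, encoding="unicode")


def read_fault(xml_text: str) -> Tuple[str, str, str]:
    """Pull faultcode, faultstring and the detail message back out of an envelope."""
    fault = ET.fromstring(xml_text).find(f"{{{SOAP_ENV}}}Body/{{{SOAP_ENV}}}Fault")
    if fault is None:
        raise ValueError("no SOAP Fault in response")
    return (
        fault.findtext("faultcode", ""),
        fault.findtext("faultstring", ""),
        fault.findtext("detail/message", ""),
    )
```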

I think you are wrong on many points

The HTTP spec is well defined, and well supported by all. HTML is lacking, in that a form can't specify a PUT or DELETE method.

Headers should be UTF8+URI Encoded
http://www.w3.org/International/O-URL-code.html

That will start to make sense if you look at how cookies, and other data that may contain wide bytes, are encoded.
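Roughly what the commenter is suggesting, sketched with Python's `urllib.parse.quote` and `unquote`: encode the value as UTF-8, then percent-encode it so the header itself is plain ASCII. The helper names are hypothetical, and this only helps if every hop in the chain agrees to decode it the same way.

```python
from urllib.parse import quote, unquote


def encode_header_value(value: str) -> str:
    """Percent-encode the UTF-8 bytes of a value so the header stays plain ASCII."""
    return quote(value, safe="", encoding="utf-8")


def decode_header_value(header: str) -> str:
    """Reverse the encoding; errors='strict' fails loudly on bytes that are not UTF-8."""
    return unquote(header, encoding="utf-8", errors="strict")


encoded = encode_header_value("山田")
```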

Caching and Reverse Proxying have no effect on a ReST service.

You don't name your resources in ReST, you assign them a URL (Uniform Resource Locator). The URL should make sense, and the path should be related to what the item fundamentally is, not its attributes. A book could be fiction or non-fiction, but it can't be a movie. A book is physical media. A ReSTful URL for an Amazon book might be amazon.com/product/medium/physical/book/[ISBN]. You use previously existing standards to identify the object (the ISBN), and all the elements leading to the item tell you what it is, not its attributes. The actual data returned might contain a multi-valued tag about whether it is fiction, non-fiction, science fiction, drama, romance, etc.
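The commenter's URL scheme can be sketched as a small resolver. The path prefix comes from the example above; the catalogue, its placeholder ISBN and the `genres` field are hypothetical, and show attributes living in the returned data rather than in the URL.

```python
from typing import Dict, Optional, Tuple

# What the item *is*: the fixed path leading to a book.
BOOK_PATH: Tuple[str, ...] = ("product", "medium", "physical", "book")

# Hypothetical catalogue keyed by ISBN (placeholder value, not a real book).
# Attributes such as genre live in the returned data and can be multi-valued.
CATALOGUE: Dict[str, Dict[str, object]] = {
    "9780000000000": {"title": "Placeholder Title", "genres": ["fiction", "science fiction"]},
}


def resolve(path: str) -> Optional[Dict[str, object]]:
    """Map /product/medium/physical/book/{ISBN} to a record, or None if nothing matches."""
    segments = [s for s in path.split("/") if s]
    if len(segments) != len(BOOK_PATH) + 1 or tuple(segments[:-1]) != BOOK_PATH:
        return None
    return CATALOGUE.get(segments[-1])
```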

There is no way that simplicity can be brought into SOAP. It would cease to be SOAP. SOAP is a protocol developed by writing an XSD for a generic schema, then WSDL to describe the true format of the data. WSDL could be bypassed altogether by a simple XSD for the actual data type, thus reducing the complexity to that of your data and XML alone. The only perceived gain from this extra layer is something to throw tools at, because it gets in the way. You have to define a data format for your specific data, and there is no way around that. Then on the client end, you are going to have to understand that data format, and there is no way around that either. You can't magically connect two web services without understanding each. That extra layer is just unnecessary complication, and it is the fundamental building block of SOAP.

Secondly, ReST promotes less RPC than SOAP. SOAP is RPC. You call methods on objects, and get data as a return. ReST is completely data-centric, and thus has no association with RPC. The transport protocol of choice is HTTP, and HTTP could be seen as an RPC protocol that allows you to call 1 of 4 methods and receive/send data in any format, not just some rigid, redundantly defined XML format.

The error codes in HTTP are plenty. You usually only need a few of them (20x, 30x, 40x, 50x), and they allow you to return a body describing the error in as much detail as you want. You can return a 404 and a body. The HTTP status code is only a flag on the response, much like the exit codes of console programs. In DOS and Unix, you return 0 for no error (200 in HTTP) and anything else for your own user-defined error code. In HTTP, 20x means success, 30x redirection, 40x an error in the client's request, and 50x an error on the server side, which is where application failures belong. So, if you want to mimic the SOAP error structure, you would set an error code 500, then return an XML (or other format of your choosing) document with the faultcode, faultstring and detail.
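A minimal sketch of that approach as a plain WSGI app in Python: the status line flags the error and the XML body carries `faultcode`, `faultstring` and `detail`. The `<error>` root element, the fault code strings and the mapping of `LookupError` to 404 are assumptions, not part of any REST convention.

```python
from typing import Callable, Iterable
from xml.sax.saxutils import escape

Handler = Callable[[dict], bytes]


def error_body(code: str, message: str, detail: str) -> bytes:
    """An XML error document mirroring SOAP's faultcode/faultstring/detail trio."""
    xml = (
        "<error>"
        f"<faultcode>{escape(code)}</faultcode>"
        f"<faultstring>{escape(message)}</faultstring>"
        f"<detail>{escape(detail)}</detail>"
        "</error>"
    )
    return xml.encode("utf-8")


def make_app(handler: Handler):
    """Wrap a handler as a WSGI app: the status code flags the error, the body explains it."""
    def app(environ: dict, start_response: Callable) -> Iterable[bytes]:
        try:
            status, body = "200 OK", handler(environ)
        except LookupError as exc:
            status = "404 Not Found"
            body = error_body("Client.NotFound", str(exc), environ.get("PATH_INFO", ""))
        except Exception as exc:  # any other failure is reported as an application error
            status = "500 Internal Server Error"
            body = error_body("Server", str(exc), type(exc).__name__)
        start_response(status, [
            ("Content-Type", "application/xml; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ])
        return [body]
    return app
```

To try it, the app returned by `make_app` can be served with the standard library's `wsgiref.simple_server.make_server`.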

The web has been around and working a lot longer than SOAP for communication, and it is a lot simpler to use. It is the solution to web services, not a solution.

more fodder...

The web has been around and working a lot longer than SOAP for communication, and it is a lot simpler to use. It is the solution to web services, not a solution.

Looks like I landed myself a member of the ReST inquisition...

Firstly, statements like that are almost universally wrong. It's a common attitude taken by skilled developers: if it isn't my solution, then it's not the "real" solution. If it's not my way, then it's wrong. Reality check: the web is a dynamic system, and has grown far beyond what people intended. I'm sorry, you cannot control its evolution. Nor can you predict with 100% accuracy that everything will be better if we implement technology X. Somebody is going to disagree, or use older techniques, and you're going to have to deal.

Reverse proxies and caches certainly do have an effect on ReST... unless you're of the opinion that a harmless ReST GET service on a highly-used, rarely changing resource should never be cached... And if that rule is a piece of ReST lore, then it's utter nonsense, and I have one more thing to complain about. I'm betting, however, that we're talking about different things...

Headers SHOULD be UTF8/URI encoded... but not everybody follows that rule. I've had to deal with headers in shift_jis in Japan. Plus, all it takes is one poorly configured app in the data flow, and you're screwed. Until everybody plays ball, it's safer to keep things outside the header. Single sign on helps tremendously, but that's another story.

Regarding SOAP complexity... XML-RPC came first, but was too RPC-oriented, so people created SOAP: the service oriented approach. The WSDLs, Schemas, data binding, XSDs, and the WS-* stack came after SOAP. These technologies were a concerted effort to make SOAP overly complex so people wouldn't use it. Microsoft themselves admitted to O'Reilly that the goal was to make SOAP so complex that only tools would be able to do it properly:

http://www.regdeveloper.co.uk/2006/04/29/oreilly_amazon/

Simplifying SOAP would not make it cease to be SOAP; it would be kicking-it old school... The "easy ReST with Java" articles these days look suspiciously like eight-year-old "easy SOAP with Java" articles...

Regarding error codes, you have a point, and I did previously mention that your way is how Basecamp does it. However, it's not in the ReST "spec", and thus everybody will do it differently. IMHO, it needs to always be a 500 with a standard structure (faultcode and faultstring), so you can have recoverable errors that won't be hijacked by another system in the enterprise. Until that's an official part of the ReST "spec", SOAP wins.

And finally, you claim that a ReST URL shouldn't contain a name, rather a unique identifier about what an item is... good idea. If only every web service had one unique identifier. What if you want one service to act on multiple resources? What if you want to choose the IDs for the resources on-the-fly? What if the resources need to be chosen randomly, like personalized suggested links for the user? Got a clear and unambiguous ReST URL for that?

SOAP lets people be more sloppy... to get the full value out of ReST a lot of people have to get along, and organize their data first. Great for "green field" apps, but legacy systems? I don't know if I'm being a cynic or a realist, but I just don't think that's going to happen.

Do you have an example where SOAP is a better tool than REST?

You seem to have a scenario in mind when you say things like:


What if you want one service to act on multiple resources?

Could you flesh this out a bit more and show why SOAP is a better solution than REST?

ReST and SOAP good for different things

Here's the way I see it:

  1. Everything about ReST -- including best practices -- screams resource-oriented architecture (ROA)
  2. An ROA is an excellent API when you're dealing with simple resources, well-defined resources, or resources that require a great deal of knowledge on the part of the developer. Examples: update-emea-customer-data, or update-european-prospect-data. You need two, because existing customers are stored in a different repository than leads and prospects. And don't gimme the "you shouldn't do that" argument, because everybody does crap like that.
  3. A service-oriented architecture is best for unknown resources, or not well defined systems... for example, where the logic needs to select the resource AND action AND repository based on state, and the incoming data. Example: update-info-about-some-random-dude-I-met-in-London-you-figure-it-out
  4. An ROA is also less than awesome when you are an Agile Programmer, and you need to refactor -- and deprecate -- your APIs. An SOA requires fewer API shifts when you alter the back-end implementation.
  5. It is possible to create an SOA using REST, but doing so is awkward. Plus it would violate many of the best practices laid out by zealots. Example: it's a bad idea to pass "action" or "service" as a parameter. Such things make ReST feel funny, but they're natural in SOAP.
  6. Thus, I personally believe the best network API is in two layers: a SOAP-based SOA, which in turn makes a series of ReSTful calls. Example: even if the input data is a chaotic jumble, I can extract the relevant data and determine which resource (or resources) to add (or update, or delete).

That way, your developers could create as anal of a ReST API to their app as they want... but you can still have a SOA-based API to turn raw data from emails into a well-formed update of multiple resources in multiple repositories with one single call.
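A rough sketch of the outer, service-oriented layer, in Python. The caller hands over raw text; the service decides which ReSTful calls to make. The `/customers/emea/...` and `/prospects/europe` URLs, the customer lookup table and the e-mail regex are hypothetical, and the function only plans the calls (method, URL, body) rather than sending them.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RestCall:
    method: str
    url: str
    body: Dict[str, str]


# Hypothetical lookup of existing customers (stored in a different repository than prospects).
KNOWN_CUSTOMERS = {"alice@example.co.uk": "c-1001"}

EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def record_contact(raw: str) -> List[RestCall]:
    """Coarse-grained service: the caller sends raw text, the back end picks the resources."""
    calls: List[RestCall] = []
    for email in EMAIL.findall(raw):
        customer_id = KNOWN_CUSTOMERS.get(email.lower())
        if customer_id:
            # Existing customer: update the customer resource.
            calls.append(RestCall("PUT", f"/customers/emea/{customer_id}", {"note": raw}))
        else:
            # Unknown person: create a new prospect resource.
            calls.append(RestCall("POST", "/prospects/europe", {"email": email, "note": raw}))
    return calls
```

If the back end later moves prospects to another repository, only the URLs inside `record_contact` change; callers of the coarse-grained service are unaffected.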
