Amazon SimpleDB: Instant Scalability, But Performance?

Are you a startup that needs a cheap database with guaranteed backup and uptime? Well, you'll love Amazon's latest software-as-a-service project, called SimpleDB. Think of it as a bigger and better take on their popular S3 service... but instead of just storing and retrieving files, you can also QUERY your data storage.

What's great about it is that you only pay for what you use. Upload a terabyte into the system, transfer a terabyte out, and it will cost you around $1800 per month. If you are a startup that needs Amazon-quality backups, recovery, uptime, and a terabyte of database storage, I highly doubt there's a cheaper option... except perhaps placing the less structured data into S3. If all you need is a hashtable lookup to find a file, or a message queue, SimpleDB might be overkill.
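To see how a figure near $1800 could come about, here is a back-of-the-envelope sketch. The three per-gigabyte rates are assumptions for illustration only (they are not quoted in this post), and the estimate ignores any per-request or compute charges, so check Amazon's current price list before relying on it.

```python
# Assumed rates in USD, for illustration only; not taken from the post.
STORAGE_PER_GB_MONTH = 1.50   # assumed storage rate per GB per month
TRANSFER_IN_PER_GB = 0.10     # assumed inbound transfer rate per GB
TRANSFER_OUT_PER_GB = 0.18    # assumed outbound transfer rate per GB


def monthly_cost(stored_gb, gb_in, gb_out):
    """Estimate one month's bill from storage and transfer volumes."""
    return (stored_gb * STORAGE_PER_GB_MONTH
            + gb_in * TRANSFER_IN_PER_GB
            + gb_out * TRANSFER_OUT_PER_GB)


TB = 1024  # gigabytes in a terabyte
estimate = monthly_cost(stored_gb=TB, gb_in=TB, gb_out=TB)
print(f"Estimated monthly cost: ${estimate:,.2f}")
```

Under these assumed rates, storage dominates the bill, which is why moving bulky, rarely queried data into cheaper storage can matter.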

A lot of folks are really excited about this... some have speculated that it was written in Erlang... others claim it's based on Java, like Amazon's Dynamo database, which I've covered before...

The big question is... what exactly is it? It's too simple to be a "real" database, and since everything is stored at Amazon.com, performance tuning will be tricky, to say the least. Marcelo Calbucci has suggested that SimpleDB is actually a directory service, and NOT a database.

I think a really killer way to do this is to use S3 and SimpleDB as a persistence layer for Oracle Coherence. That combo would let you keep a local cache of your Amazon database... so access would be a lot faster. Also, a little-known fact is that Coherence can store Java code as well as data. This means that Java code invoked in Coherence executes in the same process space as your raw data... which means incredibly fast performance.
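The local-cache idea can be sketched in a few lines. This is not Coherence's API: `SlowStore` and `ReadThroughCache` are hypothetical names, and an in-memory dict stands in for the remote SimpleDB domain. It only shows the read-through and write-through pattern the paragraph above describes.

```python
class SlowStore:
    """Hypothetical stand-in for a remote store such as SimpleDB."""

    def __init__(self):
        self._items = {}
        self.reads = 0  # counts simulated remote round trips

    def get(self, key):
        self.reads += 1
        return self._items.get(key)

    def put(self, key, value):
        self._items[key] = value


class ReadThroughCache:
    """Serves repeat reads locally; writes go to the store and the cache."""

    def __init__(self, store):
        self._store = store
        self._local = {}

    def get(self, key):
        # Only go to the remote store on a cache miss.
        if key not in self._local:
            self._local[key] = self._store.get(key)
        return self._local[key]

    def put(self, key, value):
        # Write-through: update the remote store first, then the local copy.
        self._store.put(key, value)
        self._local[key] = value


store = SlowStore()
store.put("user:1", {"name": "Bex"})
cache = ReadThroughCache(store)

first = cache.get("user:1")   # miss: fetched from the store
second = cache.get("user:1")  # hit: served from local memory
print(store.reads)
```

A real deployment would also need eviction and invalidation, which this sketch leaves out.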

Coherence on top of S3/SimpleDB is an application cloud on top of a data cloud. Coherence for speed and code execution, Amazon for cheap and nearly infinite scalability. That sounds pretty damn cool to me...

Of course, such a solution is mainly for startups and small or maybe mid-sized companies... I doubt most enterprise customers would be comfortable storing data at Amazon. Firstly, enterprise customers have already spent a lot on their data centers, and have made them pretty cost-effective... even if their centers are more pricey than Amazon, it's a big political hot potato.

Secondly, there's the security issue. Who wants to store their data at Amazon? Of course, if you could do an encryption step in Coherence before storing data to Amazon, then some of those issues might be moot. Then again, substring queries on encrypted data are pretty much impossible, so you need to be careful.
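The query limitation is easy to demonstrate. The sketch below does not encrypt anything: Python's standard library has no cipher, so it uses a keyed hash (HMAC), sometimes called a blind index, as a stand-in for an opaque stored value. The key and the `blind_index` helper are hypothetical. It shows that exact-match lookups still work on opaque tokens while substring lookups do not.

```python
import hashlib
import hmac

# Hypothetical demo key; a real system would load this from a secret store.
SECRET_KEY = b"demo-key-not-for-production"


def blind_index(value: str) -> str:
    """Return a deterministic, opaque token for an attribute value."""
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()


# What the remote store would see: opaque tokens mapped to item names.
stored = {blind_index("alice@example.com"): "item-1"}

# Exact match works because the same input always yields the same token.
exact_hit = stored.get(blind_index("alice@example.com"))

# Substring match fails: the token of "alice" is unrelated to the stored token.
substring_hit = any(blind_index("alice") in token for token in stored)

print(exact_hit, substring_hit)
```

Deterministic tokens trade some secrecy for queryability, since equal values produce equal tokens, so this is a design choice to weigh rather than a free win.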

Anyway, cool stuff. I hope someday to get to use it ;-)

comments

Simple DB is not a data base

Hi Bex

"The big question is... what exactly is it??? Its too simple to be a "real" database, and since everything is stored at Amazon.com, performance tuning will be tricky to say the least. Marcelo Calbucci has offered that SimpleDB is actually a directory service, and NOT a database."

IMO it is clear that SimpleDB is not a database and it shouldn't be measured as such. See my post on the subject, "Simple DB is not a data base". Here is a snippet from that post:

SimpleDB seems to address a need that I have seen referred to as Document-Driven Databases, in which records aren’t grouped by their structure but by their attributes. ORM tools, such as Hibernate or the Active Record pattern, attempt to address this requirement by hiding the underlying relational model. They, however, still inherit the complexity and limitations of the underlying relational model.

Having said that, SimpleDB is clearly not a solution for every scenario. In fact, it solves only a limited set of scenarios, such as the one described above. As it is a disruptive technology, I expect that it will take some time before there is enough experience and enough established patterns to use it correctly in an architecture.

The introduction of SimpleDB occurred mainly due to the limitations of existing database implementations, and how well they fit (or rather, don't fit) with the cloud computing model. There are other approaches that can be used to address these limitations, some of which I covered in my recent posts. "PaaS – Persistence as a Service (using Hibernate)" discusses how you can address such requirements while keeping the data in the existing database. "The Missing Piece in Cloud Computing: Middleware Virtualization" provides a broader context on the need for virtualization of the entire middleware stack, not just the data store, to make better use of cloud computing.

There are other solutions, some of which are in the making as we speak, such as the integration of Lucene/Compass and GigaSpaces. I'm sure that there are other solutions aiming to solve this challenge that I'm not aware of, so my recommendation is simple: before going down the SimpleDB path, take a good look at your application requirements, and make sure that it is the right solution for your problems.

Nati S.
GigaSpaces

coherence could be middleware virtualization

Nati, have you seen Oracle Coherence? I believe it's similar to the GigaSpaces product you alluded to.

middleware virtualization

Hi Bex

I'm familiar with Coherence. You're right that it can serve as a front end to SimpleDB, but you should note that SimpleDB supports only String-based attributes, so I'm not sure how well this is going to work.
By virtual middleware I'm referring to the entire middleware stack, i.e. messaging and data, as well as the ability to virtualize existing APIs (JDBC, JMS, ...) on top of it.
BTW, we're seeing more cases in which GigaSpaces and Coherence are used together, mostly the use of our Spring-based, SLA-driven container in conjunction with Coherence as the caching layer, so this might be an interesting angle to explore.

Nati S.
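The String-only attribute limitation mentioned above matters most for numbers, because strings compare lexicographically. A common workaround is to offset and zero-pad numbers before storing them so that string order matches numeric order. The sketch below assumes a fixed width and offset; `encode_int` and both parameters are hypothetical choices, not part of any SimpleDB API.

```python
def encode_int(value: int, width: int = 10, offset: int = 1_000_000_000) -> str:
    """Encode an integer as a fixed-width string that sorts numerically.

    The offset shifts negative numbers into the non-negative range, and
    zero-padding makes every encoded value the same length.
    """
    shifted = value + offset
    if shifted < 0 or len(str(shifted)) > width:
        raise ValueError(f"{value} is outside the encodable range")
    return str(shifted).zfill(width)


values = [42, -7, 1000, 5]

# Comparing raw strings puts "1000" before "42" and "5".
naive_order = sorted(values, key=str)

# Comparing encoded strings recovers the numeric order.
padded_order = sorted(values, key=encode_int)

print(naive_order)
print(padded_order)
```

The same idea extends to dates by storing them in a fixed-width format such as ISO 8601, which already sorts chronologically as text.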

A Conference on SimpleDB

Saltmarch Media is organizing the Great Indian Developer Summit in Bangalore. The summit should be a boost for the software development industry. It covers topics like .NET, Java, SimpleDB, MongoDB, NoSQL, and Rich Web, and has a one-day workshop at the end as well. Anyone attending this event?

Register at developersummit dot com

SDB Explorer - Upload Bulk data to Amazon SimpleDB

SDB Explorer version 2011.05.01.02 adds a bulk upload feature. Users can now upload large amounts of data to Amazon SimpleDB in a few quick steps, at good speed, through an easy interface. You can upload your MySQL data, edit cells directly, or import data from a CSV file for bulk upload to Amazon SimpleDB. Bulk uploads are queued, so users can follow the progress of each upload. SDB Explorer also provides better visualization and statistics for the data being uploaded, and it can generate item names automatically for bulk uploads.
