1.2. Why CouchDB?
Apache CouchDB is one of a new breed of database management systems. This topic explains why there’s a need for new systems as well as the motivations behind building CouchDB.
As CouchDB developers, we’re naturally very excited to be using CouchDB. In this topic we’ll share with you the reasons for our enthusiasm. We’ll show you how CouchDB’s schema-free document model is a better fit for common applications, how the built-in query engine is a powerful way to use and process your data, and how CouchDB’s design lends itself to modularization and scalability.
1.2.1. Relax
If there’s one word to describe CouchDB, it is relax. It is the byline to CouchDB’s official logo, and when you start CouchDB, you see:
- Apache CouchDB has started. Time to relax.
Why is relaxation important? Developer productivity roughly doubled in the last five years. The chief reason for the boost is more powerful tools that are easier to use. Take Ruby on Rails as an example. It is an infinitely complex framework, but it’s easy to get started with. Rails is a success story because of the core design focus on ease of use. This is one reason why CouchDB is relaxing: learning CouchDB and understanding its core concepts should feel natural to most everybody who has been doing any work on the Web. And it is still pretty easy to explain to non-technical people.
Getting out of the way when creative people try to build specialized solutions is in itself a core feature and one thing that CouchDB aims to get right. We found existing tools too cumbersome to work with during development or in production, and decided to focus on making CouchDB easy, even a pleasure, to use.
Another area of relaxation for CouchDB users is the production setting. If you have a live running application, CouchDB again goes out of its way to avoid troubling you. Its internal architecture is fault-tolerant, and failures occur in a controlled environment and are dealt with gracefully. Single problems do not cascade through an entire server system but stay isolated in single requests.
CouchDB’s core concepts are simple (yet powerful) and well understood. Operations teams (if you have a team; otherwise, that’s you) do not have to fear random behavior and untraceable errors. If anything should go wrong, you can easily find out what the problem is, but these situations are rare.
CouchDB is also designed to handle varying traffic gracefully. For instance, if a website is experiencing a sudden spike in traffic, CouchDB will generally absorb a lot of concurrent requests without falling over. It may take a little more time for each request, but they all get answered. When the spike is over, CouchDB will work with regular speed again.
The third area of relaxation is growing and shrinking the underlying hardware of your application. This is commonly referred to as scaling. CouchDB enforces a set of limits on the programmer. On first look, CouchDB might seem inflexible, but some features are left out by design for the simple reason that if CouchDB supported them, it would allow a programmer to create applications that couldn’t deal with scaling up or down.
Note
CouchDB doesn’t let you do things that would get you in trouble later on. This sometimes means you’ll have to unlearn best practices you might have picked up in your current or past work.
1.2.2. A Different Way to Model Your Data
We believe that CouchDB will drastically change the way you build document-based applications. CouchDB combines an intuitive document storage model with a powerful query engine in a way that’s so simple you’ll probably be tempted to ask, “Why has no one built something like this before?”
Django may be built for the Web, but CouchDB is built of the Web. I’ve never seen software that so completely embraces the philosophies behind HTTP. CouchDB makes Django look old-school in the same way that Django makes ASP look outdated.
– Jacob Kaplan-Moss, Django developer
CouchDB’s design borrows heavily from web architecture and the concepts of resources, methods, and representations. It augments this with powerful ways to query, map, combine, and filter your data. Add fault tolerance, extreme scalability, and incremental replication, and CouchDB defines a sweet spot for document databases.
1.2.3. A Better Fit for Common Applications
We write software to improve our lives and the lives of others. Usually this involves taking some mundane information such as contacts, invoices, or receipts and manipulating it using a computer application. CouchDB is a great fit for common applications like this because it embraces the natural idea of evolving, self-contained documents as the very core of its data model.
1.2.3.1. Self-Contained Data
An invoice contains all the pertinent information about a single transaction: the seller, the buyer, the date, and a list of the items or services sold. As shown in Figure 1, there’s no abstract reference on this piece of paper that points to some other piece of paper with the seller’s name and address. Accountants appreciate the simplicity of having everything in one place. And given the choice, programmers appreciate that, too.
Figure 1. Self-contained documents
Yet using references is exactly how we model our data in a relational database! Each invoice is stored in a table as a row that refers to other rows in other tables: one row for seller information, one for the buyer, one row for each item billed, and more rows still to describe the item details, manufacturer details, and so on and so forth.
This isn’t meant as a detraction of the relational model, which is widely applicable and extremely useful for a number of reasons. Hopefully, though, it illustrates the point that sometimes your model may not “fit” your data in the way it occurs in the real world.
Let’s take a look at the humble contact database to illustrate a different way of modeling data, one that more closely “fits” its real-world counterpart – a pile of business cards. Much like our invoice example, a business card contains all the important information, right there on the cardstock. We call this “self-contained” data, and it’s an important concept in understanding document databases like CouchDB.
1.2.3.2. Syntax and Semantics
Most business cards contain roughly the same information – someone’s identity, an affiliation, and some contact information. While the exact form of this information can vary between business cards, the general information being conveyed remains the same, and we’re easily able to recognize it as a business card. In this sense, we can describe a business card as a real-world document.
Jan’s business card might contain a phone number but no fax number, whereas J. Chris’s business card contains both a phone and a fax number. Jan does not have to make his lack of a fax machine explicit by writing something as ridiculous as “Fax: None” on the business card. Instead, simply omitting a fax number implies that he doesn’t have one.
We can see that real-world documents of the same type, such as business cards, tend to be very similar in semantics – the sort of information they carry – but can vary hugely in syntax, or how that information is structured. As human beings, we’re naturally comfortable dealing with this kind of variation.
While a traditional relational database requires you to model your data up front, CouchDB’s schema-free design unburdens you with a powerful way to aggregate your data after the fact, just like we do with real-world documents. We’ll look in depth at how to design applications with this underlying storage paradigm.
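To make the business card example concrete, here is a minimal sketch in Python that treats each card as a self-contained, JSON-like document. The field names (`name`, `phone`, `fax`), the placeholder phone numbers, and the helper `cards_with` are illustrative assumptions, not part of CouchDB. In CouchDB itself, documents are stored as JSON and this kind of after-the-fact aggregation is expressed with views rather than with a Python function.

```python
import json

# Two self-contained "business card" documents. Field names and values
# are hypothetical placeholders; CouchDB does not prescribe them.
jan = {
    "_id": "card-jan",
    "name": "Jan",
    "phone": "555-0100",
}
j_chris = {
    "_id": "card-j-chris",
    "name": "J. Chris",
    "phone": "555-0101",
    "fax": "555-0102",
}

cards = [jan, j_chris]


def cards_with(field, documents):
    """Return (name, value) pairs for the documents that carry `field`."""
    return [(doc["name"], doc[field]) for doc in documents if field in doc]


if __name__ == "__main__":
    # Jan's document serializes to JSON as-is; it simply has no "fax" key.
    print(json.dumps(jan, indent=2))
    # Aggregate after the fact: which cards list a fax number?
    print(cards_with("fax", cards))
```

Omitting the `fax` key plays the role of the missing line on Jan’s card: nothing has to be declared as empty, and code that cares about fax numbers only looks at the documents that have one.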
1.2.4. Building Blocks for Larger Systems
CouchDB is a storage system useful on its own. You can build many applications with the tools CouchDB gives you. But CouchDB is designed with a bigger picture in mind. Its components can be used as building blocks that solve storage problems in slightly different ways for larger and more complex systems.
Whether you need a system that’s crazy fast but isn’t too concerned with reliability (think logging), or one that guarantees storage in two or more physically separated locations for reliability, but you’re willing to take a performance hit, CouchDB lets you build these systems.
There are a multitude of knobs you could turn to make a system work better in one area, but you’ll affect another area when doing so. One example would be the CAP theorem discussed in Eventual Consistency. To give you an idea of other things that affect storage systems, see Figure 2 and Figure 3.
By reducing latency for a given system (and that is true not only for storage systems), you affect concurrency and throughput capabilities.
Figure 2. Throughput, latency, or concurrency
Figure 3. Scaling: read requests, write requests, or data
When you want to scale out, there are three distinct issues to deal with: scaling read requests, write requests, and data. Orthogonal to all three and to the items shown in Figure 2 and Figure 3 are many more attributes like reliability or simplicity. You can draw many of these graphs that show how different features or attributes pull into different directions and thus shape the system they describe.
CouchDB is very flexible and gives you enough building blocks to create a system shaped to suit your exact problem. That’s not saying that CouchDB can be bent to solve any problem – CouchDB is no silver bullet – but in the area of data storage, it can get you a long way.
1.2.5. CouchDB Replication
CouchDB replication is one of these building blocks. Its fundamental function is to synchronize two or more CouchDB databases. This may sound simple, but the simplicity is key to allowing replication to solve a number of problems: reliably synchronize databases between multiple machines for redundant data storage; distribute data to a cluster of CouchDB instances that share a subset of the total number of requests that hit the cluster (load balancing); and distribute data between physically distant locations, such as one office in New York and another in Tokyo.
CouchDB replication uses the same REST API all clients use. HTTP is ubiquitous and well understood. Replication works incrementally; that is, if during replication anything goes wrong, like dropping your network connection, it will pick up where it left off the next time it runs. It also only transfers data that is needed to synchronize databases.
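Because replication is driven through the HTTP API, it can be requested with an ordinary HTTP client. The sketch below only builds the request and does not send it. It assumes a CouchDB server at `http://localhost:5984` and two databases named `contacts` and `contacts-backup`; all of these are placeholders. It targets the `_replicate` endpoint, which accepts a JSON body naming a `source` and a `target`; the helper name `build_replication_request` is our own.

```python
import json
import urllib.request


def build_replication_request(server_url, source, target):
    """Build (but do not send) a POST request to the _replicate endpoint."""
    body = json.dumps({"source": source, "target": target}).encode("utf-8")
    return urllib.request.Request(
        url=server_url.rstrip("/") + "/_replicate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_replication_request(
        "http://localhost:5984",                  # assumed server address
        "http://localhost:5984/contacts",         # hypothetical source database
        "http://localhost:5984/contacts-backup",  # hypothetical target database
    )
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
    # To actually trigger replication, pass the request to
    # urllib.request.urlopen(request) against a running server.
    # Authentication, if the server requires it, is not shown here.
```

Nothing here is specific to a replication client: it is the same kind of JSON-over-HTTP request any other CouchDB client would make.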
A core assumption CouchDB makes is that things can go wrong, like network connection troubles, and it is designed for graceful error recovery instead of assuming all will be well. The replication system’s incremental design shows that best. The ideas behind “things that can go wrong” are embodied in the Fallacies of Distributed Computing:
- The network is reliable.
- Latency is zero.
- Bandwidth is infinite.
- The network is secure.
- Topology doesn’t change.
- There is one administrator.
- Transport cost is zero.
- The network is homogeneous.
Existing tools often try to hide the fact that there is a network and that any or all of the previous conditions may not hold for a particular system. This usually results in fatal error scenarios when something finally goes wrong. In contrast, CouchDB doesn’t try to hide the network; it just handles errors gracefully and lets you know when actions on your end are required.
1.2.6. Local Data Is King
CouchDB takes quite a few lessons learned from the Web, but there is one thing that could be improved about the Web: latency. Whenever you have to wait for an application to respond or a website to render, you almost always wait for a network connection that isn’t as fast as you want it at that point. Waiting a few seconds instead of milliseconds greatly affects user experience and thus user satisfaction.
What do you do when you are offline? This happens all the time – your DSL or cable provider has issues, or your iPhone, G1, or BlackBerry has no bars, and no connectivity means no way to get to your data.
CouchDB can solve this scenario as well, and this is where scaling is important again. This time it is scaling down. Imagine CouchDB installed on phones and other mobile devices that can synchronize data with centrally hosted CouchDBs when they are on a network. The synchronization is not bound by user interface constraints like sub-second response times. It is easier to tune for high bandwidth and higher latency than for low bandwidth and very low latency. Mobile applications can then use the local CouchDB to fetch data, and since no remote networking is required for that, latency is low by default.
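The idea of reading locally and synchronizing only when a network is available can be illustrated with a toy model in Python. This is a sketch under heavy assumptions, not CouchDB’s API or replication protocol: the `LocalStore` class, its methods, and the use of a plain dictionary as the “central” database are all hypothetical, and the sync shown is a one-way push without conflict handling.

```python
class LocalStore:
    """Toy stand-in for a database on the device (not CouchDB's API)."""

    def __init__(self):
        self.docs = {}      # data served to the app without any network
        self.pending = []   # ids changed locally since the last sync

    def put(self, doc_id, doc):
        """Store a document locally and remember that it needs syncing."""
        self.docs[doc_id] = doc
        if doc_id not in self.pending:
            self.pending.append(doc_id)

    def get(self, doc_id):
        """Read from the device only, so reads also work offline."""
        return self.docs.get(doc_id)

    def sync(self, remote, online):
        """Push pending changes to `remote` (a dict) if a network exists.

        Returns the number of documents transferred.
        """
        if not online:
            return 0
        transferred = 0
        for doc_id in self.pending:
            remote[doc_id] = self.docs[doc_id]
            transferred += 1
        self.pending = []
        return transferred


if __name__ == "__main__":
    store = LocalStore()
    central = {}  # hypothetical stand-in for a centrally hosted database

    store.put("note-1", {"text": "written while offline"})
    print(store.get("note-1"))                # served locally, no network
    print(store.sync(central, online=False))  # offline: nothing transferred
    print(store.sync(central, online=True))   # online: pending doc is pushed
    print(store.sync(central, online=True))   # nothing new to transfer
```

The application only ever talks to the local store, so its reads and writes stay fast regardless of connectivity; the slower, network-bound step runs in the background whenever a connection happens to be available.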
Can you really use CouchDB on a phone? Erlang, CouchDB’s implementation language, has been designed to run on embedded devices orders of magnitude smaller and less powerful than today’s phones.
1.2.7. Wrapping Up
The next document, Eventual Consistency, further explores the distributed nature of CouchDB. We should have given you enough bites to whet your interest. Let’s go!