This is part two of a two part series about databases. The first post focuses on theory, and this post focuses on the practical decision of which database to choose.
MongoDB, Redis, Cassandra, and CouchDB are some examples of popular NoSQL databases. There are a variety of types of NoSQL databases, based on either key-value pairs, documents, graphs, or columns (find out more here). MongoDB, a key-value store, has been extremely popular given how easy it is to set up and use. It comes with great flexibility, as schemas are dynamic (can be changed on the fly). If your data isn’t heavily interrelated, it’s a great choice.
Choosing MongoDB optimizes for developer time in part because of its flexibility. Setting up a MongoDB database is quick and easy. Further, your schema can be changed on the fly. As you store arbitrary objects in them, these objects can have completely different kinds of content in the fields; you don’t need to honor any schema. For this reason, it’s a great choice for projects where the schema is not well defined or could change. If you change your code, your data will go into your database with its new schema.
Further, this means you can change what your database is doing and keep your service running. Avoiding downtime is a big plus. The database won’t reject data of an unexpected type (a relational database most likely would). With smaller data sets, non-relational data is usually not a problem. Further, NoSQL is much easier to scale across many servers, and is cheaper. If all you need is simple persistent data, a NoSQL database will work well.
My group chose to use MongoDB for Glint, a forum for finding and getting feedback on project ideas. We wanted to optimize for developer time. Our needs were to track who contributed the idea, and the number of votes, so key-value pairs worked well. Upon scaling, we would consider transferring to a well indexed MySQL database to quickly see, for example, overall number of votes for a given user.
SQL has been around for years. It is battle tested at enormous scale. For a simple relational solution, SQLite exists as well. SQL databases offer powerfully interconnected data, meaning you can access and use the complex relationships within data. SQL is ACID compliant, which is a big deal. Further, well indexed data on well designed MySQL databases are performant.
Downsides to using a MySQL database include the database must be offline to easily restructure the tables. This means your service is unavailable during that time. More data generally means a bigger server. It requires some time thinking about data before using it- including the types of data you’re storing, and the important relationships between your data. It takes more time to set up for these reasons. SQL databases are less flexible – all columns have the same type of data and are the same size, defined ahead of time. The columns can’t be changed with ease to hold more characters. A new column can’t just be added to your table. Lastly, it is more expensive to maintain (you might need a dedicated Database Administrator for large datasets).
Appendix – A few useful database concepts:
CAP Theorem: http://en.wikipedia.org/wiki/CAP_theorem
Normalized vs Unnormalized Data: http://en.wikipedia.org/wiki/Database_normalization