SQL vs NoSQL Databases – Part 2: Practice

This is part two of a two part series about databases. The first post focuses on theory, and this post focuses on the practical decision of which database to choose.

MongoDB, Redis, Cassandra, and CouchDB are some examples of popular NoSQL databases. There are a variety of types of NoSQL databases, based on either key-value pairs, documents, graphs, or columns (find out more here). MongoDB, a key-value store, has been extremely popular given how easy it is to set up and use. It comes with great flexibility, as schemas are dynamic (can be changed on the fly). If your data isn’t heavily interrelated, it’s a great choice.

Choosing MongoDB optimizes for developer time in part because of its flexibility. Setting up a MongoDB database is quick and easy. Further, your schema can be changed on the fly. As you store arbitrary objects in them, these objects can have completely different kinds of content in the fields; you don’t need to honor any schema. For this reason, it’s a great choice for projects where the schema is not well defined or could change. If you change your code, your data will go into your database with its new schema.

Further, this means you can change what your database is doing and keep your service running. Avoiding downtime is a big plus. The database won’t reject data of an unexpected type (a relational database most likely would). With smaller data sets, non-relational data is usually not a problem. Further, NoSQL is much easier to scale across many servers, and is cheaper. If all you need is simple persistent data, a NoSQL database will work well.

My group chose to use MongoDB for Glint, a forum for finding and getting feedback on project ideas. We wanted to optimize for developer time. Our needs were to track who contributed the idea, and the number of votes, so key-value pairs worked well. Upon scaling, we would consider transferring to a well indexed MySQL database to quickly see, for example, overall number of votes for a given user.

SQL has been around for years. It is battle tested at enormous scale. For a simple relational solution, SQLite exists as well. SQL databases offer powerfully interconnected data, meaning you can access and use the complex relationships within data. SQL is ACID compliant, which is a big deal. Further, well indexed data on well designed MySQL databases are performant.

Downsides to using a MySQL database include the database must be offline to easily restructure the tables. This means your service is unavailable during that time. More data generally means a bigger server. It requires some time thinking about data before using it- including the types of data you’re storing, and the important relationships between your data. It takes more time to set up for these reasons. SQL databases are less flexible – all columns have the same type of data and are the same size, defined ahead of time. The columns can’t be changed with ease to hold more characters. A new column can’t just be added to your table. Lastly, it is more expensive to maintain (you might need a dedicated Database Administrator for large datasets).

Appendix – A few useful database concepts:

CAP Theorem: http://en.wikipedia.org/wiki/CAP_theorem
Normalized vs Unnormalized Data: http://en.wikipedia.org/wiki/Database_normalization

SQL vs NoSQL Databases – Part 1: Theory

This is part one of a two part series about databases. The first post focuses on theory, and the second part focuses on the practical decision of evaluating which database to choose for a project.

There are two kinds of databases: relational databases and non-relational databases. What is the difference, and what considerations are relevant when choosing which to use?

Developers refer to the contrast of relational and non-relational databases as SQL vs NoSQL. SQL represents relational databases. It is actually a query language for relational databases, most prominently MySQL. NoSQL, meaning Not Only SQL, is a more diverse collection of databases. It refers to any type of non-relational database. The use of NoSQL databases has grown in popularity over the last decade for a few reasons discussed in this post and the follow-up post.

Databases store data. Sometimes the data being stored is fairly straightforward. For a simple contact list, all you may need is a simple key-value pair, as in a person’s name and their phone number. You may just want a person’s phone number, accessing it using their name. For a task like this, a non-relational database works well. You can use a key to access its associated value (even if the value has more information than just a number- it could also have an address, a city, or more). The data can be stored essentially in simple key-value pairs.

A relational database should be used when the connections between data is important. If it makes sense to store the data in tables and rows, use a relational database. Tables and rows represent the underlying structure of relational databases.

This allows, for example, ease in finding all the entries with the same area code quickly. You can query multiple tables easily by joining them together. In a non-relational database, you would have to check each key-value pair, check if the value matches, and store that which of the numbers have the same area code.

Sometimes it is essential or helpful to keep track of many aspects of the data. I might want to easily access all my friends who have the same area code, or live on the same street or in the same city. This would take a long time in a non-relational database with many entries (unless you have a lot of redundancy, which takes up more memory- see the appendix here). However, it would be easy to perform this query in a well designed relational database. Once any two or mote tables are joined together, the new table can be easily queried. Joining tables allows for a variety of useful queries.

One of the main tradeoffs you get in the variety of available queries you lose in the convenience of set up. For example, in a MySQL database you have to define how many characters and what type of thing goes in a given column or row. This cannot be changed on the fly- most commonly, your database has to be disconnected when redesigning your database tables.

I’ll discuss in the next post how to choose a SQL or NoSQL database for your project.