Choosing a Database Without Hype and Buzz

The modern information technology industry is obsessed with hype. You're expected to pick some fancy new tech, or you'll be accused of backwardness and people won't be interested in what you're building. But what is more important: the tools you use or the product you make? Try putting customers and their needs at the center of your development universe and ask yourself: do they really care how trendy the tech powering their product is?

There is not as much noise around databases as there is in, for example, the front-end development world. There are traditional relational databases like MySQL, Oracle, SQL Server and PostgreSQL; more hyped key-value or document stores like MongoDB and CouchDB; column stores modeled after Google's Bigtable; and many more. I would add a standout category consisting of a single product: SQLite. It's a simple, lightweight and reliable solution for many purposes, without the burden of an additional service to manage.

So, do you really want a key-value store? There has been a lot of ranting about MongoDB lately, like this and this. I would extend this to any key-value store. Why do people tend to use them even when they have much simpler, time-tested alternatives like MySQL? I have no explanation other than hype, hype and more hype. People around me use it, so why shouldn't I? I'll pick that fancy software and build a product that's better than the competition because my tech is so much more advanced than theirs. The reality is that it isn't more advanced; it just serves a different purpose.

One real case of database misuse from my experience: I was hired to help someone finish an online service written in Go and hosted on Google Cloud Platform. The developer who had started the project picked a key-value store (Google Cloud Datastore) to keep all of the service's data. Then we tried to implement an infinite feed using a complicated algorithm with complex selection criteria. Key-value stores are bad at complex aggregates, and we ended up loading everything from the store and doing all the calculations in code. I could have done the same with just a few lines of SQL! It was an obvious case where a key-value database was the wrong choice for storing data.
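To give a feel for what "a few lines of SQL" means here, below is a minimal sketch. It is not the actual service: the posts and likes tables, the feed function and the ranking rule (most liked first within a topic) are all made up for illustration, and SQLite stands in for whatever relational database you would really use.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical posts/likes schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE posts (id INTEGER PRIMARY KEY, author TEXT, topic TEXT);
        CREATE TABLE likes (post_id INTEGER REFERENCES posts(id), user TEXT);
        INSERT INTO posts VALUES
            (1, 'ann', 'go'), (2, 'bob', 'go'), (3, 'ann', 'sql'), (4, 'cat', 'go');
        INSERT INTO likes VALUES
            (1, 'u1'), (1, 'u2'), (2, 'u1'), (3, 'u1'), (3, 'u2'), (3, 'u3');
        """
    )
    return conn


def feed(conn: sqlite3.Connection, topic: str, limit: int) -> list:
    """Return (post_id, like_count) rows for a topic, most liked first.

    Filtering, joining, counting, ordering and paging all happen inside
    the database; the application only receives the rows it will show.
    """
    return conn.execute(
        """
        SELECT p.id, COUNT(l.post_id) AS like_count
        FROM posts AS p
        LEFT JOIN likes AS l ON l.post_id = p.id
        WHERE p.topic = ?
        GROUP BY p.id
        ORDER BY like_count DESC, p.id ASC
        LIMIT ?
        """,
        (topic, limit),
    ).fetchall()


if __name__ == "__main__":
    print(feed(build_demo_db(), "go", 2))
```

With a plain key-value store, the join, the count and the ordering would each have to be written by hand in application code, after pulling the records out of the store.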

So I developed a rule of thumb: whenever in doubt, pick a traditional relational database with an established ecosystem, like PostgreSQL or MySQL. I'm not aware of any specific business tasks that can only be solved with NoSQL databases. If you're not at Google scale, you don't need Cassandra or Hadoop to store and process data. A properly configured PostgreSQL can handle terabytes of data on a single machine, and that's more than enough for 99% of projects.

In most cases, you don't need to set up a separate solution for analytical workloads either. Relational databases handle dimensional data very well; just take care of the ETL layer and optimize your tables and views.
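As a rough illustration of dimensional data living in an ordinary relational database, here is a small sketch of a star schema with a reporting view. The table names (fact_sales, dim_product, dim_date), the view and the sample rows are assumptions invented for this example, and SQLite is used only because it needs no setup; the same SQL shape applies to PostgreSQL or MySQL.

```python
import sqlite3


def build_warehouse() -> sqlite3.Connection:
    """Create a tiny, hypothetical star schema plus a reporting view."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Dimension tables describe the "who/what/when" of each fact.
        CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
        CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, month TEXT);

        -- The fact table holds the measurements, keyed by dimensions.
        CREATE TABLE fact_sales (
            product_id INTEGER REFERENCES dim_product(product_id),
            date_id INTEGER REFERENCES dim_date(date_id),
            amount INTEGER
        );

        -- A view hides the joins from whoever writes the reports.
        CREATE VIEW sales_by_month_category AS
        SELECT d.month AS month, p.category AS category, SUM(f.amount) AS total
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        JOIN dim_date AS d ON d.date_id = f.date_id
        GROUP BY d.month, p.category;

        -- Sample rows; in a real setup the ETL layer would load these.
        INSERT INTO dim_product VALUES (1, 'books'), (2, 'games');
        INSERT INTO dim_date VALUES (1, '2024-01'), (2, '2024-02');
        INSERT INTO fact_sales VALUES (1, 1, 10), (1, 1, 5), (2, 1, 20), (1, 2, 7);
        """
    )
    return conn


def monthly_report(conn: sqlite3.Connection) -> list:
    """Return (month, category, total) rows from the reporting view."""
    return conn.execute(
        "SELECT month, category, total FROM sales_by_month_category "
        "ORDER BY month, category"
    ).fetchall()


if __name__ == "__main__":
    for row in monthly_report(build_warehouse()):
        print(row)
```

The view is the part worth optimizing later: on a bigger dataset it could be backed by indexes on the fact table's keys, or replaced by a precomputed summary table filled in by the ETL job.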

To be honest, there's another side of the development universe: developers. Will they be happy working with a boring traditional stack? Most of the developers I know will not (the professional ethics behind this arrogant behavior are beyond the scope of this article). Great developers are naturally very curious people and always prefer the new over the established. So you have to let the kids play and, at the same time, deliver something to your customers. Strictly speaking, there is no universal answer to this dilemma. But there are many areas for experimentation besides data storage: languages, frameworks, architecture patterns. I'd prefer to be conservative about the database and more flexible about the rest.

I'm not against the bleeding edge, but be very careful when picking a product built inside a large tech leader. They have the people, time and money to handle it. They have very specific tasks and develop products based solely on their own needs. Picking their perfectly promoted tech won't make you the next Google or Facebook.