SQL, NoSQL & NewSQL

--

Until several years ago, the database world was simple and all SQL. Nowadays, we also have NoSQL and NewSQL. Now the question is which one is better and for what? Let us guide you in a quick tour in this jungle of data management. In other posts, we address NoSQL (see blog post on NoSQL) and NewSQL (see blog post on NewSQL) in more detail.

Figure 1: Workload of Operational Databases (OLTP) vs Analytical Databases (OLAP)

SQL databases come in two main variants, operational and analytical, which have been around for more than three decades. Operational databases have been designed for OLTP (On Line Transactional Processing) workloads. Conversely, analytical databases have been designed for data warehouses and OLAP (On Line Analytical Processing) workloads. OLTP databases are designed to handle both reads and updates, while OLAP databases are designed for read-only workloads.

Figure 2: OLTP DBs handle reads & updates vs OLAP DBs handle read-only queries

OLTP databases excel at updating data in real-time and keeping it consistent in the advent of failures and concurrent accesses by means of ACID transactions. OLAP databases, on the other hand, support data loading but no ACID properties.

Figure 3: ACID Support in OLTP vs OLAP

OLTP databases have a hard time answering large analytical queries. They just support inter-query parallelism to deal with OLTP workloads that consists mainly of short queries. However, OLAP databases exploit parallel processing to reduce the response times of large analytical queries in read-intensive applications. In particular, they implement intra-query intra-operator parallelism.

Figure 4: OLTP provides inter-query parallelism while OLAP provides intra-query parallelism

The two variants are complementary and data. Most organizations implement ETL (Extraction Load Transform) processes to copy data from operational databases into OLAP databases so the data can be analytically exploited in the OLAP databases.

Figure 5: OLTP vs OLAP Summary

Traditional operational databases do not scale well. We can distinguish three kinds of systems with respect to scalability: databases that scale up (typically in a mainframe or database appliance) but do not scale out (see blog post on scalability); databases that scale out to a few nodes using a shared-disk architecture (see blog post on shared-nothing) and databases that scale out logarithmically for read-intensive workloads using cluster replication (see blog post on cluster replication). However, not all operational databases propose these three flavors (some are even just centralized with no horizontal scalability at all). They either do not scale out, or scale out logarithmically to low numbers of nodes. This means that the only solution is to scale up using a bigger machine. But most operational databases do not scale up well due to NUMA effects and the fact that scaling up is both very expensive in hardware (exponential growth in price) and limited by the biggest machine existing in the market at a particular time.

Figure 6: Traditional SQL Databases Scalability: Vertical and Horizontal

Traditional operational databases also have a hard time with data ingestion. The main solution to ingest large amounts of data is application-level sharding, whereby the application splits the whole database in smaller chunks and then uses different independent databases to store each of the chunks. This explicit sharding introduces a major difficulty as the database can no longer be queried as a whole. Thus, the application is forced to query the different databases independently and combine the query results, which may be very difficult for some queries, e.g. involving joins. High-rate data ingestion is also a problem, for two reasons. Firstly, updates are transactional, and transactions have a cost. Secondly, the data is typically stored in a B+ tree structure, which is efficient when querying data but inefficient when inserting/updating data when it does not fit in the memory because of I/Os per inserted/updated row.

Figure 7: Properties of SQL Databases

Continue reading on LeanXcale blog.

--

--

Prof. Ricardo Jimenez-Peris, PhD in CS
Distributed Databases Concepts

Professor and scientist on distributed databases. Founder and CEO of LeanXcale