How does a Relational Database Work ? [Part 1]

Malo Le Goff
4 min readJun 20, 2021

This article is the first of a series in which we’ll talk about the workings of Relational Database Management Systems (RDBMS)
In fact, RDBMS such as PostgreSQL, MySQL, …. are widely used today and among the most complete software architecture in use. They are the result of decades of research and optimization. Yet, they are used as black boxes by most users.
This article aims to explain why we use a database in the first place and how does the database store the data

Before diving in, I just want to make a quick refresher :
People often get confused between the DBMS and the DB, just to be clear : A Relational Database Management System is a piece of software that is supposed to manage (hence the name) a relational database. The DB itself is just the cluster of data
Examples : MySQL, PostgreSQL, … these are all RDBMS.

I. Why do we need a Database in the first place ?

First thing first, why do we need a DB and a DBMS to store data ? After why not just store everything in spreadsheets

Because databases coupled with DBMS are a convenient way to :

  • Manage large amounts of data, it would be much more complicated with spreadsheets. Indeed, with a lot of rows, the spreadsheet would just be so slow we could not use it anymore. Whereas with the DB, we have plenty of tools to deal with large datasets like indexing, partitioning and clustering
  • Implement checks, constraints, … to make sure the data is correct
  • Update data with a DML (Data Manipulation Language) such as SQL.
  • With the correct design, we can ensure data integrity and minimize redundancy
  • Fetch data from the database, done with DQL (Data Query Language)
  • Enforce the ACID properties
  • Handle fault-tolerance and concurrency

And so on, there are still plenty of reasons to use a DBMS rather than a bunch of spreadsheets but I will stop here.

II. How does a database store the data ?

We all know the tabular representation of data in RDBMS :

But it’s not exactly stored like this on disk. And that’s what I want to show you, how it is really stored on disk and not just how we represent the data.

In fact, each row in a table is stored as a record on disk with this kind of structure :

The record is a sequence of bytes split into 2 areas :

  • The green area stores the information about the columns of the records with a fixed length. Like the total length of the data or the actual data
  • And the blue area does the same but with columns whose length can vary.

So that’s how a record is stored on disk.

But records need to be organized, otherwise, it is a mess to compute and it’s not efficient. That’s why DBMSs store and put them inside what we call pages.

A page is a structure of 8 KB (size may change) which contains 2 major sections : the header and the body :

  • The body stores the data record
  • The header stores metadata about this page so how many records are stored in the body for instance

So now we have pages, which are a bunch of records but our tables like the one below are not implemented as pages on disk.

This kind of table may be implemented with data structures like flat files, ISAM, heap files, hash buckets or B+-trees.

Pages are stored inside these data structures, if we consider the main data structures in used in DB, we have :

  • Heap files : Which is basically an unordered set of pages
  • Hash bucket/table : A hash function computes the hash of the page that is about to be stored. The page will be stored in the bucket whose hash corresponds to the hash of the page :
  • B+-trees : They are basically trees of pages. They are the most common data structure used in DBMS because they do random and sequential access efficiently. That is to say, they can efficiently retrieve either a random page in the tree or a sequence or pages.

So, finally, we find it. Tables are stored as Hash tables or B+-trees or heap files which are data structures containing pages which contain records.

Thanks for reading this ! I hope you enjoyed the article, let me know in the comment what you thought about it !

Here is my LinkedIn if you want to contact me : https://www.linkedin.com/in/malo-le-goff-480452170/?locale=en_US

--

--

Malo Le Goff

Student Engineer | Engineering school : IMT Atlantique | Software Engineering & Data Science & Cybersecurity