DDIA - Chapter 2 - Data Models and Query Lang - thoughts and notes

In my journey through DDIA (Designing Data-Intensive Applications), I am now on the second chapter. The book club has given us two weeks to finish this chapter because a member of the book club is presenting their thesis next week.
This chapter is about something humans have spent countless hours building and researching: databases. Databases are the backbone of almost all computing. A lot of software engineering is taking in text, converting it into some other text and storing it, so you can see why databases are important.
This chapter highlights the importance of data models and how they affect the performance and efficiency of software. The right choice of data model, and of how you store the data, depends on the use case.
Broadly, there are three kinds of data models:
- Relational (SQL)
- Document (NoSQL)
- Graph (for many-to-many relationships)
There are many examples in the chapter of when one would prefer a relational database over a document database, and vice versa. Much of it hinges on data locality: do you want to query four tables to build a user profile, or read a single document from one collection?
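To make the profile question concrete, here is a minimal sketch using Python's built-in json and sqlite3 modules. The user, job, and school values are hypothetical. The document version is one self-contained JSON blob fetched with one lookup; the relational version spreads the same profile over three normalized tables and has to reassemble it with several queries.

```python
import json
import sqlite3

# Document model: the whole profile lives in one JSON document,
# so building it is a single lookup (here, a dict keyed by user id).
documents = {
    251: json.dumps({
        "user_id": 251,
        "name": "Ada",  # hypothetical data
        "positions": [{"title": "Engineer", "org": "Acme"}],
        "education": [{"school": "Example University"}],
    })
}

def profile_from_document(user_id):
    return json.loads(documents[user_id])

# Relational model: the same profile is normalized across three tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE positions (user_id INTEGER, title TEXT, org TEXT);
    CREATE TABLE education (user_id INTEGER, school TEXT);
    INSERT INTO users VALUES (251, 'Ada');
    INSERT INTO positions VALUES (251, 'Engineer', 'Acme');
    INSERT INTO education VALUES (251, 'Example University');
""")

def profile_from_tables(user_id):
    # One query per table, then stitch the rows back into a nested structure.
    (name,) = conn.execute(
        "SELECT name FROM users WHERE user_id = ?", (user_id,)).fetchone()
    positions = [{"title": t, "org": o} for t, o in conn.execute(
        "SELECT title, org FROM positions WHERE user_id = ?", (user_id,))]
    education = [{"school": s} for (s,) in conn.execute(
        "SELECT school FROM education WHERE user_id = ?", (user_id,))]
    return {"user_id": user_id, "name": name,
            "positions": positions, "education": education}

print(profile_from_document(251) == profile_from_tables(251))  # True
```

Both paths produce the same profile; the difference is how many places the database has to look to build it.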
The JSON format has taken over the world; most API services now use JSON for their request and response bodies.
Document databases reduce the impedance mismatch between the objects in application code and the rows and tables of a relational database. That mismatch is why ORMs exist!
Document databases are sometimes called schema-less, but a more appropriate term is schema-on-read.
Schema-on-read is similar to dynamic (runtime) type checking, whereas schema-on-write is similar to static (compile-time) type checking. Document databases are closer to the data structures used in application code and fit naturally with the JSON object model.
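Here is a minimal sketch of the difference, assuming a hypothetical format change where old user records only had a "name" field and new ones also have "first_name". With schema-on-read, the reading code copes with both shapes; with schema-on-write (shown with SQLite), the schema change is a migration applied to the stored rows up front.

```python
import sqlite3

# --- Schema-on-read (document style) ---
# Hypothetical data: older documents store only "name"; newer ones also store "first_name".
stored_users = [
    {"id": 1, "name": "Grace Hopper"},
    {"id": 2, "name": "Alan Turing", "first_name": "Alan"},
]

def read_user(doc):
    user = dict(doc)  # copy, so the stored document is left untouched
    if "name" in user and "first_name" not in user:
        # The old format is interpreted at read time; nothing was migrated.
        user["first_name"] = user["name"].split(" ")[0]
    return user

print([read_user(u)["first_name"] for u in stored_users])  # ['Grace', 'Alan']

# --- Schema-on-write (relational style) ---
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO users VALUES (1, 'Grace Hopper');
    -- The schema change is a migration applied to every stored row.
    ALTER TABLE users ADD COLUMN first_name TEXT;
    UPDATE users SET first_name = substr(name, 1, instr(name, ' ') - 1);
""")
print(conn.execute("SELECT first_name FROM users").fetchall())  # [('Grace',)]
```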
If your data has a lot of many-to-many relationships, a document database is usually not a good fit, partly because joins are poorly supported in many document databases. With SQL you get a query optimizer, which decides how to execute a query (for example, which indexes to use and in what order to join tables), so you don't have to hand-code how the data is fetched. The optimizer is general-purpose, so when you add a new index, your existing queries can start using it without being rewritten.
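To see a query optimizer at work, here is a small sketch with SQLite (via Python's sqlite3 module). The table and index names are hypothetical. The same declarative query is planned before and after an index is created; the query text never changes, but the optimizer switches from scanning the whole table to searching the index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany("INSERT INTO users (name, city) VALUES (?, ?)",
                 [("Ada", "London"), ("Linus", "Helsinki"), ("Grace", "New York")])

# The query says *what* we want, not *how* to find it.
query = "SELECT name FROM users WHERE city = ?"

def plan(sql):
    # EXPLAIN QUERY PLAN rows have four columns; the last one is the readable detail.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, ("London",))]

before = plan(query)  # a full table scan
conn.execute("CREATE INDEX idx_users_city ON users (city)")
after = plan(query)   # a search using idx_users_city

print(before)
print(after)
print(conn.execute(query, ("London",)).fetchall())  # [('Ada',)]
```

The exact wording of the plan text varies between SQLite versions, but the switch from a scan to an index search is the point.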
Databases are becoming more and more similar to each other: many relational databases now support JSON columns and values, and many document databases now support join-like queries (MongoDB's aggregation pipeline, for example, has a $lookup stage). A hybrid approach to data is the future. Use whatever fits your use case better.
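As a small example of a relational database handling JSON, here is a sketch with SQLite, assuming your SQLite build includes the JSON functions (built in by default since SQLite 3.38). The table and profile data are hypothetical. A JSON document is stored in an ordinary column, and plain SQL reaches inside it with json_extract.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# A relational table whose "profile" column holds a JSON document.
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, profile TEXT)")
conn.execute("INSERT INTO users (profile) VALUES (?)",
             (json.dumps({"name": "Ada", "skills": ["math", "programming"]}),))

# json_extract pulls fields (and array elements) out of the stored document.
row = conn.execute(
    "SELECT json_extract(profile, '$.name'), json_extract(profile, '$.skills[1]') "
    "FROM users"
).fetchone()
print(row)  # ('Ada', 'programming')
```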
The author also talks a bit about MapReduce, which I prefer to describe as "divide the work into chunks, process each chunk, and then bring all the results together".
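Here is a toy, single-process sketch of that idea, loosely modeled on the book's example of counting animal sightings per month. The records and chunking are made up; a real MapReduce system would run the map step on many machines in parallel and shuffle the data over the network.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical input: observation records split into chunks,
# as if each chunk were stored on a different machine.
chunks = [
    [{"month": "1995-12", "count": 3}, {"month": "1995-12", "count": 4}],
    [{"month": "1996-01", "count": 1}, {"month": "1995-12", "count": 2}],
]

def map_fn(record):
    # Emit (key, value) pairs: here, one pair per record.
    yield record["month"], record["count"]

def reduce_fn(key, values):
    # Combine all values that were emitted for the same key.
    return key, reduce(lambda a, b: a + b, values)

# Map phase: every chunk is processed independently.
mapped = [pair for chunk in chunks for record in chunk for pair in map_fn(record)]

# Shuffle phase: group the emitted values by key.
groups = defaultdict(list)
for key, value in mapped:
    groups[key].append(value)

# Reduce phase: bring the per-key results together.
result = dict(reduce_fn(k, v) for k, v in groups.items())
print(result)  # {'1995-12': 9, '1996-01': 1}
```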
There are examples and details about graph databases, but I won't write about them at length here. Just ask Google/ChatGPT/Claude about graph-based data models and they will tell you all you need.
If you have done DSA questions, you will be familiar with the graph data structure. Meta, you know, has a graph database where it has mapped all your personal information, and it uses that to target ads. I cannot imagine the scale at which it works: billions of people in one graph, with so many different attributes, all interconnected. Wow.
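For anyone coming from DSA, here is a toy property graph in plain Python, loosely modeled on the book's example of finding people who were born in the US and now live in Europe. All vertices, edge labels, and helper names are hypothetical. Vertices carry properties, edges carry labels, and the query needs a traversal because "inside Europe" can be several "within" hops away.

```python
from collections import deque

# Vertices with properties, and labeled directed edges (source, label, target).
vertices = {
    "lucy": {"type": "person", "name": "Lucy"},
    "idaho": {"type": "location", "name": "Idaho"},
    "usa": {"type": "location", "name": "United States"},
    "london": {"type": "location", "name": "London"},
    "uk": {"type": "location", "name": "United Kingdom"},
    "europe": {"type": "location", "name": "Europe"},
}
edges = [
    ("lucy", "born_in", "idaho"),
    ("lucy", "lives_in", "london"),
    ("idaho", "within", "usa"),
    ("london", "within", "uk"),
    ("uk", "within", "europe"),
]

def neighbours(vertex, label):
    return [dst for src, lbl, dst in edges if src == vertex and lbl == label]

def within(start, target):
    # Breadth-first search along "within" edges: is `start` (transitively) inside `target`?
    queue, seen = deque([start]), {start}
    while queue:
        v = queue.popleft()
        if v == target:
            return True
        for nxt in neighbours(v, "within"):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

# "Who was born in the US and now lives in Europe?"
emigrants = [
    v for v, props in vertices.items()
    if props["type"] == "person"
    and any(within(place, "usa") for place in neighbours(v, "born_in"))
    and any(within(place, "europe") for place in neighbours(v, "lives_in"))
]
print(emigrants)  # ['lucy']
```

Graph query languages let you declare this kind of variable-length traversal directly instead of writing the search loop yourself.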
Anyway, the author also covers several query languages, and I won't go into those either, because most of it is theoretical and can easily be found on the internet. These are my thoughts and notes, and I am not preparing for an exam where I have to write a definition of triple stores and SPARQL. :')




