RDBMS and NoSQL Differences Notes
RDBMS
Relational Database Management System
- Used when the data has strong relationships:
  - Design a schema that is unlikely to change and relate the tables to each other; the desired data can then be retrieved with SQL.
- Used when data correctness is very important:
  - Usually provides ACID properties.
- Changing the schema can be a huge undertaking:
  - Requires updating the table schema and migrating the existing data.
  - Programs that use the changed table usually need to be modified as well.
- Vertical scaling (improving the performance of a single machine) is more effective.
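As a minimal sketch of the "fixed schema, related tables, retrieve with SQL" idea, the following uses Python's built-in `sqlite3` module with an in-memory database. The `authors` and `books` tables and their contents are made up for illustration.

```python
import sqlite3

# In-memory database; the table and column names are hypothetical examples.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# The schema is fixed up front, and books are related to authors by a foreign key.
conn.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE books ("
    " id INTEGER PRIMARY KEY,"
    " title TEXT NOT NULL,"
    " author_id INTEGER NOT NULL REFERENCES authors(id))"
)

conn.execute("INSERT INTO authors (id, name) VALUES (1, 'Ada')")
conn.executemany(
    "INSERT INTO books (title, author_id) VALUES (?, ?)",
    [("Notes", 1), ("Sketches", 1)],
)

# The relation lets one SQL query combine both tables.
rows = conn.execute(
    "SELECT authors.name, books.title FROM books"
    " JOIN authors ON authors.id = books.author_id"
    " ORDER BY books.title"
).fetchall()
print(rows)  # [('Ada', 'Notes'), ('Ada', 'Sketches')]
```

Inserting a book whose `author_id` does not exist would be rejected here, which is the kind of correctness guarantee the relation provides.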
ACID
An RDBMS usually guarantees four properties for transactions:
Atomicity: A transaction has only two possible outcomes: everything is done (commit) or nothing is done (abort). There is no “half-done” state. If an error occurs during execution, the transaction is rolled back to the state before it started.
Consistency: The database is in a valid state, one that satisfies all of its defined constraints, both before and after the transaction.
Isolation: When multiple transactions run concurrently, each one is separate and does not interfere with the others. Transaction A does not see or affect the intermediate state of transaction B.
Durability: Once a transaction is committed, its changes are permanent and will not be lost, even if the system suddenly fails.
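Atomicity and consistency can be sketched with Python's built-in `sqlite3` module. The `accounts` table, the `transfer` and `balances` helpers, and the amounts are hypothetical; the `CHECK` constraint plays the role of the rule that defines a valid state.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"  # rule for a valid state
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between accounts; returns False if the transaction aborted."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit broke the CHECK constraint, so the credit was undone too.
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


print(transfer(conn, "alice", "bob", 500))  # False: alice cannot go below zero
print(balances(conn))                       # unchanged: no half-done transfer
print(transfer(conn, "alice", "bob", 30))   # True: both updates are committed
print(balances(conn))
```

The failed transfer credits `bob` first and only then fails on the debit, so the unchanged balances show that the first update was rolled back rather than left half-done.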
NoSQL
Not only SQL
- Less concerned with relationships between data:
  - Does not require a fixed schema for data access.
  - Each piece of data exists largely independently, without references to other records.
- More concerned with the content of the data:
  - What matters is how each record is added, updated, deleted, and so on.
  - Records can have different formats, for example different sets of fields.
- More suitable for distributed systems.
- Usually provides two of the CAP properties.
- Horizontal scaling is more effective (adding more machines).
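The points above can be illustrated with a toy in-memory store. This is only a sketch and not the API of any real NoSQL product: `ShardedStore` and its methods are hypothetical, each shard stands in for a separate machine, and documents are plain dictionaries without a shared schema.

```python
import hashlib


class ShardedStore:
    """Toy key-document store; each dict in `shards` stands in for one machine."""

    def __init__(self, shard_count):
        self.shards = [{} for _ in range(shard_count)]

    def _shard_for(self, key):
        # Hash the key so the same key always maps to the same shard.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return self.shards[int.from_bytes(digest[:8], "big") % len(self.shards)]

    def put(self, key, doc):
        self._shard_for(key)[key] = dict(doc)  # store a shallow copy

    def get(self, key):
        return self._shard_for(key).get(key)


store = ShardedStore(shard_count=3)

# No fixed schema: the two documents have different fields.
store.put("user:1", {"name": "Ada", "email": "ada@example.com"})
store.put("user:2", {"name": "Lin", "tags": ["admin"], "address": {"city": "Taipei"}})

print(store.get("user:1"))
print(store.get("user:2")["address"]["city"])  # Taipei
```

Because each document is looked up by its key alone, adding capacity in this model means adding shards. A real system would also have to move existing keys when the shard count changes, which the simple modulo scheme here does not handle.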
CAP
A distributed system cannot guarantee all three CAP properties at the same time; at most two can be guaranteed. While the network is stable, all three can coexist, but once a network partition occurs the system has to give up either consistency or availability.
Consistency: Every read that does not result in an error returns the result of the most recent write. => Every node appears to hold the same data. (This is a different notion from the C in ACID.)
Availability: Every request receives a non-error response, regardless of whether the returned data is the latest. => Data is always returned, but it might be old.
Partition tolerance: Even if some messages between nodes are delayed or lost, the system continues to operate. => When network issues occur, the nodes that can still reach each other keep working.
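The trade-off during a partition can be sketched with a toy two-replica model. `TwoNodeStore`, `Unavailable`, and the `"CP"`/`"AP"` modes are hypothetical names for illustration; real systems use far more elaborate replication protocols.

```python
class Unavailable(Exception):
    """Raised when a request is refused in order to protect consistency."""


class TwoNodeStore:
    """Toy model of two replicas that may be cut off from each other."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.nodes = [{}, {}]
        self.partitioned = False

    def write(self, node, key, value):
        if not self.partitioned:
            for replica in self.nodes:  # healthy network: update both replicas
                replica[key] = value
            return
        if self.mode == "CP":
            # Keep the replicas identical by refusing the write (gives up availability).
            raise Unavailable("cannot reach the other replica")
        # AP: accept the write locally, so the replicas diverge (gives up consistency).
        self.nodes[node][key] = value

    def read(self, node, key):
        return self.nodes[node].get(key)


ap = TwoNodeStore("AP")
ap.write(0, "x", 1)
ap.partitioned = True
ap.write(0, "x", 2)
print(ap.read(0, "x"), ap.read(1, "x"))  # 2 1 -> node 1 answers with old data

cp = TwoNodeStore("CP")
cp.write(0, "x", 1)
cp.partitioned = True
try:
    cp.write(0, "x", 2)
except Unavailable as err:
    print("write refused:", err)
print(cp.read(0, "x"), cp.read(1, "x"))  # 1 1 -> still identical
```

In this model the CP store can keep serving reads only because neither side accepts writes during the partition; the AP store always answers, at the cost of returning stale data from the replica that missed the write.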