RDBMS, NoSQL, ACID, CAP theorem, and Scaling

Kuo-Hsiu (Kourtney) Lee
Software developer. Writing about systems, architecture, software design, and technologies.

RDBMS
#

Relational Database Management System

  • Used when there are strong relations between data:
    • Design a schema that is unlikely to change and that relates tables to each other; the desired data can then be retrieved with SQL.
  • Used when data correctness is very important:
    • Usually provides ACID properties.
  • Changing the schema is a huge undertaking:
    • Requires updating the table schema and migrating data.
    • All programs that use the table with the changed schema need to be modified.
  • Join operations can be performed across different tables.
  • Vertical scaling (scale up, i.e. a more powerful machine) is usually more effective than horizontal scaling.
    • Once data is split across machines, a join has to pull rows over the network, which adds latency and network overhead.
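
As a minimal sketch of retrieving related data with a join, the following uses Python's built-in `sqlite3` module with an in-memory database. The `users` and `orders` tables and their columns are hypothetical, chosen only for illustration.

```python
import sqlite3

# Hypothetical schema for illustration: users and the orders they placed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),
        item    TEXT NOT NULL
    );
    INSERT INTO users  VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO orders VALUES (10, 1, 'book'), (11, 1, 'pen'), (12, 2, 'lamp');
""")

# The join follows the relation orders.user_id -> users.id.
rows = conn.execute("""
    SELECT users.name, orders.item
    FROM orders
    JOIN users ON users.id = orders.user_id
    ORDER BY orders.id
""").fetchall()

print(rows)  # [('alice', 'book'), ('alice', 'pen'), ('bob', 'lamp')]
```

Both tables live in the same database here, so the join is a local operation. That locality is what is lost once the tables are spread over several machines.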

ACID
#

RDBMS usually guarantees four properties for transactions:

  • Atomicity A transaction has only two outcomes: everything is applied (commit) or nothing is (abort). There is no “half-done” state. If an error occurs partway through, the transaction rolls back to the state before it started.

  • Consistency A transaction takes the database from one valid state to another: the rules defined on the data (such as constraints) hold both before and after it.

  • Isolation When multiple transactions run at the same time, each one behaves as if it were running alone and does not interfere with the others. A transaction between A and B does not see the half-finished work of a concurrent transaction between B and C.

  • Durability Once a transaction is committed, its changes are permanent and will not be lost, even if the system suddenly fails.
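
Atomicity can be sketched with Python's built-in `sqlite3` module. The `accounts` table, the `transfer` helper, and the `CHECK` constraint are assumptions made for this example. The transfer credits the destination first on purpose, so that a failing debit leaves a half-done state for the rollback to undo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: the CHECK constraint forbids negative balances.
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('A', 100), ('B', 50)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint; the credit was rolled back too.
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


ok = transfer(conn, "A", "B", 30)       # succeeds
after_ok = balances(conn)               # {'A': 70, 'B': 80}
failed = transfer(conn, "A", "B", 500)  # A cannot afford this
after_failed = balances(conn)           # unchanged: {'A': 70, 'B': 80}
```

The second call credits B before the debit on A fails, yet B's balance afterwards is still 80. The constraint also shows consistency at work: the database refuses to enter a state that breaks its rules.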

NoSQL
#

Not only SQL

  • Less concerned with relations between data:
    • Does not require a fixed schema for data access.
    • Each record stands on its own; relations between records are not modeled by the database.
  • More concerned with the content of the data. Can be grouped into four categories:
    • Key-value stores
    • Graph stores
    • Column stores
    • Document stores
  • Join operations are usually not supported.
  • Horizontal scaling (scale out, sharding, i.e. adding more machines) is usually more effective.
    • A NoSQL database cluster usually provides two of the three CAP properties.
    • Horizontal scaling is more desirable for large-scale applications, since running on multiple machines also allows failover and redundancy.
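
To make the schema-free, join-free model concrete, here is a toy in-memory document collection. It is a stand-in written for this example, not the API of any real NoSQL product; `put`, `get`, and the document fields are hypothetical names.

```python
import json

# A toy document collection keyed by id (not a real database client).
collection = {}


def put(doc_id, document):
    # Store the document in serialized form.
    collection[doc_id] = json.dumps(document)


def get(doc_id):
    raw = collection.get(doc_id)
    return None if raw is None else json.loads(raw)


# Two documents in the same collection with different fields: no fixed schema.
put("user:1", {"name": "alice", "orders": [{"item": "book"}, {"item": "pen"}]})
put("user:2", {"name": "bob", "email": "bob@example.com"})

# Related data is embedded in the document, so one lookup by key replaces a join.
items = [order["item"] for order in get("user:1")["orders"]]
print(items)  # ['book', 'pen']
```

Because every document is self-contained and addressed by its key, documents can be spread across machines without any of them needing to consult another to answer a lookup.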

CAP Theorem
#

A distributed system cannot guarantee all three CAP properties at the same time; at most two can be guaranteed. While the network is stable all three can coexist, but when a partition occurs the system has to give up either consistency or availability.

  • Consistency Every read that does not result in an error returns the result of the most recent write. => Every node appears to hold the same data.

  • Availability Every request receives a non-error response, with no guarantee that the response contains the latest data. => Data is always returned, but it might be stale.

  • Partition tolerance The system continues to operate even if some messages between nodes are delayed or lost. => When network issues occur, the nodes that can still reach each other keep operating.
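
The trade-off during a partition can be sketched with a toy two-replica model. The `Cluster` class and its `"CP"` and `"AP"` modes are hypothetical simplifications for illustration; real systems use quorums and far more elaborate replication.

```python
class Cluster:
    """Two replicas of a single value; mode is 'CP' or 'AP' (toy model)."""

    def __init__(self, mode):
        self.mode = mode
        self.values = [None, None]  # one slot per replica
        self.partitioned = False    # True when the replicas cannot talk

    def write(self, node, value):
        if not self.partitioned:
            # Healthy network: replicate to both nodes.
            self.values = [value, value]
        elif self.mode == "CP":
            # Keep consistency by refusing the write: availability is lost.
            raise RuntimeError("unavailable: cannot reach the other replica")
        else:
            # Keep availability by accepting locally: the replicas diverge.
            self.values[node] = value

    def read(self, node):
        return self.values[node]


ap = Cluster("AP")
ap.write(0, "v1")
ap.partitioned = True
ap.write(0, "v2")
fresh, stale = ap.read(0), ap.read(1)  # 'v2' on node 0, 'v1' on node 1

cp = Cluster("CP")
cp.write(0, "v1")
cp.partitioned = True
try:
    cp.write(0, "v2")
    cp_rejected = False
except RuntimeError:
    cp_rejected = True
```

In AP mode the write succeeds but node 1 keeps serving the old value; in CP mode both nodes stay identical at the cost of returning an error.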

How to choose
#

NoSQL
#

  • Low latency is required
  • The data is unstructured or has no relations
  • Data only needs to be serialized and deserialized (JSON, XML, YAML, etc.)
  • A massive amount of data needs to be stored

Scaling
#

Vertical Scaling (Scale up)
#

Adding more CPU, RAM, or disk space to a single machine.

  • Risk of SPOF (Single Point Of Failure): if that one machine goes down, so does the database.

Horizontal Scaling (Scale out, Sharding)
#

Adding more servers. Each shard holds a distinct subset of the data under the same schema.

  • A sharding key (partition key) and a sharding function determine which shard stores each piece of data. The sharding key can be one or more columns of the data.
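
One common choice of sharding function is hash-then-modulo, sketched below. `NUM_SHARDS`, `shard_for`, and the key format are assumptions for this example. It uses `hashlib` rather than Python's built-in `hash()`, because the built-in is randomized per process for strings and would route the same key differently after a restart.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical cluster size


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a sharding key to a shard index in [0, num_shards)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer, then take it modulo the shard count.
    return int.from_bytes(digest[:8], "big") % num_shards


# A single-column key.
user_shard = shard_for("user:42")

# A key built from more than one column (hypothetical tenant_id and user_id).
tenant_id, user_id = "acme", 42
composite_shard = shard_for(f"{tenant_id}:{user_id}")
```

The same key always maps to the same shard, so reads and writes for a given key can be routed without consulting any lookup table.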

Challenges:

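  • Shard exhaustion: when a shard can no longer hold its data, the sharding function has to be updated and data has to be moved between shards.
  • Celebrity problem (hotspot key): a few very popular keys landing on the same shard can overload it.
  • Unable to perform joins across shards: denormalization is usually required.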
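
The cost of updating the sharding function can be estimated with a short experiment. This sketch assumes a hash-then-modulo sharding function (the hypothetical `shard_for` below) and counts how many of 10,000 made-up keys land on a different shard when the cluster grows from 4 to 5 shards.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Hash-then-modulo sharding function (an assumed scheme for this sketch)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


keys = [f"user:{i}" for i in range(10_000)]

# A key has to be moved if its shard under 4 shards differs from its shard under 5.
moved = sum(1 for k in keys if shard_for(k, 4) != shard_for(k, 5))
moved_fraction = moved / len(keys)

print(f"{moved_fraction:.0%} of keys change shard")
```

With a uniform hash, a key stays in place only when its hash has the same remainder modulo 4 and modulo 5, which holds for 4 out of every 20 hash values. About 80% of the keys are therefore expected to move, which is why resharding under this scheme is expensive.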
