Designing scalable and reliable systems requires understanding a core set of reusable building blocks. This article covers the most common components — from load balancers and databases to caches and CDNs — and the key trade-offs to consider when using them.
Load Balancer with Multiple Web Servers and Auto Scaling#
A load balancer distributes incoming traffic across multiple web servers, providing a single public entry point to the system.
- Only one public IP is exposed to the internet (the IP of the load balancer)
- Web servers communicate with each other and the load balancer using private IPs, improving security
- Multiple web servers in the load-balanced set provide failover and high availability
- Auto scaling: Automatically adds or removes web servers based on traffic
Stateful vs. Stateless Web Servers#
Stateful architecture stores user session data on a specific server, meaning requests from the same user must always route to the same server.
- Most load balancers support this via sticky sessions, but with added overhead
Constraints of stateful architecture:
- Scaling the number of web servers up or down becomes difficult
- Handling server failure becomes difficult, as session data may be lost
Stateless architecture moves session state out of the web server (e.g., into a shared cache or database), allowing any server to handle any request. This is generally preferred for scalability.
geoDNS with Multiple Data Centers#
Web servers and other components (databases, caches) are distributed across data centers in different geographical regions. Users are routed to the nearest data center via geoDNS.
- Provides failover and high availability across regions
Be careful:
- Data consistency: Users in different regions may read from different databases. Data synchronization must be implemented to keep data consistent during failover.
- Deployment: Services must be deployed consistently across all data centers.
Database Replica#
A primary-replica (master-slave) architecture where the primary node handles writes and replica nodes handle reads. Typically, there are far more replicas than primary nodes.
Ensures:
- Performance: Read and write operations can happen simultaneously
- Reliability: Multiple database copies prevent data loss
- High availability: Supports failover if one node goes down
Cache#
An in-memory data store that sits between the application and the database to serve frequently accessed data faster.
Benefits:
- Improves response performance
- Reduces load on the database
- Can be scaled independently of the database tier
Be careful:
- Expiration policy: Stale data can be served if TTL is set too high
- Data consistency: Data may differ between the database, cache, and CDN
- Eviction policy: When the cache is full, decide which data to remove — common strategies include LRU (Least Recently Used) and FIFO (First In, First Out)
CDN#
A Content Delivery Network caches static assets (images, CSS, JavaScript) across servers in different geographical regions, serving users from the nearest location for better performance.
Be careful:
- Cache expiration time: Setting TTL too long can serve stale assets; too short reduces CDN effectiveness
- CDN failure handling: Applications should fall back gracefully to the origin server if the CDN is unavailable
Message Queue#
A message queue decouples work producers from consumers, allowing tasks to be processed asynchronously by worker services. This makes the system more reliable and scalable — producers and consumers can be scaled independently and are not affected by each other’s failures.
