
Building a Distributed Object Storage System

Design decisions and implementation details of building an S3-compatible distributed storage system from scratch.

When I set out to build s3-like-storage, I wanted to understand the fundamental challenges of distributed storage systems. Here's what I learned.

Key Design Decisions

1. Consistent Hashing

Traditional modulo-based hash distribution forces most keys to be remapped whenever a node is added or removed. Consistent hashing with virtual nodes (sketched after this list) instead provides:

  • Minimal data movement during rebalancing
  • Even load distribution
  • Graceful handling of node failures
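As a rough illustration, here is a minimal ring in Python, assuming MD5 for placement and 100 virtual nodes per physical node; both are illustrative defaults rather than the values the system actually uses.

    import bisect
    import hashlib

    class HashRing:
        """Consistent hash ring with virtual nodes (illustrative sketch)."""

        def __init__(self, nodes=(), vnodes=100):
            self.vnodes = vnodes        # virtual nodes per physical node
            self.ring = {}              # ring position -> physical node
            self.positions = []         # sorted ring positions
            for node in nodes:
                self.add_node(node)

        def _hash(self, key):
            return int(hashlib.md5(key.encode()).hexdigest(), 16)

        def add_node(self, node):
            # Each physical node owns many points on the ring, which evens out
            # the load and limits data movement when membership changes.
            for i in range(self.vnodes):
                pos = self._hash(f"{node}#{i}")
                self.ring[pos] = node
                bisect.insort(self.positions, pos)

        def remove_node(self, node):
            for i in range(self.vnodes):
                pos = self._hash(f"{node}#{i}")
                del self.ring[pos]
                self.positions.remove(pos)

        def get_node(self, key):
            # Walk clockwise to the first virtual node at or after the key's hash.
            idx = bisect.bisect(self.positions, self._hash(key)) % len(self.positions)
            return self.ring[self.positions[idx]]

    ring = HashRing(["node-a", "node-b", "node-c"])
    print(ring.get_node("bucket1/photo.jpg"))   # -> the node that stores this object

Removing a node only reassigns the keys that mapped to its virtual nodes; everything else stays put, which is exactly the rebalancing property we wanted.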

2. Replication Strategy

We chose 3-way replication with the following write path:

Client → Primary → Replica 1 → Replica 2 → Ack
                   (async)      (async)

Writes are acknowledged as soon as the primary confirms the write; the two replicas are then updated asynchronously in the background for durability.
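A sketch of that path in Python, with a background thread standing in for the replication queue; the node clients and their put method are hypothetical stand-ins, not the project's actual API.

    import queue
    import threading

    replication_queue = queue.Queue()

    def handle_put(primary, replicas, key, data):
        """Acknowledge after the primary write; replicate in the background."""
        primary.put(key, data)                            # synchronous write to primary
        for replica in replicas:
            replication_queue.put((replica, key, data))   # enqueue async replication
        return "ack"                                      # client is acknowledged here

    def replication_worker():
        # Drains the queue and pushes each pending write to its replica.
        while True:
            replica, key, data = replication_queue.get()
            replica.put(key, data)
            replication_queue.task_done()

    threading.Thread(target=replication_worker, daemon=True).start()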

3. Metadata vs. Data Separation

PostgreSQL handles metadata (bucket info, object metadata, ACLs) while the actual object data lives on the distributed storage nodes. This separation, sketched after the list below, allows:

  • ACID transactions for metadata
  • Scalable data storage
  • Independent scaling of each layer
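To make the split concrete, here is a hypothetical PUT handler: the object bytes go to a storage node chosen by the hash ring, while the metadata row commits in PostgreSQL (psycopg2-style connection). The table and column names are invented for illustration.

    import hashlib

    def put_object(pg_conn, ring, node_clients, bucket, key, data):
        node = ring.get_node(f"{bucket}/{key}")            # placement via consistent hashing
        node_clients[node].put(f"{bucket}/{key}", data)    # object bytes to the storage node
        etag = hashlib.md5(data).hexdigest()

        # Metadata commits atomically in PostgreSQL, independent of the data layer.
        with pg_conn:
            with pg_conn.cursor() as cur:
                cur.execute(
                    "INSERT INTO objects (bucket, key, size, etag, node) "
                    "VALUES (%s, %s, %s, %s, %s)",
                    (bucket, key, len(data), etag, node),
                )
        return etag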

Challenges Encountered

Split-Brain Prevention

Network partitions can cause split-brain scenarios. Our solution uses a quorum-based approach where writes require majority acknowledgment.
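A minimal sketch of that check, assuming N = 3 replicas and a write quorum of W = 2 (a majority); the node clients and the timeout value are illustrative.

    from concurrent.futures import ThreadPoolExecutor

    def quorum_write(nodes, key, data, write_quorum=2):
        """Succeed only if a majority of replicas acknowledge the write."""
        acks = 0
        with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
            futures = [pool.submit(node.put, key, data) for node in nodes]
            for future in futures:
                try:
                    future.result(timeout=5)   # each replica must confirm in time
                    acks += 1
                except Exception:
                    pass                       # partitioned or failed node: no ack
        if acks < write_quorum:
            raise IOError(f"write failed: {acks}/{len(nodes)} acks, need {write_quorum}")
        return acks

During a partition, only the side that can still reach a majority of the replica set accepts writes, so the two halves cannot diverge.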

Large File Handling

Multipart upload was essential for large files. We implemented the following (the completion step is sketched after the list):

  • Part tracking in metadata store
  • Parallel part uploads
  • Atomic completion with manifest
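As a sketch of the atomic completion step, assuming parts have already been uploaded in parallel and recorded in a hypothetical upload_parts table: the manifest and the object row commit in one PostgreSQL transaction, so readers see either the whole object or nothing.

    import json

    def complete_multipart_upload(pg_conn, upload_id, bucket, key):
        # All-or-nothing: the manifest and the object row commit together.
        with pg_conn:
            with pg_conn.cursor() as cur:
                cur.execute(
                    "SELECT part_number, etag, size FROM upload_parts "
                    "WHERE upload_id = %s ORDER BY part_number",
                    (upload_id,),
                )
                parts = cur.fetchall()
                manifest = [{"part": p, "etag": e, "size": s} for p, e, s in parts]
                total_size = sum(s for _, _, s in parts)
                cur.execute(
                    "INSERT INTO objects (bucket, key, size, manifest) "
                    "VALUES (%s, %s, %s, %s::jsonb)",
                    (bucket, key, total_size, json.dumps(manifest)),
                )
                cur.execute("DELETE FROM upload_parts WHERE upload_id = %s", (upload_id,))
        return manifest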

Performance Tuning

Key optimizations that improved throughput (the first two are sketched after the list):

  1. Connection pooling: Reuse TCP connections to nodes
  2. Buffer sizing: 64KB buffers for network I/O
  3. Async I/O: Non-blocking disk operations
  4. Redis caching: Hot metadata in memory
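A small sketch of connection pooling and 64KB buffered streaming, assuming raw TCP connections to storage nodes; the pool interface is illustrative rather than the project's actual client.

    import queue
    import socket

    BUF_SIZE = 64 * 1024  # 64KB buffers for network I/O

    class ConnectionPool:
        """Reuse TCP connections to a storage node instead of reconnecting per request."""

        def __init__(self, host, port, size=8):
            self._pool = queue.Queue()
            for _ in range(size):
                self._pool.put(socket.create_connection((host, port)))

        def acquire(self):
            return self._pool.get()

        def release(self, conn):
            self._pool.put(conn)

    def stream_object(src_file, conn):
        # Copy in 64KB chunks so large objects never sit fully in memory.
        while True:
            chunk = src_file.read(BUF_SIZE)
            if not chunk:
                break
            conn.sendall(chunk)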

The final system achieves 1,200 req/s for small files and 600 MB/s for large file streaming.