When I set out to build s3-like-storage, I wanted to understand the fundamental challenges of distributed storage systems. Here's what I learned.
Key Design Decisions
1. Consistent Hashing
Traditional hash-based distribution (hash(key) mod N) reshuffles most keys when nodes are added or removed. Consistent hashing with virtual nodes, sketched after this list, provides:
- Minimal data movement during rebalancing
- Even load distribution
- Graceful handling of node failures
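Here is a minimal sketch of such a ring. The names (Ring, AddNode, Locate), the SHA-1 hash, and the 128 virtual nodes per physical node are illustrative assumptions, not the project's actual code:

```go
// hashring.go - a minimal consistent-hash ring with virtual nodes (illustrative only).
package main

import (
	"crypto/sha1"
	"encoding/binary"
	"fmt"
	"sort"
)

type Ring struct {
	vnodes int               // virtual nodes per physical node
	keys   []uint64          // sorted hash positions on the ring
	owners map[uint64]string // hash position -> physical node
}

func hashKey(s string) uint64 {
	sum := sha1.Sum([]byte(s))
	return binary.BigEndian.Uint64(sum[:8])
}

func NewRing(vnodes int) *Ring {
	return &Ring{vnodes: vnodes, owners: map[uint64]string{}}
}

// AddNode places `vnodes` points for the node on the ring.
func (r *Ring) AddNode(node string) {
	for i := 0; i < r.vnodes; i++ {
		h := hashKey(fmt.Sprintf("%s#%d", node, i))
		r.owners[h] = node
		r.keys = append(r.keys, h)
	}
	sort.Slice(r.keys, func(a, b int) bool { return r.keys[a] < r.keys[b] })
}

// Locate returns the node owning an object key: the first ring
// position clockwise from the key's hash.
func (r *Ring) Locate(objectKey string) string {
	h := hashKey(objectKey)
	i := sort.Search(len(r.keys), func(i int) bool { return r.keys[i] >= h })
	if i == len(r.keys) {
		i = 0 // wrap around the ring
	}
	return r.owners[r.keys[i]]
}

func main() {
	ring := NewRing(128)
	for _, n := range []string{"node-a", "node-b", "node-c"} {
		ring.AddNode(n)
	}
	fmt.Println(ring.Locate("bucket1/photos/cat.jpg"))
}
```

Because each physical node owns many scattered positions, removing one node only reassigns the keys that mapped to its positions, roughly 1/N of the keyspace, and spreads them across the surviving nodes.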
2. Replication Strategy
We chose 3-way replication with the following write path:
Client → Primary → Ack
Primary → Replica 1 → Replica 2 (async)
Writes are acknowledged once the primary confirms the local write; replication to the remaining two nodes happens asynchronously to bring the object up to full three-copy durability.
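A rough sketch of that path in code, assuming hypothetical PutObject, storeLocal, and sendToReplica helpers rather than the project's actual handlers:

```go
// writepath.go - illustrative acknowledge-then-replicate write path.
package main

import (
	"fmt"
	"log"
)

type Object struct {
	Key  string
	Data []byte
}

// storeLocal persists the object on the primary (stubbed here).
func storeLocal(obj Object) error {
	// write to local disk, fsync, update metadata...
	return nil
}

// sendToReplica forwards the object to one replica node (stubbed here).
func sendToReplica(node string, obj Object) error {
	// network call to the replica...
	return nil
}

// PutObject acknowledges the client once the primary has the object,
// then replicates to the remaining nodes in the background.
func PutObject(obj Object, replicas []string) error {
	if err := storeLocal(obj); err != nil {
		return err // client sees the failure; nothing was acked
	}
	go func() {
		for _, node := range replicas {
			if err := sendToReplica(node, obj); err != nil {
				// a real system would retry or repair this copy
				log.Printf("replicate %s to %s failed: %v", obj.Key, node, err)
			}
		}
	}()
	return nil // ack to the client
}

func main() {
	err := PutObject(Object{Key: "bucket1/a.txt", Data: []byte("hi")},
		[]string{"node-b", "node-c"})
	fmt.Println("acked:", err == nil)
}
```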
3. Metadata vs. Data Separation
PostgreSQL handles metadata (bucket info, object metadata, ACLs) while the actual object bytes live on the distributed data nodes. This separation, illustrated after the list below, allows:
- ACID transactions for metadata
- Scalable data storage
- Independent scaling of each layer
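As a rough illustration of that split, a metadata record carries everything needed to locate and authorize an object while holding none of its bytes. The field names below are hypothetical, not the actual schema:

```go
// metadata.go - illustrative shape of the metadata vs. data split.
package main

import (
	"fmt"
	"time"
)

// ObjectMeta is what the PostgreSQL metadata store holds: location,
// size, integrity, and access-control information, but no object bytes.
type ObjectMeta struct {
	Bucket    string // e.g. "photos"
	Key       string // e.g. "2024/cat.jpg"
	Size      int64  // bytes
	ETag      string // content hash for integrity checks
	ACL       string // owner / public-read / ...
	CreatedAt time.Time
	// Where the bytes actually live: the replica set chosen by the
	// consistent-hash ring, recorded so reads don't depend on a stale ring.
	DataNodes []string // e.g. ["node-a", "node-d", "node-f"]
}

func main() {
	meta := ObjectMeta{
		Bucket: "photos", Key: "2024/cat.jpg", Size: 1 << 20,
		ETag: "abc123", ACL: "private",
		CreatedAt: time.Now(),
		DataNodes: []string{"node-a", "node-d", "node-f"},
	}
	fmt.Printf("%+v\n", meta)
}
```

Keeping placement alongside the rest of the record is one reason ACID transactions matter here: object creation, ACL changes, and replica locations can commit together or not at all.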
Challenges Encountered
Split-Brain Prevention
Network partitions can cause split-brain scenarios, where isolated groups of nodes each believe they should keep accepting writes. Our solution uses a quorum-based approach where writes require acknowledgment from a majority of the replica set.
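The quorum arithmetic behind this is small; a minimal sketch, assuming a replica set of N = 3 and a write quorum of a strict majority:

```go
// quorum.go - minimal sketch of majority-quorum acknowledgment (N=3, W=2).
package main

import "fmt"

// quorum returns the minimum number of acknowledgments required so that
// two disjoint groups of nodes can never both commit a write.
func quorum(n int) int {
	return n/2 + 1
}

// writeSucceeded reports whether enough replicas acknowledged the write.
func writeSucceeded(acks, n int) bool {
	return acks >= quorum(n)
}

func main() {
	const n = 3
	fmt.Println(quorum(n))            // 2
	fmt.Println(writeSucceeded(1, n)) // false: a minority partition cannot commit
	fmt.Println(writeSucceeded(2, n)) // true
}
```

Because two disjoint groups can never both hold a majority, at most one side of a partition can accept writes.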
Large File Handling
Multipart upload was essential for large files. As sketched after this list, we implemented:
- Part tracking in metadata store
- Parallel part uploads
- Atomic completion with manifest
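A simplified sketch of the part tracking and atomic completion, with hypothetical names (RecordPart, Complete) and in-memory state standing in for the metadata store:

```go
// multipart.go - illustrative multipart upload tracking and completion.
package main

import (
	"fmt"
	"sort"
	"sync"
)

type Part struct {
	Number int
	ETag   string
	Size   int64
}

// Upload tracks the parts of one in-progress multipart upload.
// In the real system this state lives in the metadata store.
type Upload struct {
	mu    sync.Mutex
	parts map[int]Part
}

func NewUpload() *Upload { return &Upload{parts: map[int]Part{}} }

// RecordPart is called after a part's bytes are safely on a data node;
// parts may arrive in parallel and in any order.
func (u *Upload) RecordPart(p Part) {
	u.mu.Lock()
	defer u.mu.Unlock()
	u.parts[p.Number] = p
}

// Complete assembles the manifest in part order. Committing the manifest
// as a single metadata write is what makes completion atomic: readers
// either see the whole object or none of it.
func (u *Upload) Complete() ([]Part, error) {
	u.mu.Lock()
	defer u.mu.Unlock()
	manifest := make([]Part, 0, len(u.parts))
	for _, p := range u.parts {
		manifest = append(manifest, p)
	}
	sort.Slice(manifest, func(i, j int) bool {
		return manifest[i].Number < manifest[j].Number
	})
	// verify the part sequence is contiguous before committing
	for i, p := range manifest {
		if p.Number != i+1 {
			return nil, fmt.Errorf("missing part %d", i+1)
		}
	}
	return manifest, nil
}

func main() {
	u := NewUpload()
	u.RecordPart(Part{Number: 2, ETag: "b", Size: 5 << 20})
	u.RecordPart(Part{Number: 1, ETag: "a", Size: 5 << 20})
	manifest, err := u.Complete()
	fmt.Println(manifest, err)
}
```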
Performance Tuning
Key optimizations that improved throughput:
- Connection pooling: Reuse TCP connections to nodes
- Buffer sizing: 64KB buffers for network I/O (see the sketch after this list)
- Async I/O: Non-blocking disk operations
- Redis caching: Hot metadata in memory
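As one example, the 64KB buffer choice boils down to a fixed-size buffered copy on the streaming path. A minimal sketch, with streamObject as a hypothetical name:

```go
// buffers.go - sketch of a 64KB buffered copy for network I/O.
package main

import (
	"bytes"
	"fmt"
	"io"
)

// streamObject copies object bytes with a fixed 64KB buffer: large
// enough to keep per-syscall overhead low, small enough to stay cheap
// under many concurrent streams.
func streamObject(dst io.Writer, src io.Reader) (int64, error) {
	buf := make([]byte, 64*1024)
	return io.CopyBuffer(dst, src, buf)
}

func main() {
	var out bytes.Buffer
	n, err := streamObject(&out, bytes.NewReader(make([]byte, 1<<20)))
	fmt.Println(n, err) // 1048576 <nil>
}
```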
The final system achieves 1,200 req/s for small files and 600 MB/s for large file streaming.