LinkedIn Interview Question

Design a Distributed ID Generator — LinkedIn Interview

Medium · 20 min · Backend System Design

How LinkedIn Tests This

LinkedIn interviews focus on social graph systems, distributed ID generation, metrics and logging infrastructure, and production monitoring. They test your understanding of systems supporting professional networking at scale.

Interview focus: Distributed ID generators, metrics and logging systems, and professional networking infrastructure.

Key Topics
distributed systems · id generator · snowflake · uuid · clock skew · k-sortable

How to Design a Distributed ID Generator

Generating a unique ID sounds trivial. Until you have to do it on a hundred servers simultaneously, without any of them talking to each other, millions of times a second, and guarantee no duplicates — ever.

That's what makes this question interesting.

It appears regularly at Google, Meta, Amazon, Twitter, LinkedIn, and Discord. It's considered a medium-difficulty question because the happy path is well-understood. The depth lives in the trade-offs: why UUIDs hurt database performance, exactly how Snowflake's three-part structure prevents collisions without coordination, and what happens when a server's clock ticks backward.

Candidates who explain what to build pass. Candidates who explain why each design decision was made get offers.

That gap — between knowing the answer and explaining it clearly under pressure — is exactly what Mockingly.ai is designed to close. But first, let's build the design.


Step 1: Clarify the Scope

Interviewer: Design a distributed ID generator.

Candidate: A few clarifying questions. Do the IDs need to be globally unique, or just unique within a service? Should they be numeric or can they be strings? Do they need to be sortable by creation time — meaning newer IDs should sort after older ones? What throughput are we targeting? And do they need to be unpredictable — for example, should users be unable to guess adjacent IDs?

Interviewer: Globally unique, 64-bit numeric. Roughly sortable by time — newer is larger. Target millions of IDs per second across hundreds of servers. Unpredictability is a nice-to-have but not a hard requirement.

Candidate: Good. Those constraints eliminate a few approaches immediately and point clearly toward a Snowflake-style design. Let me walk through the alternatives and their trade-offs before arriving there.


Requirements

Functional

  • Generate globally unique IDs on demand
  • IDs are 64-bit integers (fits in a BIGINT, compact storage)
  • IDs are roughly sortable by creation time (newer ID > older ID)
  • System works across hundreds of independent servers with no coordination per request

Non-Functional

  • High throughput — millions of IDs per second, across all nodes combined
  • Low latency — ID generation in microseconds, no network round-trips
  • High availability — any single node failure must not prevent ID generation on other nodes
  • No central bottleneck — nodes generate IDs independently

Back-of-the-Envelope Estimates

Interviewer: What are the numbers?

Candidate:

plaintext
Target throughput:        10 million IDs per second (system-wide)
Servers generating IDs:  ~200 nodes
Per-node throughput:      10M / 200 = 50,000 IDs per second per node
Per-millisecond per node: 50,000 / 1000 = 50 IDs per millisecond per node
 
ID size: 64-bit integer = 8 bytes per ID
Storage for 1 year at 10M IDs/sec:
  10M × 86,400 × 365 × 8 bytes ≈ ~2.5 PB
  (IDs themselves; actual record sizes are larger)
 
64-bit signed integer maximum: 9,223,372,036,854,775,807

The key takeaway: 50 IDs per millisecond per node is very modest. A well-designed 64-bit structure can handle 4,096 IDs per millisecond per node — we have 80× headroom. The design constraint isn't throughput. It's uniqueness without coordination.
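These numbers are easy to sanity-check in a few lines (using the 10M/sec and 200-node targets stated above):

```python
# Sanity-check the back-of-the-envelope numbers from the estimates above.
target_ids_per_sec = 10_000_000
nodes = 200

per_node_per_sec = target_ids_per_sec // nodes            # 50,000 IDs/sec/node
per_node_per_ms = per_node_per_sec // 1000                # 50 IDs/ms/node

snowflake_capacity_per_ms = 2 ** 12                       # 4,096 (12 sequence bits)
headroom = snowflake_capacity_per_ms // per_node_per_ms   # ~80x headroom

print(per_node_per_ms, headroom)
```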


Why the Obvious Approaches Fail

Interviewer: Why not just use a database auto-increment?

Candidate: Auto-increment works perfectly on a single database. You can't run it across many databases without coordination.

If you have 10 database shards each auto-incrementing from 1, every shard will generate ID 1, ID 2, ID 3. They collide immediately.

The workaround is range partitioning — shard 1 owns IDs 1–1M, shard 2 owns 1M–2M, and so on. But this creates new problems: a high-traffic shard exhausts its range quickly while a low-traffic shard barely uses its allocation. Rebalancing requires downtime. Adding a new shard requires reorganising ranges. It's operationally brittle.

More fundamentally: any database call for an ID is a network round-trip. At millions of IDs per second, that's millions of database calls per second — instantly your ID service becomes the bottleneck.

Interviewer: What about UUIDs? They're randomly generated and don't need a database.

Candidate: UUIDs solve the uniqueness problem. They introduce a different problem: database performance.

A UUID is a 128-bit random value — something like 550e8400-e29b-41d4-a716-446655440000. It's globally unique without any coordination. But it has three meaningful costs.

First, storage. A UUID is 128 bits (16 bytes) vs. 64 bits (8 bytes) for a Snowflake-style ID. For a table with billions of rows, that's a meaningful difference in index size and memory pressure.

Second, index fragmentation. This is the most important one. Databases store B-tree indexes in sorted order. When you insert a record with a random UUID as the primary key, the insertion point is random — anywhere in the B-tree. Over time, the pages of the index become sparse and fragmented. Cache hit rates drop, because the "next" insert is rarely near the "last" insert in the index. Sequential IDs, by contrast, always append to the end — the B-tree stays dense and cache-friendly.

Discord measured this directly: switching from UUIDs to Snowflake IDs for their message IDs substantially reduced database write amplification and improved index performance.

Third, UUIDs are not sortable. A UUID generated a second ago and one generated five years ago cannot be compared by value to determine which is newer. Snowflake IDs can.

Interviewer: So neither auto-increment nor UUID is right. What's the correct approach?

Candidate: A Snowflake-style ID — a 64-bit integer that encodes a timestamp, a machine identifier, and a per-machine sequence number. Globally unique, no coordination needed, time-sortable, and compact. Let me explain the structure.


Snowflake IDs: The Structure

Twitter developed the Snowflake algorithm in 2010 to replace their auto-incrementing integer IDs, which became problematic as they scaled across multiple database shards.

A Snowflake ID is a 64-bit integer with three distinct sections:

plaintext
| 1 bit  | 41 bits        | 10 bits    | 12 bits         |
|--------|----------------|------------|-----------------|
| unused | timestamp (ms) | machine ID | sequence number |

Why this structure guarantees uniqueness:

Three scenarios cover every possible collision:

  • Different milliseconds → timestamp bits differ → IDs are unique
  • Same millisecond, different machine → machine ID bits differ → IDs are unique
  • Same millisecond, same machine → sequence number increments → IDs are unique

No two generators can produce the same 64-bit value. No coordination needed.
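The layout translates directly into bit arithmetic. A minimal sketch of packing and unpacking the three fields (Twitter's epoch is used purely for illustration; the function names are not from any particular library):

```python
EPOCH = 1288834974657  # Twitter's epoch (Nov 4, 2010), used here for illustration

TIMESTAMP_BITS, MACHINE_BITS, SEQUENCE_BITS = 41, 10, 12

def compose(timestamp_ms: int, machine_id: int, sequence: int) -> int:
    """Pack the three fields into one 64-bit Snowflake ID."""
    assert 0 <= machine_id < (1 << MACHINE_BITS)
    assert 0 <= sequence < (1 << SEQUENCE_BITS)
    return (
        ((timestamp_ms - EPOCH) << (MACHINE_BITS + SEQUENCE_BITS))
        | (machine_id << SEQUENCE_BITS)
        | sequence
    )

def decompose(snowflake_id: int):
    """Unpack a Snowflake ID back into (timestamp_ms, machine_id, sequence)."""
    sequence = snowflake_id & ((1 << SEQUENCE_BITS) - 1)
    machine_id = (snowflake_id >> SEQUENCE_BITS) & ((1 << MACHINE_BITS) - 1)
    timestamp_ms = (snowflake_id >> (MACHINE_BITS + SEQUENCE_BITS)) + EPOCH
    return timestamp_ms, machine_id, sequence
```

Because the timestamp occupies the most significant bits, two IDs from different milliseconds compare by timestamp first, which is what makes the IDs time-sortable.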

The Timestamp (41 bits)

41 bits of millisecond timestamp gives ~69 years of IDs before overflow.

plaintext
2^41 milliseconds = 2,199,023,255,552 ms
÷ 1000 (to seconds)
÷ 60 (to minutes)
÷ 60 (to hours)
÷ 24 (to days)
÷ 365 (to years)
≈ 69.7 years

This uses a custom epoch — not the Unix epoch (Jan 1, 1970), but a more recent date chosen when the system was built. Twitter uses November 4, 2010; Discord uses January 1, 2015. A custom epoch starts the 69-year window at the system's launch rather than in 1970, so none of the lifespan is wasted on decades that have already passed.

Why the first bit is unused: signed 64-bit integers can be negative if the sign bit is set. Keeping bit 63 as 0 ensures all Snowflake IDs are positive — safe across programming languages and databases that may handle signed vs unsigned integers differently.

The Machine ID (10 bits)

10 bits supports 1,024 unique machine IDs (0 to 1023).

Each node in the ID-generating fleet is assigned a unique integer in this range. When a node starts up, it claims its machine ID from a registry — etcd or ZooKeeper, or a pre-configured environment variable.

The machine ID assignment problem is worth naming explicitly, because interviewers probe it.

Option 1: Static assignment. Each node gets a fixed ID written in configuration. Simple, no runtime dependency. Problem: when nodes are added or removed (auto-scaling), maintaining uniqueness across the config of hundreds of servers is operationally error-prone.

Option 2: Dynamic assignment via a coordination service. On startup, each node calls etcd or ZooKeeper to claim an available machine ID from a pool. When the node shuts down gracefully, it releases the ID back to the pool. The coordination service handles the mutual exclusion.

Option 3: Derived from IP address. Take the last 10 bits of the node's IP address (ip & 0x3FF). Works in stable environments. Breaks if two nodes share the same last 10 bits of their IP — possible but uncommon in a well-structured network.

In an interview, recommend option 2 as the production approach and name option 3 as a practical shortcut for simpler deployments.
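Option 3 is nearly a one-liner. A sketch (assuming IPv4; the helper name is illustrative), including a demonstration of the collision caveat:

```python
import ipaddress

def machine_id_from_ip(ip: str) -> int:
    """Derive a 10-bit machine ID from the low 10 bits of an IPv4 address."""
    return int(ipaddress.IPv4Address(ip)) & 0x3FF

# The caveat in action: these two addresses differ, but their last
# 10 bits are identical, so they derive the same machine ID.
print(machine_id_from_ip("10.0.1.7"))  # 263
print(machine_id_from_ip("10.0.5.7"))  # 263 — collision
```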

Machine ID assignment is a detail that trips up a lot of candidates — it seems like a footnote until the interviewer asks "what happens when a node crashes before releasing its ID?" Having a crisp answer ready for that follow-up is the kind of thing that separates good prep from great prep.

The Sequence Number (12 bits)

12 bits supports 4,096 unique IDs per millisecond per machine (0 to 4095).

When a request arrives:

  • If the timestamp is the same as the last request: increment the sequence number
  • If the timestamp has advanced: reset the sequence number to 0

If 4,096 IDs are exhausted within a single millisecond, the generator waits for the clock to tick to the next millisecond before generating more.

The capacity this provides:

plaintext
4,096 IDs/ms/machine × 1,024 machines = 4,194,304 IDs per millisecond
× 1,000 = 4.2 billion IDs per second — system-wide

Our requirement was 10 million per second. We have 400× headroom. The Snowflake structure scales far beyond what most systems need.


Clock Skew: The Only Real Threat to Uniqueness

Interviewer: What happens if a machine's clock goes backward?

Candidate: This is the most important failure mode to understand.

If a machine generates ID with timestamp T=500, then NTP corrects the clock backward so the machine now reports T=498, and then the machine generates another ID with timestamp T=498 — that ID is potentially a duplicate. The same timestamp on the same machine with the same sequence number would produce the same 64-bit value.

The standard handling strategies:

1. Refuse and wait. If the current timestamp is less than the last-used timestamp, the generator refuses to produce IDs and sleeps until the clock catches up.

python
def next_id(self):
    ts = current_millis()
    if ts < self.last_timestamp:
        # Clock moved backward — busy-wait until we've caught up
        while ts < self.last_timestamp:
            ts = current_millis()
    # proceed...

This is the approach most implementations take. It causes a brief pause but prevents duplicates.

2. Raise an error. For financial systems where a duplicate ID could be catastrophic, the node raises a fatal error and restarts. The load balancer routes traffic to healthy nodes while the affected node recovers. This avoids any risk of a duplicate ID — at the cost of brief unavailability on that node.

3. Use the monotonic clock. Most operating systems provide two clocks: the wall clock (actual time of day, adjustable by NTP) and the monotonic clock (time since boot, never goes backward). Use the monotonic clock to measure elapsed time and the wall clock only for the initial epoch offset. This allows detecting if the wall clock moved backward while still advancing the timestamp in the ID.
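Option 3 can be sketched in a few lines: capture the wall clock once at startup, then advance it using only the monotonic clock, which NTP never rewinds. (A simplified sketch; the class name is illustrative, and real implementations also re-anchor periodically to limit drift.)

```python
import time

class MonotonicMillis:
    """Wall-clock anchor at startup plus monotonic elapsed time afterwards.

    The returned timestamp can never go backward, even if NTP steps the
    wall clock, because time.monotonic() is guaranteed non-decreasing.
    """
    def __init__(self):
        self._wall_anchor_ms = int(time.time() * 1000)   # read wall clock once
        self._mono_anchor = time.monotonic()             # anchor monotonic clock

    def now_ms(self) -> int:
        elapsed_ms = int((time.monotonic() - self._mono_anchor) * 1000)
        return self._wall_anchor_ms + elapsed_ms
```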

Practical guidance: NTP corrections are typically small — milliseconds, not seconds. The "refuse and wait" approach introduces a pause proportional to the correction magnitude, which is usually imperceptible. For most systems, option 1 is the right choice.


Generation Algorithm

The core generation logic must be serialised per machine — either a single dedicated thread owns it, or a mutex protects the critical section when many threads request IDs concurrently:

plaintext
On each ID request:
 
1. Get current timestamp in milliseconds
 
2. If current_timestamp < last_timestamp:
      → Clock moved backward: wait or error (see above)
 
3. If current_timestamp == last_timestamp:
      → sequence = (sequence + 1) & 0xFFF  (mod 4096)
      → If sequence == 0: wait for next millisecond
 
4. Else (new millisecond):
      → sequence = 0
 
5. last_timestamp = current_timestamp
 
6. Return:
   (timestamp - epoch) << 22
   | (machine_id << 12)
   | sequence

The bit-shifting assembles the three parts into the final 64-bit integer. This is pure arithmetic — no network calls, no locks beyond the local mutex. Generation completes in microseconds.
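The pseudocode above maps almost line-for-line onto a working generator. A minimal single-process sketch (thread-safe via a lock; the class and helper names are illustrative, not from any particular library):

```python
import threading
import time

EPOCH = 1288834974657  # custom epoch (Twitter's, for illustration)

class SnowflakeGenerator:
    MACHINE_BITS, SEQUENCE_BITS = 10, 12
    MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1  # 4095

    def __init__(self, machine_id: int):
        assert 0 <= machine_id < (1 << self.MACHINE_BITS)
        self.machine_id = machine_id
        self.last_timestamp = -1
        self.sequence = 0
        self._lock = threading.Lock()

    @staticmethod
    def _current_millis() -> int:
        return int(time.time() * 1000)

    def next_id(self) -> int:
        with self._lock:
            ts = self._current_millis()
            if ts < self.last_timestamp:
                # Step 2: clock moved backward — refuse and wait (strategy 1)
                while ts < self.last_timestamp:
                    ts = self._current_millis()
            if ts == self.last_timestamp:
                # Step 3: same millisecond — increment sequence (mod 4096)
                self.sequence = (self.sequence + 1) & self.MAX_SEQUENCE
                if self.sequence == 0:
                    # 4,096 IDs exhausted this millisecond: spin to the next one
                    while ts <= self.last_timestamp:
                        ts = self._current_millis()
            else:
                # Step 4: new millisecond — reset sequence
                self.sequence = 0
            self.last_timestamp = ts
            # Step 6: assemble the 64-bit ID
            return (
                ((ts - EPOCH) << (self.MACHINE_BITS + self.SEQUENCE_BITS))
                | (self.machine_id << self.SEQUENCE_BITS)
                | self.sequence
            )
```

Generated IDs are strictly increasing on a single machine: within a millisecond the sequence grows, and across milliseconds the timestamp grows.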


Architecture: Centralised vs Embedded

Interviewer: How do you deploy the ID generator? Do services call a central ID service, or does each service run its own generator?

Candidate: Two valid models with different trade-offs.

Centralised ID Service

A dedicated microservice runs the Snowflake generator. All other services call it via RPC when they need an ID.

plaintext
Service A → gRPC → ID Service (3 nodes, each with unique machine_id)
Service B → gRPC → ID Service
Service C → gRPC → ID Service

Advantages: simpler to reason about (machine IDs are owned by a small, fixed fleet), easy to monitor, easy to scale the ID service independently.

Disadvantages: every ID request is a network round-trip. At high throughput, even a 1ms RPC becomes a bottleneck. The ID service becomes a dependency — its availability affects every downstream service.

Embedded Library

Each service embeds the Snowflake generator as a library. The service itself is the "machine" and claims a machine ID on startup from a coordination service (etcd).

plaintext
Service A (machine_id=7)  → generates IDs locally
Service B (machine_id=42) → generates IDs locally
Service C (machine_id=99) → generates IDs locally

Advantages: no network round-trip, microsecond latency, no additional service to run or monitor.

Disadvantages: machine ID assignment becomes more complex at scale (hundreds of services each claiming IDs), harder to audit or observe globally.

For most systems, the embedded library is the right choice. The performance difference matters. The operational complexity of machine ID assignment via etcd is manageable. For simpler systems or when a centralised audit trail is critical, the centralised service is reasonable — but batch the calls (request 100 IDs at once, cache locally, hand them out one by one) to amortise the round-trip cost.
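If the centralised model is chosen, the batching optimisation looks roughly like this. (A sketch: `fetch_batch` is a stand-in for whatever RPC transport the real service uses, injected here so the amortisation logic can be shown without a network.)

```python
from collections import deque

class BatchedIdClient:
    """Fetch IDs from a central service in batches, hand them out one by one.

    One round-trip buys batch_size IDs, amortising the network cost
    across batch_size local operations.
    """
    def __init__(self, fetch_batch, batch_size: int = 100):
        self._fetch_batch = fetch_batch  # callable: n -> list of n IDs
        self._batch_size = batch_size
        self._cache = deque()

    def next_id(self) -> int:
        if not self._cache:
            # Cache empty: pay one round-trip for a whole batch.
            self._cache.extend(self._fetch_batch(self._batch_size))
        return self._cache.popleft()
```

With a batch size of 100, a service consuming 250 IDs makes only 3 calls to the central service instead of 250.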


Real-World Variants

The Snowflake structure is a template. Different companies adapt the bit allocation to their needs.

Twitter (original Snowflake):

plaintext
1 bit unused | 41 bits timestamp | 10 bits machine | 12 bits sequence

Epoch: November 4, 2010. Up to 1,024 machines. 4,096 IDs/ms/machine.

Discord:

plaintext
1 bit unused | 41 bits timestamp | 10 bits worker | 12 bits sequence

Epoch: January 1, 2015. Structurally identical to Twitter's, with a newer epoch.

Instagram (eng blog, 2012):

Instagram used PostgreSQL to generate IDs server-side, with a schema-level approach:

sql
CREATE OR REPLACE FUNCTION next_id(OUT result bigint) AS $$
DECLARE
    epoch bigint := 1314220021721;
    seq_id bigint;
    now_ms bigint;
    shard_id int := 5;  -- hardcoded per DB shard
BEGIN
    SELECT nextval('insta5.table_id_seq') % 1024 INTO seq_id;
    SELECT FLOOR(EXTRACT(EPOCH FROM clock_timestamp()) * 1000) INTO now_ms;
    result := (now_ms - epoch) << 23;
    result := result | (shard_id << 10);
    result := result | (seq_id);
END;
$$ LANGUAGE PLPGSQL;

Instagram's variant uses 41 bits of timestamp, 13 bits of shard ID, and 10 bits of sequence. They generate IDs inside PostgreSQL using clock_timestamp() — which uses the database server's clock, not the application server's.

LinkedIn (Snowplow):

LinkedIn's variant splits the 10 machine-ID bits into datacenter (5 bits) and machine (5 bits):

plaintext
1 unused | 41 timestamp | 5 datacenter | 5 machine | 12 sequence

This supports 32 datacenters × 32 machines per datacenter = 1,024 total nodes — same capacity, but with explicit datacenter awareness baked into the ID. Useful for routing and debugging: given an ID, you can instantly know which datacenter generated it.
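With that layout, recovering the datacenter from any ID is a shift and a mask. A sketch of decoding the 1 | 41 | 5 | 5 | 12 layout described above:

```python
DC_BITS, MACHINE_BITS, SEQ_BITS = 5, 5, 12

def datacenter_of(snowflake_id: int) -> int:
    """Return the 5-bit datacenter field (bits 17-21 of the ID)."""
    return (snowflake_id >> (MACHINE_BITS + SEQ_BITS)) & ((1 << DC_BITS) - 1)

def machine_of(snowflake_id: int) -> int:
    """Return the 5-bit machine field (bits 12-16 of the ID)."""
    return (snowflake_id >> SEQ_BITS) & ((1 << MACHINE_BITS) - 1)
```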


k-Sortability: Why It Matters

Interviewer: You keep mentioning that Snowflake IDs are "sortable by time." Why does that matter?

Candidate: It has a concrete impact on database performance — specifically on how databases maintain B-tree indexes.

A B-tree index stores records in sorted key order. When you insert a new record, the database finds where that key belongs in the sorted order and inserts it there — potentially splitting index pages and rewriting portions of the tree.

With random UUIDs as primary keys, every insert lands at a random position in the B-tree. Over time, index pages become half-full and fragmented. The database wastes I/O reading and writing pages that are sparsely populated. Cache efficiency drops because recently-inserted records are scattered across the index, not co-located.

With Snowflake IDs — where newer IDs are always numerically larger — every insert appends to the end of the B-tree. Pages fill completely before a new page is allocated. The index stays dense, cache-friendly, and fast. Write amplification drops significantly.

"k-sortable" means IDs are roughly sorted — not perfectly (IDs generated within the same millisecond are not ordered relative to each other), but good enough. For B-tree performance, k-sortability is nearly as good as strict sequential ordering.

This is why LinkedIn, Twitter, Discord, and Instagram all chose timestamp-prefixed ID schemes over random UUIDs when they hit scale.

The B-tree fragmentation argument is also one of the most satisfying things to explain in an interview — it connects a data structure detail to a real-world engineering decision. If you want to practise landing that explanation under time pressure with a live interviewer, Mockingly.ai is built for exactly that.


Common Interview Follow-ups

"What happens when a machine ID is assigned to a node that crashes before releasing it?"

The machine ID is effectively leaked. If you assigned IDs from a pool in etcd with TTL-based leases, the lease expires after the TTL and the ID returns to the available pool. Set the TTL to a value longer than any expected operation window — but short enough that a crashed node doesn't lock out an ID for hours. Typical values: 30–60 seconds, renewed every 10 seconds while the node is alive. One caveat worth naming: a node that fails to renew must stop generating IDs before its lease can expire; otherwise a partitioned-but-still-alive node and its replacement could briefly share an ID.
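The lease lifecycle is easy to model. A toy in-memory sketch of the claim/expire flow (a real deployment would use etcd leases; the clock is injected so expiry can be demonstrated deterministically):

```python
class LeasePool:
    """Toy model of TTL-leased machine IDs (etcd-style, in memory)."""
    def __init__(self, size: int, ttl: float, clock):
        self._size = size      # number of machine IDs in the pool
        self._ttl = ttl        # lease duration in seconds
        self._clock = clock    # injected time source, for deterministic tests
        self._expiry = {}      # machine_id -> lease expiry time

    def claim(self) -> int:
        now = self._clock()
        for mid in range(self._size):
            # An ID is free if it was never claimed or its lease has expired.
            if self._expiry.get(mid, 0) <= now:
                self._expiry[mid] = now + self._ttl
                return mid
        raise RuntimeError("no machine IDs available")

    def renew(self, mid: int) -> None:
        """Called periodically by a live node to keep its lease."""
        self._expiry[mid] = self._clock() + self._ttl
```

A crashed node simply stops calling `renew`; once its TTL elapses, `claim` hands the same ID to a new node.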

"The 41-bit timestamp will overflow in 69 years. How do you handle that?"

The overflow date is determined by the chosen epoch. If you chose 2010 as your epoch, overflow happens around 2079. The practical answer: well before overflow, migrate to a new epoch (which requires a system-wide upgrade). Alternatively, use a different bit layout — allocating more bits to the timestamp (at the cost of fewer machine IDs or sequence slots). This is a planned migration, not an emergency. The 69-year window is intentionally long enough to be a future generation's problem.

"Can you extract the creation time from a Snowflake ID?"

Yes — that's one of the practical benefits. Extract the timestamp bits (right-shift by 22), add the epoch, and you have the millisecond the ID was created. No separate created_at column needed for coarse time queries.

python
EPOCH = 1288834974657  # Twitter's epoch
timestamp_ms = (snowflake_id >> 22) + EPOCH

Discord uses this to paginate through messages by time: "show me messages before this timestamp" becomes "show me messages with IDs less than the Snowflake ID for that timestamp."
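That pagination trick is the same shift in the other direction: build the smallest possible ID for a timestamp and compare against it. A sketch (Twitter's epoch for illustration):

```python
EPOCH = 1288834974657  # Twitter's epoch, as above

def min_id_for_timestamp(timestamp_ms: int) -> int:
    """Smallest Snowflake ID that could exist at timestamp_ms.

    Machine ID and sequence bits are zero, so every ID created at or
    after this millisecond compares >= this value.
    """
    return (timestamp_ms - EPOCH) << 22

def created_at_ms(snowflake_id: int) -> int:
    """Recover the creation time encoded in a Snowflake ID."""
    return (snowflake_id >> 22) + EPOCH
```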

"How do you handle ID generation in a serverless / auto-scaling environment where instances spin up and down constantly?"

Auto-scaling creates a machine ID management challenge: hundreds of short-lived instances each need a unique ID. Two approaches:

First, use a coordination service (etcd) with short-lived leases. Each instance claims a machine ID on startup, holds it for its lifetime, and releases it on shutdown. Sudden terminations are handled by TTL expiry.

Second, use a hash of instance-specific metadata — container ID, IP address tail, or cloud provider instance metadata — to derive a machine ID. This is deterministic and requires no coordination, but risks collision if the metadata space overlaps. Acceptable in practice with good hash design.

"What if two nodes are assigned the same machine ID by mistake?"

Every ID generated in the same millisecond by both nodes will have the same timestamp and machine ID bits. The only thing preventing collision is the sequence number — and since both nodes maintain independent sequence counters, they'll produce the same sequence numbers for concurrent requests.

The result: genuine duplicate IDs. This is why machine ID uniqueness is a hard requirement, not a soft one. The system must enforce it — either through a coordination service (etcd mutual exclusion) or through careful configuration management.


Quick Interview Checklist

  • ✅ Clarified requirements — numeric, 64-bit, sortable, no coordination per request
  • ✅ Explained why auto-increment fails — can't scale horizontally; each shard produces duplicates
  • ✅ Explained UUID limitations — 128 bits, random (not sortable), B-tree fragmentation
  • ✅ Snowflake structure: 1 unused | 41 timestamp | 10 machine ID | 12 sequence
  • ✅ Uniqueness proof — different timestamp, different machine ID, or different sequence = unique
  • ✅ 41-bit timestamp: ~69 years using custom epoch
  • ✅ 10-bit machine ID: 1,024 nodes; assignment via etcd, IP, or static config
  • ✅ 12-bit sequence: 4,096 IDs/ms/machine; 4.2B IDs/sec system-wide
  • ✅ Clock skew: wait-until-caught-up, raise error, or monotonic clock — trade-offs named
  • ✅ Generation algorithm: timestamp check → sequence increment → bit-shift assembly
  • ✅ Centralised service vs embedded library — embedded preferred for latency; batching for centralised
  • ✅ Real-world variants: Twitter, Discord, Instagram, LinkedIn — bit layout differences explained
  • ✅ k-sortability and B-tree performance — why it matters concretely
  • ✅ Machine ID leak on crash — etcd TTL-based lease recovery
  • ✅ 69-year overflow — custom epoch choice, planned migration

Conclusion

Designing a distributed ID generator is an exercise in understanding what guarantees you actually need and which approaches silently violate them.

Auto-increment looks simple but breaks at horizontal scale. UUIDs look simple but hurt database performance. Snowflake IDs solve both problems with a clever bit layout — but introduce their own subtlety around clock skew and machine ID management.

The candidates who stand out in this interview are the ones who don't just name Snowflake. They explain why timestamp goes in the most-significant bits (sortability and B-tree performance), why machine ID must be unique and how you enforce that at scale, and what the system does when a clock ticks backward.

The design pillars:

  1. Snowflake's 64-bit structure — timestamp | machine ID | sequence; guarantees uniqueness without coordination in every scenario
  2. Custom epoch — shifts the 69-year lifespan to start from when the system was built, not 1970
  3. Machine ID assignment via etcd — dynamic, TTL-based lease; returns IDs to the pool when nodes die
  4. Clock skew handling — refuse and wait for small corrections; fatal error for large corrections in sensitive systems
  5. Embedded library over centralised service — microsecond latency, no network dependency; coordination only for machine ID assignment at startup
  6. k-sortability — timestamp-prefixed IDs keep B-tree indexes dense and cache-friendly; the concrete reason sortability matters

Frequently Asked Questions

What is a distributed ID generator?

A distributed ID generator creates unique identifiers across multiple independent servers with no inter-server coordination at request time.

Unlike a single database auto-increment, a distributed generator must ensure no two servers ever produce the same ID — even when operating completely in isolation, with no shared state between them.

The core challenge:

  1. Servers cannot pause to ask a central authority "what's the next ID?" — that creates a bottleneck
  2. Servers cannot coordinate with each other per request — that adds latency and coupling
  3. IDs must still be globally unique, ideally time-ordered, and compact enough to be used as database primary keys

What is a Snowflake ID and how does it work?

A Snowflake ID is a 64-bit integer that encodes three pieces of information — a timestamp, a machine identifier, and a sequence number — into a single compact value that is globally unique and time-sortable.

The 64-bit structure:

plaintext
| 1 bit  | 41 bits        | 10 bits    | 12 bits         |
|--------|----------------|------------|-----------------|
| unused | timestamp (ms) | machine ID | sequence number |

Why this structure guarantees uniqueness — three scenarios, no overlap:

  1. Different milliseconds → timestamp bits differ → IDs are unique, no coordination needed
  2. Same millisecond, different machine → machine ID bits differ → IDs are unique
  3. Same millisecond, same machine → sequence number increments → IDs are unique (up to 4,096 per ms)

Twitter developed the algorithm in 2010 to replace auto-incrementing integer IDs that couldn't scale horizontally across database shards.


Why are UUIDs bad for database primary keys at scale?

UUIDs are globally unique without coordination — but random UUIDs used as primary keys cause severe B-tree index fragmentation that degrades database performance at scale.

How the fragmentation happens:

  1. Databases store B-tree indexes in sorted key order
  2. A random UUID has no relationship to insertion order — each insert lands at a random position in the B-tree
  3. Index pages become half-empty over time, scattered across memory
  4. The database must read and write more pages per query — write amplification increases
  5. Cache efficiency drops because recently inserted records are not co-located

How Snowflake IDs fix this:

  1. The timestamp is the most significant component — newer IDs are always numerically larger
  2. Every insert appends to the end of the B-tree
  3. Pages fill completely before a new page is allocated
  4. The index stays dense and cache-friendly

Discord measured this: switching from UUIDs to Snowflake IDs for message IDs substantially reduced database write amplification. For tables with billions of rows, the difference is significant.


What is clock skew and why is it dangerous for Snowflake IDs?

Clock skew is the divergence between a server's reported time and actual time. NTP periodically corrects server clocks — and if a clock is corrected backward, a Snowflake generator can produce a duplicate ID.

The failure scenario:

  1. Generator produces ID with timestamp T=500
  2. NTP corrects the clock backward to T=498
  3. Generator produces another ID with timestamp T=498
  4. Same timestamp + same machine ID + same sequence number = identical 64-bit value — a duplicate

The three standard mitigation strategies:

plaintext
| Strategy        | Mechanism                                                        | Best for                                             |
|-----------------|------------------------------------------------------------------|------------------------------------------------------|
| Refuse and wait | Pause generation until clock catches up to last_timestamp        | Most systems — NTP corrections are milliseconds      |
| Raise an error  | Fatal error + restart; load balancer routes to healthy nodes     | Financial systems where duplicate IDs are catastrophic |
| Monotonic clock | Use OS monotonic clock (never goes backward) for elapsed time    | Environments with frequent clock corrections         |

Practical guidance: NTP corrections are typically small — milliseconds, not seconds. The "refuse and wait" approach introduces a pause proportional to the correction magnitude, which is usually imperceptible. Option 1 is the right default.


How do you assign machine IDs in a distributed Snowflake system?

Machine ID assignment is the operational challenge that most Snowflake implementations underestimate. Each of the 1,024 possible machine IDs must be held by exactly one node at any time — if two nodes share an ID, duplicate IDs are produced.

The three approaches compared:

plaintext
| Approach                   | How it works                                                              | Pros                                            | Cons                                                      |
|----------------------------|---------------------------------------------------------------------------|-------------------------------------------------|-----------------------------------------------------------|
| Dynamic via etcd/ZooKeeper | Node claims ID from a pool on startup; holds it via TTL lease; releases on shutdown | Production-correct; handles crashes via TTL expiry | Requires etcd/ZooKeeper infrastructure                    |
| Derived from IP address    | Take last 10 bits of node IP (ip & 0x3FF)                                 | Zero infrastructure dependency                  | Breaks if two nodes share the same last 10 IP bits        |
| Static configuration       | Fixed ID written in environment variables per node                        | Simple, no runtime dependency                   | Operationally fragile at scale; error-prone when nodes are added |

How TTL-based leases handle crashes:

  1. Node claims machine ID 42 in etcd with a 60-second TTL
  2. Node renews the lease every 10 seconds while alive
  3. Node crashes — renewal stops
  4. TTL expires after 60 seconds — machine ID 42 returns to the pool
  5. A new node claims ID 42 — no permanent leak

Recommended approach: etcd dynamic assignment for production. IP-derived for simpler deployments where the network is stable and well-structured.


What is k-sortability and why does it matter for IDs?

K-sortability means IDs are roughly sorted by creation time — not perfectly ordered, but close enough that database inserts behave similarly to sequential IDs.

Snowflake IDs are k-sortable because the timestamp occupies the most significant bits: newer IDs are always numerically larger than older IDs generated more than 1 millisecond apart. IDs within the same millisecond are not ordered relative to each other — hence "roughly" sorted.

Why it matters in practice:

  1. B-tree insert performance — k-sortable IDs consistently append near the end of the B-tree, keeping index pages dense (as explained in the UUID section above)
  2. Range queries by time — WHERE id > snowflake_id_for_timestamp is a valid time-range query without a separate created_at column
  3. Pagination — Discord paginates through message history using Snowflake IDs directly: "show messages with ID less than X" is equivalent to "show messages before timestamp X"
  4. Debugging — given a Snowflake ID, extract the timestamp by right-shifting 22 bits and adding the epoch. Instant creation time, no database query needed

What is the difference between a centralised ID service and an embedded ID library?

A centralised ID service is a dedicated microservice that all other services call via RPC. An embedded library runs the Snowflake generator in-process within each service.

plaintext
|                         | Centralised ID Service                            | Embedded Library                                   |
|-------------------------|---------------------------------------------------|----------------------------------------------------|
| Latency                 | 1–5ms per ID (network round-trip)                 | Microseconds (local CPU operation)                 |
| Machine ID management   | Simple — small fixed fleet owns the IDs           | Complex — every service instance needs a unique ID |
| Availability dependency | All services depend on the ID service             | No dependency — each service is self-sufficient    |
| Observability           | Centralised — easy to audit all ID generation     | Distributed — harder to audit globally             |
| Best for                | Lower-throughput systems, compliance requirements | High-throughput systems, latency-sensitive paths   |

Optimising the centralised service:

If a centralised service is required, batch ID requests — request 100 IDs at once, cache them locally, and hand them out one by one. This amortises the network round-trip cost across 100 operations rather than paying it on every single ID request.


How do real companies implement Snowflake-style IDs differently?

The Snowflake structure is a template — different companies adapt the bit allocation to their specific needs and constraints.

| Company | Bit layout | Epoch | Notable difference |
| --- | --- | --- | --- |
| Twitter | 1 unused \| 41 timestamp \| 10 machine \| 12 sequence | Nov 4, 2010 | The original — all others are variations |
| Discord | 1 unused \| 41 timestamp \| 10 worker \| 12 sequence | Jan 1, 2015 | Structurally identical; newer epoch extends lifespan |
| LinkedIn | 1 unused \| 41 timestamp \| 5 datacenter \| 5 machine \| 12 sequence | Custom | Datacenter ID baked in — useful for routing and debugging |
| Instagram | 41 timestamp \| 13 shard ID \| 10 sequence | Aug 2011 | Generated inside PostgreSQL via a stored function |
| Sonyflake | 39 timestamp (10ms) \| 8 sequence \| 16 machine | Jan 1, 2014 | 10ms resolution extends lifespan; 65,536 machine IDs |

The consistent principle across all variants: timestamp in the most significant bits (for k-sortability), followed by a machine/shard identifier, followed by a sequence number. The specific bit counts are tuned to each company's scale and operational constraints.
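The shared principle is easiest to see as bit-packing: shift each field into place and OR them together. The constants below follow Twitter's original allocation from the table above; swap them for any variant's bit counts.

```python
# Illustrative packing for the 1 | 41 | 10 | 12 layout.
TIMESTAMP_BITS, MACHINE_BITS, SEQUENCE_BITS = 41, 10, 12

def pack_snowflake(ms_since_epoch: int, machine_id: int, sequence: int) -> int:
    # Guard each field against overflowing its allocation.
    assert ms_since_epoch < (1 << TIMESTAMP_BITS)
    assert machine_id < (1 << MACHINE_BITS)
    assert sequence < (1 << SEQUENCE_BITS)
    # Timestamp lands in the most significant bits, giving k-sortability.
    return (ms_since_epoch << (MACHINE_BITS + SEQUENCE_BITS)) \
         | (machine_id << SEQUENCE_BITS) \
         | sequence
```

Because the timestamp sits in the high bits, comparing two packed IDs numerically is equivalent to comparing their creation milliseconds first — which is exactly the k-sortability property.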


What happens when the 41-bit timestamp overflows in 69 years?

The overflow date depends on the chosen epoch — not on the algorithm itself. The 69-year lifespan is measured from the epoch date, not from 1970.

How the math works:

  1. 2^41 milliseconds ≈ 69.7 years
  2. Twitter chose epoch November 4, 2010 → overflow around 2079
  3. Discord chose epoch January 1, 2015 → overflow around 2084
  4. A system built today (2025) with a 2025 epoch → overflow around 2094
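The arithmetic behind those dates is a one-liner: add 2^41 milliseconds to the chosen epoch. A quick check (function name is illustrative):

```python
from datetime import datetime, timedelta

# 2^41 ms ≈ 25,451 days ≈ 69.7 years of timestamp headroom.
LIFESPAN = timedelta(milliseconds=2 ** 41)

def overflow_year(epoch: datetime) -> int:
    """Year in which a 41-bit ms timestamp starting at `epoch` wraps."""
    return (epoch + LIFESPAN).year
```

For example, `overflow_year(datetime(2015, 1, 1))` gives 2084 — matching Discord's figure above.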

What to do about it:

  1. Choose a recent epoch when building the system — this maximises the useful lifespan
  2. Document the epoch prominently — it is a system-level constant that every future engineer needs to know
  3. Plan a migration well before overflow — migrating to a new epoch requires updating all services that generate or parse IDs, but it is a planned upgrade, not an emergency
  4. Alternative: allocate more bits to the timestamp (e.g., 42 bits = ~139 years) by reducing machine ID or sequence bits — acceptable if the system has fewer nodes or lower per-node throughput

The 69-year window is intentionally long enough to be a future generation's problem — but short enough to be finite and plannable.


Which companies ask the distributed ID generator question in interviews?

Google, Meta, Amazon, Twitter, LinkedIn, Discord, and Microsoft ask variants of this question for software engineer and senior software engineer roles.

Why it appears frequently:

  1. Widely applicable — almost every large-scale system needs unique IDs; the question is relevant at any company handling significant data volume
  2. Depth is hidden — the surface answer (use Snowflake) takes 30 seconds; the depth (clock skew, machine ID management, B-tree fragmentation, k-sortability) fills a full 45-minute interview
  3. Tests first-principles reasoning — candidates who understand why each decision was made (not just what it is) are easily distinguished

What interviewers specifically listen for:

  1. UUID B-tree fragmentation — explaining this concretely, not just saying "UUIDs are bad"
  2. All three clock skew strategies — with the trade-offs for each, not just "wait for the clock"
  3. Machine ID assignment options — naming etcd TTL leases specifically, not just "use a config"
  4. k-sortability connected to B-tree performance — making the concrete link between the data structure and the engineering decision
  5. Centralised vs embedded trade-off — and the batching optimisation for centralised services

The distributed ID generator question is deceptively layered. Knowing the Snowflake structure is the starting point — explaining B-tree fragmentation, clock skew handling, and machine ID lease management is what separates the top-tier answers. If you want to practice walking through that depth under real interview pressure, with follow-up questions on overflow, machine ID leaks, and UUID comparisons, Mockingly.ai has system design simulations built for engineers preparing for senior roles at Google, Meta, Amazon, Twitter, and LinkedIn.
