Design an E-Commerce Website — Google Interview

Hard · 22 min · Backend System Design

How Google Tests This

Google is known for asking some of the most challenging system design questions in the industry, covering distributed systems, data infrastructure, and large-scale web services. Their interviews emphasise designing systems that handle billions of users and petabytes of data.

Interview focus: Distributed systems, search infrastructure, real-time data processing, and global-scale services.

Key Topics
e-commerce · microservices · elasticsearch · redis · kafka · saga pattern · distributed systems

How to Design an E-Commerce Website

This question shows up at Amazon (constantly), Shopify, Flipkart, eBay, and basically any company that runs or builds commerce infrastructure. And it's one of those problems that looks easy in the first five minutes — products, cart, checkout, done — and then the interviewer asks "how do you prevent two users from buying the same last item simultaneously?" and the whole conversation shifts.

The depth in this question isn't in the happy path. It's in the hard parts: inventory consistency under high concurrency, payment failures in a distributed system, the checkout flow where five independent services need to all succeed or all roll back, and what happens to your database when Black Friday hits and traffic spikes 10× in 30 seconds.

This guide covers all of it, with the conversational back-and-forth of a real interview.


Step 1: Clarify the Scope

Interviewer: Design an e-commerce website.

Candidate: Before I jump in — a few questions. Are we designing a marketplace like Amazon with multiple third-party sellers, or a single-merchant storefront like Shopify? That changes inventory ownership significantly. What scale are we targeting — millions of users like Flipkart, or thousands like a growing startup? Do we need real-time inventory tracking across multiple warehouses? And are flash sales or time-limited deals in scope? Those require a very different concurrency strategy.

Interviewer: Assume a large-scale multi-merchant marketplace — think Amazon or eBay. Millions of users, thousands of sellers. Real-time inventory. Flash sales are in scope — that's actually a deep dive I want to get into. For now, focus on the core flows: browsing, search, cart, checkout, and order management.

Candidate: Great. Let me work through requirements and numbers, then walk through the architecture service by service — because this problem really is a microservices problem. Each domain has different consistency, scalability, and database needs.

One thing worth saying upfront: the scope clarification above isn't just politeness. "Flash sales in scope" just changed about 30% of the inventory design. Interviewers at Amazon specifically use this question to see whether candidates ask or just assume. If you want to practice building that instinct before the real thing, Mockingly.ai runs realistic system design simulations where you get feedback on exactly these kinds of early decisions.


Requirements

Functional

  • Product catalog: sellers can create, update, and delete listings with photos, descriptions, pricing, and inventory counts
  • Product search: users can search by keyword, filter by category/price/rating, and see autocomplete suggestions
  • Shopping cart: users can add, remove, and update item quantities; cart persists across sessions
  • Checkout: reserve inventory, process payment, create order — all transactionally safe
  • Order management: users can view order history and track shipment status
  • Flash sales: time-limited deals with limited stock, high-concurrency checkout
  • Reviews and ratings: buyers can rate purchased products

Non-Functional

  • High availability for reads — product browsing and search must stay up even during partial failures
  • Strong consistency for inventory — we cannot sell more stock than exists
  • Idempotent payments — a user must never be charged twice for the same order
  • Horizontal scalability — the system must handle 10× peak traffic (Black Friday, sales events)
  • Low search latency — product search results under 100ms

Back-of-the-Envelope Estimates

Interviewer: Give me the rough numbers.

Candidate: Let me work through a mid-to-large scale estimate.

plaintext
Registered users:          500 million
Daily Active Users (DAU):  50 million
Products in catalog:       500 million (Amazon has ~350M+ listings)
Average page views/DAU:    20
Orders per day:            5 million
Peak multiplier (Black Friday): 10×
 
Read QPS:
  50M users × 20 pages / 86,400s = ~11,600 req/sec (avg)
  Peak:                            ~116,000 req/sec
 
Write QPS (orders):
  5M orders/day / 86,400s ≈ 58 orders/sec (avg)
  Peak:                     ~580 orders/sec
 
Search QPS:
  Assume every 3rd page view is a search
  50M × 7 searches/day / 86,400s ≈ 4,000 searches/sec
 
Storage:
  Product record (~5 KB average): 500M × 5 KB = 2.5 TB
  Product images (CDN-served):    ~100 TB (multiple sizes, thumbnails)
  Orders (1 KB per order):        5M/day × 365 × 1 KB ≈ 1.8 TB/year

Two conclusions from these numbers: reads dramatically outnumber writes — this is a read-heavy system that needs aggressive caching. And inventory writes during peak are the critical path — 580 order attempts per second on Black Friday, each touching inventory counters, is where consistency challenges live.
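These numbers are easy to sanity-check. A quick script, with constants taken from the estimate above (searches per user rounded to 7, as in the estimate):

```python
# Sanity check of the traffic maths above.
DAU = 50_000_000
PAGE_VIEWS_PER_USER = 20
SEARCHES_PER_USER = 7          # ~ every 3rd of 20 page views
ORDERS_PER_DAY = 5_000_000
PEAK_MULTIPLIER = 10
SECONDS_PER_DAY = 86_400

read_qps = DAU * PAGE_VIEWS_PER_USER / SECONDS_PER_DAY
order_qps = ORDERS_PER_DAY / SECONDS_PER_DAY
search_qps = DAU * SEARCHES_PER_USER / SECONDS_PER_DAY

print(f"avg read QPS:   {read_qps:,.0f}")                       # ~11,574
print(f"peak read QPS:  {read_qps * PEAK_MULTIPLIER:,.0f}")     # ~115,741
print(f"avg order QPS:  {order_qps:,.0f}")                      # ~58
print(f"peak order QPS: {order_qps * PEAK_MULTIPLIER:,.0f}")    # ~579
print(f"avg search QPS: {search_qps:,.0f}")                     # ~4,051
```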


High-Level Architecture: Why Microservices

Interviewer: Why microservices and not a monolith?

Candidate: Because different parts of this system have fundamentally different scaling and consistency requirements. The product search service needs to handle 4,000 QPS and can tolerate slightly stale data — it scales horizontally and is read-only. The payment service handles 58 QPS and requires strong ACID guarantees — it's write-critical and consistency-first. The cart service is stateless per user and benefits from an in-memory store. Packaging all of this into a monolith means scaling everything together when only search is under load.

More importantly, failures should be isolated. If the reviews service goes down, checkout should still work. If the recommendation engine is slow, the product listing page should still render. Microservices give you independent failure domains.

plaintext
                 ┌────────────────────────────────┐
                 │           API Gateway          │
                 │ (auth, rate limiting, routing) │
                 └───────────────┬────────────────┘
                                 │
     ┌───────────┬───────────────┼───────────────┬───────────────┐
     │           │               │               │               │
┌────▼────┐ ┌────▼────┐ ┌────────▼───────┐ ┌─────▼─────┐ ┌───────▼───────┐
│ Product │ │ Search  │ │      Cart      │ │   Order   │ │    Payment    │
│ Service │ │ Service │ │    Service     │ │  Service  │ │    Service    │
│         │ │  (ES)   │ │    (Redis)     │ │           │ │               │
└────┬────┘ └─────────┘ └────────────────┘ └─────┬─────┘ └───────────────┘
     │                                           │
┌────▼────────┐                    ┌─────────────┴─────────────┐
│ Product DB  │                    │                           │
│ (Postgres/  │            ┌───────▼──────┐            ┌───────▼───────┐
│  MySQL)     │            │  Inventory   │            │ Notification  │
└─────────────┘            │   Service    │            │   Service     │
                           │ (Postgres +  │            │ (Kafka +      │
                           │   Redis)     │            │  FCM/Email)   │
                           └──────────────┘            └───────────────┘

           ┌────────────────────────────────┐
           │        Event Bus (Kafka)       │
           │  order.created, payment.done,  │
           │  inventory.updated, etc.       │
           └────────────────────────────────┘

Service 1: Product Catalog

The product catalog stores everything about a product — title, description, price, images, seller info, category, attributes. This is the most read-heavy service in the system.

What it is: a read-heavy data store for semi-structured product information. A product has attributes that vary enormously by category — a phone has screen size and battery capacity, a shirt has size and colour. This semi-structured nature means a flexible schema is often better than a rigid relational one.

Interviewer: What database would you use for product catalog data?

Candidate: A hybrid. For structured, transactional product data — product IDs, seller IDs, prices, stock counts — PostgreSQL. It handles ACID guarantees well and product creation/updates are relatively low frequency.

For product attributes that vary wildly by category, a document store like MongoDB fits better — a phone document has different fields than a shirt document, and a rigid relational schema with dozens of nullable columns is messy.

For images, we never store binary data in the database. Product images go to S3 (or GCS), served through a CDN. The product record stores only the image URL, not the bytes.

Interviewer: How do you handle the product images specifically?

Candidate: The seller uploads a raw image to the Product Service. The service stores it in S3 and immediately publishes an image.uploaded event to Kafka. An asynchronous Image Processing Worker consumes that event and generates multiple resolutions: thumbnail (200×200), standard (800×800), zoom (2000×2000). These processed versions are stored back in S3 and fronted by CloudFront or another CDN. The product record is updated with URLs for all three sizes.

The user's browser requests the thumbnail for the listing grid and the standard image for the detail page. The CDN handles worldwide delivery with low latency. The Product Service never touches image bytes again after initial upload.


Service 2: Product Search

Search is arguably the most critical feature in e-commerce. If users can't find what they want, they don't buy it.

What Elasticsearch does: Elasticsearch is a distributed, document-oriented search engine built on Apache Lucene. It maintains an inverted index — for every unique word in every product title and description, it stores a list of product IDs that contain that word. This makes full-text search across hundreds of millions of products sub-100ms. Standard relational databases cannot do this with a LIKE '%keyword%' query at any reasonable scale.

Interviewer: Walk me through how search works in your design.

Candidate: The product catalog is the source of truth. Elasticsearch is a read-optimised replica for search. When a product is created or updated in the Product DB, a Kafka event (product.upserted) is published. An Elasticsearch indexer worker consumes this event and writes the product document to the ES index.

When a user searches "wireless headphones under $100":

plaintext
1. Client → API Gateway → Search Service
2. Search Service sends query to Elasticsearch:
   {
     query: { bool: {
       must: { match: { title: "wireless headphones" } },
       filter: { range: { price: { lte: 100 } } }
     }},
     sort: [{ rating: "desc" }, { relevance_score: "desc" }]
   }
3. ES returns matching product IDs + scores
4. Search Service fetches lightweight product metadata for those IDs
   (from a Redis cache or the Product DB replica)
5. Returns ranked results with titles, prices, thumbnails

Interviewer: What about autocomplete — the suggestions that appear as you type?

Candidate: Autocomplete is separate from full search. We don't want to run a full Elasticsearch query on every keystroke. Instead, we use ES's completion suggester, which maintains a separate in-memory data structure optimised for prefix lookups. As the user types "wirel", the suggester returns "wireless headphones", "wireless earbuds", "wireless keyboard" in under 5ms.

Autocomplete suggestions are also cached aggressively in Redis. The top 1,000 most common search prefixes stay in Redis with a 1-hour TTL. Most keystrokes hit the cache, not Elasticsearch at all.


Service 3: Shopping Cart

The shopping cart is stateful per user but ephemeral — it doesn't need the durability guarantees of an order. It changes frequently (add item, remove item, update quantity) and needs to be fast.

Interviewer: How would you design the shopping cart?

Candidate: Redis. The cart is a hash in Redis, keyed by user ID:

plaintext
cart:{user_id} → {
    "product_id_123": { qty: 2, price: 49.99, title: "Wireless Headphones" },
    "product_id_456": { qty: 1, price: 12.99, title: "USB-C Cable" }
}

Every add, remove, or quantity change is an atomic HSET or HDEL operation. Redis handles thousands of these per second without breaking a sweat. The cart TTL is 30 days — if a user doesn't come back for a month, their cart expires.

Interviewer: Why not a database for the cart?

Candidate: Cart operations happen on every product page interaction — add to cart, update quantity. At 50 million DAU with frequent cart touches, that's potentially hundreds of thousands of writes per second. A relational database would handle this, but you'd be burning expensive, slow-to-scale write capacity on ephemeral data. Redis gives you sub-millisecond writes at 10× lower cost and infrastructure complexity.

Interviewer: What happens if a user logs in on a new device after shopping on their phone?

Candidate: Cart merge. When the user logs in, the client sends both the local (anonymous) cart and the authenticated user's server-side cart to the Cart Service. The service merges them — summing quantities for duplicate items, adding new items. The merged cart is written back to Redis under the authenticated user's ID. The anonymous cart is deleted.
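The merge itself is simple; an illustrative in-memory version (the real service would read and write these as Redis hashes):

```python
# Illustrative cart merge (plain dicts; the real service operates on Redis hashes).
def merge_carts(anonymous_cart, user_cart):
    """Sum quantities for items present in both carts; carry everything else over."""
    merged = {pid: dict(item) for pid, item in user_cart.items()}
    for pid, item in anonymous_cart.items():
        if pid in merged:
            merged[pid]["qty"] += item["qty"]
        else:
            merged[pid] = dict(item)
    return merged

phone_cart = {"p123": {"qty": 2, "price": 49.99}}           # anonymous device cart
laptop_cart = {"p123": {"qty": 1, "price": 49.99},          # server-side cart
               "p456": {"qty": 1, "price": 12.99}}

merged = merge_carts(phone_cart, laptop_cart)
print(merged["p123"]["qty"])  # 3 -- quantities summed for the duplicate item
print(len(merged))            # 2 -- new item carried over
```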


Service 4: Inventory Management

This is where the real distributed systems depth lives in an e-commerce interview.

The problem: two users simultaneously try to buy the last unit of a product. Without coordination, both reads see stock = 1, both pass the "is available" check, both decrement — and stock lands at -1. You've sold a unit that doesn't exist. This is overselling, and in a distributed system with multiple application servers, it happens constantly without proper handling.

Interviewer: How do you prevent overselling? Walk me through the options.

Candidate: There are three main approaches, each with different trade-offs.

Option 1: Pessimistic locking (SELECT FOR UPDATE)

The inventory check and decrement are wrapped in a database transaction with a row-level lock:

sql
BEGIN;
SELECT stock FROM inventory WHERE product_id = 123 FOR UPDATE;
-- Only one transaction holds this row lock at a time; others block here.
-- Application code then checks the returned value:
--   if stock > 0:
UPDATE inventory SET stock = stock - 1 WHERE product_id = 123;
COMMIT;
--   else:
--     ROLLBACK;

This is correct — only one transaction touches the row at a time. But it serialises all purchases of a popular product. Under high concurrency, threads queue up waiting for the lock. For a normal product with moderate traffic, this works fine. For a popular item during a flash sale with thousands of concurrent buyers, the queue becomes the bottleneck and database connection pools exhaust.

Option 2: Optimistic locking with version numbers

No lock is held during the read. A version column detects conflicting updates:

sql
-- Read
SELECT stock, version FROM inventory WHERE product_id = 123;
-- stock = 5, version = 42
 
-- Write (atomic check)
UPDATE inventory
SET stock = stock - 1, version = version + 1
WHERE product_id = 123 AND version = 42;
-- If 0 rows affected: another request updated first, retry

This is non-blocking — reads and writes don't wait for each other. Under low-to-moderate concurrency, it's efficient. Under extreme concurrency (flash sales), many updates fail and retry, burning CPU on conflict resolution. Statistically, if K threads all compete for one item, only one wins each round and the rest retry — at very high K, this degrades badly.

Option 3: Redis atomic decrement (best for flash sales)

Pre-load inventory counts into Redis. Use Redis's atomic DECR command — since Redis is single-threaded, there is no race:

plaintext
# Pre-load: SET inventory:product_123 1000
# On purchase attempt:
result = DECR inventory:product_123
if result < 0:
    INCR inventory:product_123  # restore
    return "out of stock"
else:
    # Enqueue order for async processing
    RPUSH orders:pending { user_id, product_id, timestamp }
    return "order placed"

The database is only updated asynchronously by a worker that drains the orders queue. The hot path is entirely in Redis — microsecond latency, no database contention.

My recommendation: optimistic locking for standard purchases (it's simple and works for most products). Redis pre-decrement for flash sales and high-demand products. The inventory service detects which mode to use based on current request rate per product.


Service 5: Checkout and the Order Flow

Checkout is the most complex flow in the system. It spans multiple services — inventory, payment, order creation, notification — and all of them must succeed or all must be compensated. This is where the Saga pattern becomes essential.

Why not a database transaction: in a microservices architecture, the inventory database and the payment database are separate systems. You cannot wrap them in a single ACID transaction. A two-phase commit (2PC) would work technically, but it holds locks across services for the duration of the protocol, which kills performance and creates tight coupling. The right pattern is the Saga.

What the Saga pattern is: a Saga is a sequence of local transactions where each step publishes an event that triggers the next step. If any step fails, compensating transactions undo the previous steps. It accepts eventual consistency in exchange for availability and autonomy of each service.

Interviewer: Walk me through the complete checkout flow when a user clicks "Place Order."

Candidate: Here's the orchestration-based Saga for checkout — I prefer orchestration over choreography here because checkout is a strict sequential workflow and a central coordinator gives us clear visibility into where a failed order got stuck.

plaintext
1. User clicks "Place Order"
   → Client sends POST /checkout with cart contents
 
2. Order Service creates order in PENDING state
   → Writes to PostgreSQL: orders table, status = PENDING
   → Returns order_id to client (fast response — user sees "Processing...")
 
3. Saga Coordinator begins:
 
   Step A: Reserve Inventory
   → Inventory Service: decrease stock by ordered quantity
   → On success: publish inventory.reserved event
   → On failure (out of stock): saga ends, order → FAILED, notify user
 
   Step B: Process Payment
   → Payment Service: charge the customer's card
   → Payment request carries idempotency_key = order_id
     (prevents double-charging on retry)
   → On success: publish payment.captured event
   → On failure (card declined):
         Compensate Step A: release reserved inventory
         Order → PAYMENT_FAILED, notify user
 
   Step C: Confirm Order
   → Order Service: update order status → CONFIRMED
   → Publish order.confirmed event
 
   Step D: Trigger Fulfillment
   → Fulfillment Service: assign to warehouse, schedule shipping
   → Notification Service: send "Your order is confirmed!" email/push
 
4. Final state: order is CONFIRMED or FAILED with a clear reason

Interviewer: What if the Order Service crashes after Step B but before Step C — payment was captured but the order never confirmed?

Candidate: This is the hardest failure case and it's exactly why idempotency matters. The Saga Coordinator writes its state to a durable store (the Order DB) after each step. When the Order Service restarts, it reads the saga state and resumes from where it left off. The payment is already captured — the coordinator retries Step C. Since each step is idempotent (confirming an already-confirmed order is a no-op), re-running it doesn't cause double effects.

The alternative — not persisting saga state — means a restart loses track of in-progress sagas. Customers get charged but their orders never confirm. That's unacceptable.
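A stripped-down sketch of the orchestration idea; the step names and in-memory log are illustrative, and a real coordinator would persist completed steps to the Order DB as described above:

```python
# Minimal orchestration-style Saga sketch (illustrative; real steps are
# service calls, and completed-step state is persisted durably).

def run_saga(steps):
    """Run (name, action, compensation) steps in order; on failure,
    run compensations for completed steps in reverse order."""
    completed = []
    for name, action, compensate in steps:
        try:
            action()
            completed.append((name, compensate))
        except Exception:
            for _done, undo in reversed(completed):
                undo()                       # compensating transaction
            return f"FAILED at {name}"
    return "CONFIRMED"

log = []

def decline_card():
    raise RuntimeError("card declined")      # simulated Step B failure

steps = [
    ("reserve_inventory", lambda: log.append("inventory reserved"),
                          lambda: log.append("inventory released")),
    ("capture_payment",   decline_card,
                          lambda: log.append("payment refunded")),
    ("confirm_order",     lambda: log.append("order confirmed"),
                          lambda: log.append("order cancelled")),
]
outcome = run_saga(steps)
print(outcome)  # FAILED at capture_payment
print(log)      # ['inventory reserved', 'inventory released']
```

The payment failure triggers exactly one compensation — the inventory release — and the later steps never run.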

The Saga pattern is one of those topics that reads clearly on paper but becomes genuinely tricky to explain under interview pressure — especially when an interviewer asks "what happens if step C crashes after step B?" mid-explanation. Worth rehearsing out loud before your interview. Mockingly.ai has this question in its simulation library if you want a realistic test run.


Payment Idempotency: Never Charge Twice

Interviewer: How do you ensure a customer is never charged twice, even if there are network timeouts and retries?

Candidate: Every payment request carries an idempotency_key — a unique identifier for this specific payment attempt. In our case, the order_id serves as the idempotency key.

The Payment Service logic:

plaintext
1. Receive payment request with { order_id, amount, card_token }
2. Check Redis: GET payment_status:{order_id}
   → If "captured": return the cached successful response (duplicate request)
   → If "processing": return 202 Accepted (previous attempt in flight)
   → If nil: proceed
3. SET payment_status:{order_id} "processing" EX 300
   (5-minute window for the payment to complete)
4. Send charge request to payment provider (Stripe, Adyen, etc.)
5. On success:
   → SET payment_status:{order_id} "captured" (no TTL — permanent)
   → Store transaction details in PostgreSQL
   → Return success
6. On failure:
   → DELETE payment_status:{order_id} (allow retry)
   → Return error with retry guidance

If the client times out after step 4 and retries the request, the duplicate hits step 2 and returns the cached "captured" response. The payment provider never sees the second request. The customer is charged exactly once.

This is what Stripe calls idempotency keys in their API — every Stripe payment request should include a unique key so that network retries don't result in duplicate charges.
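The status-check flow above fits in a few lines; a plain dict stands in for Redis, and `charge_provider` is a hypothetical stand-in for the Stripe/Adyen call:

```python
# Sketch of the idempotency-key flow; a dict stands in for Redis and
# charge_provider for the payment gateway (both are illustrative).

payment_status = {}   # order_id -> "processing" | "captured"
charges = []          # what the provider actually charged

def charge_provider(order_id, amount):
    charges.append((order_id, amount))

def process_payment(order_id, amount):
    status = payment_status.get(order_id)
    if status == "captured":
        return "already captured"            # duplicate: cached success, no new charge
    if status == "processing":
        return "in flight"                   # previous attempt still running
    payment_status[order_id] = "processing"  # SET ... "processing" EX 300
    try:
        charge_provider(order_id, amount)    # the only place money moves
    except Exception:
        del payment_status[order_id]         # failure: allow a clean retry
        return "failed"
    payment_status[order_id] = "captured"    # permanent marker
    return "captured"

first = process_payment("order-1", 62.98)
retry = process_payment("order-1", 62.98)    # e.g. client timed out and retried
print(first, "/", retry)   # captured / already captured
print(len(charges))        # 1 -- the customer was charged exactly once
```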


Flash Sales: High-Concurrency at Scale

Flash sales are where e-commerce systems break. Imagine 50,000 users simultaneously trying to buy 100 units of a limited-edition sneaker the moment it drops.

Interviewer: It's 12:00:00 PM. A flash sale starts. 50,000 users hit "Buy Now" simultaneously for 100 available units. How does your system handle this?

Candidate: This is a fundamentally different traffic pattern from normal e-commerce — it's a coordinated spike, not organic traffic. The approach has three layers.

Layer 1: Traffic shaping before it reaches the inventory service

A queue-based virtual waiting room. When the flash sale starts, instead of letting all 50,000 requests hit the inventory service simultaneously, they enter a queue. Users see a "You're in queue — estimated wait: 45 seconds" page. This prevents the thundering herd.

Alternatively (and simpler at implementation level): only process the first N requests per second at the inventory tier, and return "sold out" to everyone else above the threshold.

Layer 2: Redis pre-decrement for inventory

Before the sale starts, pre-load inventory into Redis: SET flash_inventory:product_123 100. When a request is processed, atomically decrement:

plaintext
count = DECR flash_inventory:product_123
if count < 0:
    INCR flash_inventory:product_123  # restore the count
    return "sold out"
else:
    enqueue_order_for_async_processing(user_id, product_id)
    return "order placed — processing"

Redis's single-threaded architecture guarantees the DECR is atomic. No two requests see the same value. Exactly 100 orders succeed; request 101 sees count = -1 and is turned away. No distributed lock needed.
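A small simulation of why the atomic decrement yields exactly 100 winners; a `threading.Lock` stands in for Redis's single-threaded command execution:

```python
import threading

# Simulates Redis DECR/INCR atomicity: the lock stands in for Redis's
# single-threaded command execution. 1,000 buyers compete for 100 units.

_redis_lock = threading.Lock()
store = {"flash_inventory:product_123": 100}

def decr(key):
    with _redis_lock:
        store[key] -= 1
        return store[key]

def incr(key):
    with _redis_lock:
        store[key] += 1
        return store[key]

winners = []  # list.append is atomic under CPython's GIL

def attempt_purchase(user_id):
    if decr("flash_inventory:product_123") < 0:
        incr("flash_inventory:product_123")   # restore the count
        return                                 # "sold out"
    winners.append(user_id)                    # enqueue order for async Saga

threads = [threading.Thread(target=attempt_purchase, args=(i,))
           for i in range(1000)]
for t in threads: t.start()
for t in threads: t.join()
print("orders placed:", len(winners))  # exactly 100
```

No two requests ever observe the same counter value, so the winner count can never exceed the stock — the same guarantee the real DECR gives without any distributed lock.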

Layer 3: Asynchronous order creation

The 100 winning requests get an immediate "order placed" response. Their orders are written to a Kafka queue. A pool of Order Workers drains the queue, runs the full Saga (payment capture, inventory confirmation, order creation) asynchronously. Users get the success email within seconds.

This decouples the user-facing response time from the database write time. The user isn't waiting for all five Saga steps to complete before seeing "Order placed."

Interviewer: What if the async order processing fails for one of those 100 winners — say their card is declined?

Candidate: Their order fails at the payment step. The Saga compensation releases the reserved inventory back — both in Redis and in the database. The system sends them a "Payment failed — item released back to inventory" notification. Another buyer could potentially get that unit if there's a retry pool. This is the correct business behaviour — a reserved unit that can't be paid for should be released.

The flash sale section is one where interviewers at Amazon specifically probe whether you've thought through the thundering herd, the Redis atomicity guarantee, and the compensation path when a winner's card declines. Having clean answers to all three in sequence is exactly what Mockingly.ai is designed to help you build — with simulations that push on each layer in turn.


Database Design

Interviewer: Walk me through the key database schemas.

Candidate: Each service owns its own database — no service shares a database with another. Here are the critical tables.

Products (PostgreSQL):

sql
CREATE TABLE products (
    product_id    UUID PRIMARY KEY,
    seller_id     UUID NOT NULL,
    title         TEXT NOT NULL,
    description   TEXT,
    category_id   INT NOT NULL,
    base_price    DECIMAL(10,2) NOT NULL,
    status        TEXT DEFAULT 'active',  -- active, inactive, deleted
    created_at    TIMESTAMP DEFAULT NOW(),
    updated_at    TIMESTAMP DEFAULT NOW()
);

Inventory (PostgreSQL):

sql
CREATE TABLE inventory (
    product_id    UUID NOT NULL REFERENCES products(product_id),
    warehouse_id  UUID NOT NULL,
    stock         INT NOT NULL DEFAULT 0 CHECK (stock >= 0),
    version       INT NOT NULL DEFAULT 1,  -- for optimistic locking
    reserved      INT NOT NULL DEFAULT 0,  -- held in carts/pending orders
    updated_at    TIMESTAMP DEFAULT NOW(),
    PRIMARY KEY   (product_id, warehouse_id)  -- stock is tracked per warehouse
);
-- The CHECK constraint enforces stock can never go negative at the DB level
-- as a safety net beyond application-level checks

Orders (PostgreSQL):

sql
CREATE TABLE orders (
    order_id      UUID PRIMARY KEY,
    user_id       UUID NOT NULL,
    status        TEXT NOT NULL,  -- pending, confirmed, shipped, delivered, cancelled
    total_amount  DECIMAL(10,2),
    created_at    TIMESTAMP DEFAULT NOW(),
    updated_at    TIMESTAMP DEFAULT NOW()
);
 
CREATE TABLE order_items (
    order_id      UUID REFERENCES orders(order_id),
    product_id    UUID NOT NULL,
    quantity      INT NOT NULL,
    unit_price    DECIMAL(10,2) NOT NULL,
    PRIMARY KEY   (order_id, product_id)
);

The CHECK (stock >= 0) constraint on the inventory table is a last-resort safety net at the database level. Application logic should catch negative stock before it reaches the DB, but database constraints are the final guard.


Caching Strategy

Interviewer: How do you handle the read load at 116,000 req/sec peak?

Candidate: Aggressive, layered caching.

CDN for static assets: product images, CSS, JS — all served from a CDN like CloudFront. None of this hits application servers.

Redis for product metadata: product details (title, price, thumbnail URL) are cached in Redis with a 5-minute TTL. A product detail page load goes Redis → cache hit → 2ms response, rather than hitting PostgreSQL.

The cache invalidation strategy: when a seller updates their product price, we write to PostgreSQL and immediately delete the Redis cache entry (not update — delete). The next request finds a cache miss, fetches fresh from the DB, and repopulates the cache. Write-through caching (updating cache on every DB write) works too, but is more complex.
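A sketch of this cache-aside, delete-on-invalidate flow, with plain dicts standing in for PostgreSQL and Redis:

```python
# Cache-aside with delete-on-invalidate; dicts stand in for Postgres and Redis.
db = {"p123": {"title": "Wireless Headphones", "price": 49.99}}
cache = {}
db_reads = 0

def get_product(product_id):
    global db_reads
    if product_id in cache:
        return cache[product_id]          # cache hit: the ~2ms path
    db_reads += 1
    product = dict(db[product_id])        # miss: fetch from source of truth
    cache[product_id] = product           # repopulate the cache
    return product

def update_price(product_id, price):
    db[product_id]["price"] = price       # 1. write to the DB
    cache.pop(product_id, None)           # 2. DELETE the cache entry (not update)

get_product("p123")                       # miss -> DB read #1, cache populated
get_product("p123")                       # hit  -> no DB read
update_price("p123", 39.99)               # invalidates the cached entry
print(get_product("p123")["price"])       # 39.99 -- re-fetched fresh (DB read #2)
print("db reads:", db_reads)              # 2
```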

What we don't cache: inventory counts. Showing a user a stale "In Stock" when the item is actually sold out is a bad experience — they go through checkout to find out it's gone. Inventory is always read from the source of truth (Redis pre-decrement cache or the DB directly for non-flash items).


Order Tracking and Notifications

Interviewer: How does the user know their order status in real time?

Candidate: Kafka-driven event sourcing. Every status change in the Order Service publishes an event: order.confirmed, order.shipped, order.out_for_delivery, order.delivered. The Notification Service consumes these and sends the appropriate push notification, SMS, or email.

For real-time order tracking on the order detail page, the client can use polling (GET /orders/{order_id} every 30 seconds — simple and works) or a more sophisticated WebSocket connection for live updates. For an e-commerce platform, polling is usually sufficient — order status changes infrequently enough that a 30-second polling interval feels live to the user.


Common Interview Follow-ups

"How would you design the recommendation engine — 'customers who bought this also bought…'?"

At a high level: store user purchase and browsing history in a data warehouse (BigQuery or Redshift). Run a collaborative filtering model periodically (batch job, every few hours) to compute product similarity and user-product affinity scores. Store pre-computed recommendations in Redis: recommendations:{user_id} → [product_id_1, product_id_2, ...]. The product detail page reads from this Redis key — sub-millisecond lookup. Real-time personalisation (updating recommendations immediately after a purchase) is a harder problem, typically solved with a streaming ML pipeline on Kafka. For the interview, describe the batch approach and mention the streaming upgrade as a follow-up.

"How do you handle a seller that tries to update inventory to a negative number, or a buyer that tries to add 99,999 units to cart?"

Input validation at the API Gateway layer — reject any request that violates business rules before it reaches the services. Cart item quantity is capped at a reasonable limit (e.g., 99). Inventory updates require the seller to provide a positive integer. The CHECK (stock >= 0) constraint at the database level is a final backstop but shouldn't be the first line of defence.

"What if the payment provider (Stripe) is down during checkout?"

Return a user-facing error: "Payment could not be processed. Please try again." The Saga aborts at Step B, releases reserved inventory, and sets the order to PAYMENT_FAILED. The order is preserved in the database so the user can retry payment later without restarting from scratch — the system shows them a "Complete your payment" option. Behind the scenes, implement a circuit breaker on the payment provider client: if X% of calls fail within a window, stop sending requests to Stripe and return a failure immediately rather than waiting for timeouts.

"How would you design the seller analytics dashboard — views per product, conversion rate, revenue by day?"

Real-time analytics at the scale of millions of sellers is an OLAP problem, not OLTP. Don't serve analytics from the Orders/Products PostgreSQL. Events (order.confirmed, product.viewed) stream into Kafka and are consumed by a Kafka → ClickHouse (or BigQuery) writer. ClickHouse is a column-oriented analytical database that can query billions of rows in seconds. The seller dashboard queries ClickHouse, not PostgreSQL. The analytics data is eventually consistent (minutes behind real-time) — acceptable for a daily metrics view.

"How does your search handle typos? 'wirless headphnoes' should still return results."

Elasticsearch's fuzzy query computes the edit distance between the search term and indexed terms. A fuzziness: "AUTO" setting allows 1-2 character edits for terms of a certain length. "wirless" matches "wireless" with 1 edit. For the search results to be good, we also index common misspellings and synonyms in the ES mapping. This is a configuration concern, not an architecture concern — worth mentioning as a detail.


Quick Interview Checklist

  • ✅ Clarified scope — marketplace vs single-merchant, flash sales in scope
  • ✅ Back-of-the-envelope — read-heavy (116K req/sec peak), writes concentrated at checkout (580/sec)
  • ✅ Microservices justified — different scaling and consistency requirements per domain
  • ✅ Product catalog — PostgreSQL for structured data, document store for variable attributes, S3 + CDN for images
  • ✅ Search with Elasticsearch — inverted index, ranked results, autocomplete via completion suggester
  • ✅ Search result caching in Redis for hot prefixes
  • ✅ Shopping cart in Redis — hash per user, TTL, cart merge on login
  • ✅ Inventory: optimistic locking for normal purchases, Redis pre-decrement for flash sales
  • ✅ CHECK (stock >= 0) as DB-level safety net
  • ✅ Checkout via Saga pattern — orchestration-based, local transactions + compensations
  • ✅ Saga state persisted to DB — resumes on crash
  • ✅ Payment idempotency via idempotency_key in Redis — never charge twice
  • ✅ Flash sale: queue + Redis atomic DECR + async Saga processing
  • ✅ Kafka-driven order status updates → Notification Service
  • ✅ CDN for static assets, Redis for product metadata, no caching for inventory
  • ✅ Analytics data in ClickHouse/BigQuery — separate from OLTP databases

Conclusion

Designing an e-commerce system is one of the best interview questions out there because it forces you to reason about different consistency models for different parts of the same system. Payment needs ACID-level guarantees. Search can tolerate minutes of staleness. Inventory needs strong consistency but not global transactions. Cart just needs to be fast.

The candidates who do best in these interviews — whether at Amazon, Shopify, or Flipkart — are the ones who can articulate why each service has the database and caching strategy it does, not just name the technologies.

The design pillars:

  1. Microservices with clear ownership — each service owns its own database; different services have different consistency needs
  2. Elasticsearch for search — the only viable approach for full-text search across hundreds of millions of products
  3. Redis for the cart — fast, ephemeral, sub-millisecond; not worth burning relational DB writes on
  4. Optimistic locking for normal purchases, Redis DECR for flash sales — match the concurrency strategy to the traffic pattern
  5. Saga pattern for checkout — distributed transactions without 2PC; each step has a compensating transaction
  6. Payment idempotency keys — the idempotency_key in Redis is what stands between you and double-charged customers
  7. Async Saga execution for flash sales — return "order placed" immediately, process the Saga in the background


Frequently Asked Questions

How do you prevent overselling in an e-commerce system?

Overselling happens when two users simultaneously buy the last unit — both reads see stock = 1, both pass the availability check, and both decrement — resulting in stock = -1. Three strategies prevent this, each suited to different traffic patterns.

Option 1: Pessimistic locking (SELECT FOR UPDATE)

sql
BEGIN;
-- Only one transaction can hold this row lock at a time;
-- concurrent buyers block here until COMMIT or ROLLBACK
SELECT stock FROM inventory WHERE product_id = 123 FOR UPDATE;
-- The application checks the returned stock:
--   stock > 0 → decrement and commit
--   stock = 0 → ROLLBACK and report "sold out"
UPDATE inventory SET stock = stock - 1 WHERE product_id = 123;
COMMIT;

Correct — but serialises all purchases of that product. Causes lock contention under high concurrency.

Option 2: Optimistic locking with a version column

Read the version, attempt a conditional update:

sql
UPDATE inventory SET stock = stock - 1, version = version + 1
WHERE product_id = 123 AND version = :last_read_version;
-- If 0 rows affected: conflict, retry

Non-blocking for reads. Efficient at moderate concurrency. Degrades under flash-sale-level load when many retries accumulate.
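The retry loop around that conditional UPDATE can be sketched in Python, using an in-memory SQLite table as a stand-in for the inventory database (table layout and function name are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventory (product_id INTEGER PRIMARY KEY, "
    "stock INTEGER, version INTEGER)"
)
conn.execute("INSERT INTO inventory VALUES (123, 5, 0)")
conn.commit()

def buy_one(conn, product_id, max_retries=3):
    """Attempt one purchase with optimistic locking; retry on version conflict."""
    for _ in range(max_retries):
        stock, version = conn.execute(
            "SELECT stock, version FROM inventory WHERE product_id = ?",
            (product_id,),
        ).fetchone()
        if stock <= 0:
            return False                      # sold out
        cur = conn.execute(
            "UPDATE inventory SET stock = stock - 1, version = version + 1 "
            "WHERE product_id = ? AND version = ?",
            (product_id, version),
        )
        conn.commit()
        if cur.rowcount == 1:
            return True                       # our version matched; write applied
        # rowcount == 0: another writer bumped the version first — retry
    return False

print(buy_one(conn, 123))   # → True
```

The key detail is checking rows affected: zero rows means someone else won the race, and the loop re-reads and retries rather than blocking.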

Option 3: Redis atomic DECR (for flash sales)

  1. Pre-load stock into Redis: SET inventory:product_123 1000
  2. On each purchase: count = DECR inventory:product_123
  3. If count < 0: restore with INCR and return "sold out"
  4. If count >= 0: enqueue order for async processing

Redis is single-threaded — DECR is atomic with no race condition. The database is updated asynchronously. Zero contention at any concurrency level.
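The DECR/INCR flow can be sketched in Python. A small in-memory class stands in for the two Redis commands used (in real Redis no lock is needed — the server is single-threaded, so DECR is atomic by construction):

```python
import threading

class MiniRedis:
    """In-memory stand-in for the two Redis commands this flow needs."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()   # emulates Redis's single-threaded atomicity

    def set(self, key, value):
        self._data[key] = int(value)

    def decr(self, key):
        with self._lock:
            self._data[key] = self._data.get(key, 0) - 1
            return self._data[key]

    def incr(self, key):
        with self._lock:
            self._data[key] = self._data.get(key, 0) + 1
            return self._data[key]

r = MiniRedis()
r.set("inventory:product_123", 2)       # pre-load stock

def try_purchase(r, key):
    count = r.decr(key)                 # atomic decrement
    if count < 0:
        r.incr(key)                     # went below zero: restore and reject
        return "sold out"
    return "order placed"               # real system: enqueue for async processing

print([try_purchase(r, "inventory:product_123") for _ in range(3)])
# → ['order placed', 'order placed', 'sold out']
```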

Recommended approach: optimistic locking for normal products, Redis DECR for flash sales and high-demand items.


What is the Saga pattern and why is it used for e-commerce checkout?

The Saga pattern is a sequence of local transactions where each step publishes an event triggering the next step. If any step fails, compensating transactions undo all previous steps.

Why it is needed for checkout:

  1. E-commerce checkout spans multiple independent services — Inventory, Payment, Order, Fulfillment
  2. These services have separate databases — you cannot wrap them in a single ACID transaction
  3. Two-phase commit (2PC) would work, but it introduces distributed locking that kills performance and tightly couples the services
  4. The Saga accepts eventual consistency in exchange for service autonomy

The orchestration-based checkout Saga:

plaintext
Step A: Reserve Inventory  →  On failure: end, notify user
Step B: Capture Payment    →  On failure: release inventory (compensate A)
Step C: Confirm Order      →  On failure: refund payment (compensate B), release inventory
Step D: Trigger Fulfillment

Why orchestration over choreography:

An orchestrator (a central coordinator) gives visibility into exactly which step a failed order got stuck on. Choreography (services reacting to events independently) is harder to debug and reason about when an order is stranded mid-process.

Critical requirement: the Saga Coordinator must persist its state after each step. If the coordinator crashes between Step B (payment captured) and Step C (order confirmed), it must resume from Step C on restart — not re-run Step B (which would charge the customer twice).
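The orchestration loop with reverse-order compensation can be sketched in a few lines of Python (step names and the in-memory log are illustrative; a real coordinator would persist state to a database, as noted above):

```python
def run_saga(steps):
    """Orchestration-based Saga sketch.

    `steps` is an ordered list of (name, action, compensate) triples.
    Each action is a local transaction; if one fails, the compensations
    of every already-completed step run in reverse order.
    """
    completed = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception:
            for _, comp in reversed(completed):
                comp()                        # undo in reverse order
            return f"aborted at {name}"
        completed.append((name, compensate))
    return "completed"

log = []

def capture_payment():
    raise RuntimeError("card declined")       # simulate Step B failing

steps = [
    ("reserve_inventory", lambda: log.append("reserved"),
                          lambda: log.append("released")),
    ("capture_payment",   capture_payment,
                          lambda: log.append("refunded")),
    ("confirm_order",     lambda: log.append("confirmed"),
                          lambda: log.append("cancelled")),
]
print(run_saga(steps))   # → aborted at capture_payment
print(log)               # → ['reserved', 'released']
```

Note that the failed step itself is never compensated — only the steps that completed before it — which matches the Step A/B/C table above.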


Why use Redis for a shopping cart instead of a database?

Redis is the correct tool for shopping carts because carts are ephemeral, high-frequency, and don't need relational guarantees.

Why Redis outperforms a relational database for carts:

  1. Write frequency — cart touches happen on every product interaction. At 50M DAU with multiple cart operations per session, that's hundreds of millions of writes per day, with sharp peaks — expensive on a relational DB
  2. Ephemeral data — carts don't need the durability of orders. A cart that expires after 30 days of inactivity is acceptable; an order that expires is not
  3. Sub-millisecond latency — Redis hash operations (HSET, HDEL, HGETALL) complete in under 1ms. A relational DB adds 5–20ms per operation due to disk I/O and query parsing
  4. Simple data model — a cart is a hash: cart:{user_id} → { product_id: {qty, price, title} }. No joins, no foreign keys

Cart merge on login:

When an anonymous user logs in, the client sends both the local cart and the server-side cart to the Cart Service. The service merges them (summing quantities for duplicates, adding new items), writes the merged cart to Redis under the authenticated user's ID, and deletes the anonymous cart.
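The merge itself is simple dictionary arithmetic. A sketch of the quantity-summing logic described above (cart shape follows the hash layout from point 4; function name is illustrative):

```python
def merge_carts(anonymous_cart, server_cart):
    """Merge an anonymous cart into the logged-in user's server-side cart.

    Quantities are summed for products present in both carts; items unique
    to either cart are kept. Each cart maps
    product_id -> {"qty": ..., "price": ..., "title": ...}.
    """
    merged = {pid: dict(item) for pid, item in server_cart.items()}
    for pid, item in anonymous_cart.items():
        if pid in merged:
            merged[pid]["qty"] += item["qty"]   # duplicate: sum quantities
        else:
            merged[pid] = dict(item)            # new item: add as-is
    return merged

anon = {"p1": {"qty": 2, "price": 999, "title": "Headphones"}}
server = {
    "p1": {"qty": 1, "price": 999, "title": "Headphones"},
    "p2": {"qty": 1, "price": 499, "title": "Mouse"},
}
merged = merge_carts(anon, server)
print(merged["p1"]["qty"])   # → 3
```

In production the result is written back to Redis under cart:{user_id} and the anonymous cart key is deleted.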


How does product search work with Elasticsearch in an e-commerce system?

Elasticsearch maintains an inverted index — for every unique word across all product titles and descriptions, it stores a list of matching product IDs. This makes full-text search across hundreds of millions of products return results in under 100ms. A LIKE '%keyword%' query on PostgreSQL cannot achieve this at any meaningful scale.

The sync pipeline:

  1. Product is created or updated in the Product DB (PostgreSQL/MongoDB)
  2. A Kafka event (product.upserted) is published
  3. An Elasticsearch indexer worker consumes the event and writes the product document to the ES index
  4. Search queries go to Elasticsearch; the Product DB is never involved in search

Autocomplete is handled separately via ES's completion suggester — a prefix-optimised in-memory data structure that returns suggestions in under 5ms. The top 1,000 most common search prefixes are cached in Redis to avoid hitting ES on every keystroke.

Fuzzy search for typos:

Setting fuzziness: "AUTO" allows 1–2 character edits based on term length. "wirless" matches "wireless" with 1 edit. Common misspellings and synonyms can also be indexed directly in the ES mapping for higher-quality results.
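As a sketch, the corresponding search body — shown as a Python dict, assuming product titles are indexed under a "title" field — follows the standard Elasticsearch match-query DSL:

```python
# Hypothetical index with a "title" field; sent as the body of a search request.
fuzzy_search_body = {
    "query": {
        "match": {
            "title": {
                "query": "wirless headphnoes",
                "fuzziness": "AUTO",   # allows 1-2 edits, based on term length
            }
        }
    }
}
```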


How does payment idempotency work and why is it critical?

Payment idempotency ensures a customer is never charged twice, even when network timeouts cause the client to retry the same request.

The problem without idempotency:

  1. User clicks "Place Order" — request reaches the payment provider
  2. Provider charges the card — but the response times out before reaching the server
  3. The server retries the request
  4. The provider charges the card again — user is double-charged

How idempotency keys solve it:

  1. Each payment request carries an idempotency_key = the order_id (unique per order, forever)
  2. The Payment Service checks Redis before processing: GET payment_status:{order_id}
  3. If "captured" → return the cached success response (duplicate request, already charged)
  4. If "processing" → return 202 Accepted (previous attempt still in flight)
  5. If nil → set "processing" in Redis (using SET with the NX flag, so two concurrent first attempts cannot both claim the key), then charge the card
  6. On success → set "captured" permanently; on failure → delete key to allow retry

The payment provider never sees the second request. The customer is charged exactly once regardless of how many retries occur.
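The "processing" / "captured" state machine can be sketched as follows, with a plain dict standing in for Redis (real code would claim the key atomically with SET ... NX; class and method names are illustrative):

```python
class PaymentService:
    """Idempotency-key state machine for payment capture (sketch)."""

    def __init__(self, charge_card):
        self._status = {}                # stand-in for payment_status:{order_id} in Redis
        self._charge_card = charge_card  # call to the payment provider

    def pay(self, order_id, amount_cents):
        state = self._status.get(order_id)
        if state == "captured":
            return "already charged"     # duplicate retry: return cached success
        if state == "processing":
            return "in flight"           # previous attempt still running: 202
        self._status[order_id] = "processing"   # real Redis: SET key NX
        try:
            self._charge_card(order_id, amount_cents)
        except Exception:
            del self._status[order_id]   # failed: delete key so a retry can run
            return "payment failed"
        self._status[order_id] = "captured"
        return "charged"

svc = PaymentService(charge_card=lambda order_id, cents: None)
print(svc.pay("order-42", 1999))   # → charged
print(svc.pay("order-42", 1999))   # → already charged
```

The second call never reaches the provider — exactly the property that prevents double charges on client retries.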


How does a flash sale system work — 50,000 users for 100 items?

Flash sales require a fundamentally different architecture from normal e-commerce because the traffic is a coordinated spike rather than organic load.

The three-layer approach:

Layer 1: Traffic shaping (thundering herd prevention)

When the sale starts, 50,000 users cannot all hit the inventory service simultaneously. A virtual queue accepts all requests and shows users a wait position. Only N requests per second are forwarded to inventory processing. This converts a spike into a stream.
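A minimal sketch of that traffic shaping, assuming a scheduler ticks the queue once per second (class and method names are illustrative):

```python
import collections

class VirtualQueue:
    """Spike-to-stream traffic shaping: every request joins the queue
    immediately, and a scheduler forwards at most `rate` of them per tick
    to inventory processing."""

    def __init__(self, rate):
        self.rate = rate
        self._queue = collections.deque()

    def enqueue(self, request_id):
        self._queue.append(request_id)
        return len(self._queue)          # shown to the user as their wait position

    def drain(self, process):
        """Called once per second: forward up to `rate` queued requests."""
        for _ in range(min(self.rate, len(self._queue))):
            process(self._queue.popleft())

q = VirtualQueue(rate=2)
for i in range(5):
    q.enqueue(f"req-{i}")
forwarded = []
q.drain(forwarded.append)
print(forwarded)   # → ['req-0', 'req-1']
```

With rate=2, five simultaneous arrivals become a steady two-per-second stream into the inventory layer.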

Layer 2: Redis atomic DECR (inventory control)

Before the sale starts, pre-load inventory: SET flash_inventory:product_123 100

For each dequeued request:

  • count = DECR flash_inventory:product_123
  • If count < 0: restore with INCR, return "sold out"
  • If count >= 0: return "order placed — processing"

Redis is single-threaded. DECR is atomic. Exactly 100 requests succeed. The 101st sees -1 and is rejected. No distributed locks, no database contention.

Layer 3: Async Saga processing

The 100 winning requests get an immediate "order placed" response. Their orders are pushed to a Kafka queue. A pool of Order Workers runs the full Saga (payment, inventory confirmation, order creation) asynchronously. Users receive confirmation emails within seconds.

If a winner's payment fails:

The Saga compensation releases the reserved inventory back to Redis and the DB. The item becomes available again. The user is notified. This is the correct business behaviour — unpaid reservations should not be held forever.


What is the difference between optimistic and pessimistic locking?

Pessimistic locking assumes conflicts will happen and prevents them by holding a lock on the row for the duration of the transaction. Optimistic locking assumes conflicts are rare and detects them at write time using a version number.

|              | Pessimistic Locking                        | Optimistic Locking                                   |
|--------------|--------------------------------------------|------------------------------------------------------|
| Lock held    | During the entire transaction              | Not held — conflict detected on write                |
| Concurrency  | Low — other writers wait                   | High — no blocking on reads                          |
| Best for     | Low concurrency, high conflict probability | High concurrency, low conflict probability           |
| Failure mode | Lock contention under high load            | Retry storm under very high load                     |
| SQL pattern  | SELECT ... FOR UPDATE                      | UPDATE ... WHERE version = :v → check rows affected  |

In practice for e-commerce:

  1. Use pessimistic locking when the conflict rate is high and the transaction is short — e.g., updating a cart counter where two users are simultaneously editing
  2. Use optimistic locking for inventory updates under normal load — most products have low concurrent purchase rates
  3. Switch to Redis DECR for flash sales and high-demand items where optimistic lock retries would overwhelm the database

Why does each microservice own its own database?

Database-per-service is the foundational rule of microservices — it enforces the service boundary at the data layer, not just the API layer.

Why sharing a database breaks microservices:

  1. Schema coupling — if two services share a table, changing that table's schema requires coordinating deployments across both services
  2. Scaling bottleneck — you can't scale the inventory database independently from the user database if they're the same database
  3. Blast radius — a poorly written query in the Reviews service can lock a table that the Checkout service needs, causing a cascade failure

Why different services use different databases:

  1. Cart (Redis) — needs sub-millisecond writes; ephemeral data; no relational structure needed
  2. Product catalog (PostgreSQL + MongoDB) — structured fields in PostgreSQL; variable attributes in a document store
  3. Inventory (PostgreSQL with version column) — needs ACID transactions for stock consistency
  4. Search (Elasticsearch) — needs an inverted index for full-text search; PostgreSQL LIKE cannot scale to 500M products
  5. Orders (PostgreSQL) — relational, transactional, audit trail

Each choice is driven by the service's access pattern — not by convenience or convention.


Which companies ask the e-commerce system design question in interviews?

Amazon, Shopify, Flipkart, eBay, Instacart, Stripe, and Google ask variants of this question for senior software engineer and system design roles.

Why it is a consistently popular interview question:

  1. Different consistency requirements per domain — inventory needs strong consistency; search tolerates staleness; cart just needs speed. Getting this matrix right signals senior-level thinking
  2. Covers multiple hard sub-problems — overselling prevention, the Saga pattern, payment idempotency, and flash sale concurrency are each interview-worthy topics on their own
  3. Directly maps to real products — every company on the list runs commerce infrastructure or builds tools for it

What interviewers specifically listen for:

  1. Three inventory strategies named with trade-offs — pessimistic, optimistic, and Redis DECR — and knowing which to use when
  2. Saga pattern with compensation — not just naming Saga but explaining what Step B's compensating transaction is when Step C fails
  3. Payment idempotency key mechanics — the Redis "processing" / "captured" state machine, not just "use idempotency keys"
  4. Flash sale three-layer approach — traffic shaping + Redis DECR + async processing together, not just one of them
  5. Why database-per-service — with the specific reasons (schema coupling, blast radius), not just "because microservices"

The e-commerce question is popular precisely because it has so many depth directions — interviewers can go deep on the Saga pattern, the flash sale concurrency problem, or the search infrastructure depending on what they're probing for. Knowing the concepts is one thing; being able to navigate those pivots in real time is a different skill. Mockingly.ai has system design simulations where you practice exactly that — fielding follow-ups under pressure, for roles at Amazon, Shopify, Flipkart, and beyond.
