How to Design an Online Book Review System
"Design an online book review system" is one of those interview questions that looks gentle on the surface. Users write reviews, see ratings, browse books. Nothing screams distributed systems challenge.
Then the interviewer asks how you compute the average rating for a book with 2 million reviews without running a SELECT AVG() every time someone loads the page. Or how you build a "friends' recent reviews" feed without reading from ten separate users' review tables on every request. Or how you recommend books to a brand-new user who has never reviewed anything.
Those are the questions that test real system design depth. And when the interviewer is from Amazon — which owns Goodreads and Audible — they have a very specific interest in seeing you think through content discovery at scale.
This guide covers the full design, in the kind of back-and-forth you'd actually have in the interview room.
Step 1: Clarify the Scope
Interviewer: Design an online book review system.
Candidate: Before I start — a few questions. Are we designing something like Goodreads, where the social layer is core, or more like Amazon's product reviews, where it's primarily attached to a commerce experience? Do users have a social graph — can they follow other reviewers? Do we need a personalised "what to read next" recommendation feature? And is the book catalog user-contributed, or does it come from a canonical data source like a publisher feed?
Interviewer: Think Goodreads — social reading, user-written reviews and ratings, a social graph where users follow each other and see friends' activity. Recommendations are in scope. The book catalog is canonical — seeded from a publisher data feed, with users able to suggest additions. Assume Goodreads' approximate scale.
Candidate: Perfect. The social feed and recommendation engine are where the real complexity lives in this design, so I'll make sure we get into both. Let me start with requirements and numbers.
Requirements
Functional
- Users can search for books by title, author, ISBN, or genre
- Users can rate a book (1–5 stars) and write a text review
- A book's page shows its average rating, number of ratings, and paginated reviews
- Users can mark a book as "read", "currently reading", or "want to read" (reading shelf)
- Users can follow other users and see a feed of their friends' recent activity (reviews, shelf updates)
- Users can mark reviews as helpful or report them as spam
- The system recommends books to users based on reading history and preferences
- Authors/publishers can claim a book page and add canonical metadata
Non-Functional
- Read-heavy — book pages and feeds are read far more than they are written
- Low read latency — book pages must load in under 200ms
- Accurate ratings — average ratings must reflect actual review data
- Social feed freshness — friends' activity should appear within minutes
- Search relevance — search must handle partial titles, author misspellings, and genre queries
Back-of-the-Envelope Estimates
Interviewer: Walk me through the scale.
Candidate: Goodreads has around 150 million registered users and over 3.5 billion books on members' shelves. Let me build from that.
Registered users: 150 million
Daily Active Users (DAU): 10 million
Books in catalog: 50 million
Reviews per book (avg): 200 (popular books have hundreds of thousands)
New reviews per day: ~500,000
New ratings per day: ~2 million (many users rate without writing a review)
Write QPS (reviews/ratings): ~28 writes/sec (average), ~140 writes/sec (peak)
Read QPS:
Book page reads: 10M DAU × 10 reads/day / 86,400s ≈ 1,160 reads/sec
Feed reads: 10M DAU × 5 reads/day / 86,400s ≈ 580 reads/sec
Search requests: 10M DAU × 3 searches/day / 86,400s ≈ 347 searches/sec
Total peak reads: ~10,000 reads/sec
Storage:
Book record: ~5 KB average
50M books: ~250 GB
Review record: ~2 KB average
500K reviews/day × 365 × 5 years = ~1.8 TB
Rating data (int): trivially small — ~20 bytes per rating

Two things stand out. This is a read-heavy system — around 70:1 reads-to-writes. And the number of ratings per popular book (some have 5 million+) makes naive average computation expensive. Those two observations drive the most important architectural decisions.
Database Design
The data model is primarily relational — books have structured metadata, users have structured profiles, reviews have a clear relationship to both. PostgreSQL handles this well.
Books:

```sql
CREATE TABLE books (
    book_id          UUID PRIMARY KEY,
    isbn_13          TEXT UNIQUE,
    title            TEXT NOT NULL,
    author_ids       UUID[],  -- array of author references
    genre_ids        INT[],   -- array of genre references
    description      TEXT,
    cover_image_url  TEXT,
    published_at     DATE,
    publisher_id     UUID,
    page_count       INT,
    language         TEXT DEFAULT 'en',
    -- Denormalised rating fields (updated on each new review)
    rating_count     INT NOT NULL DEFAULT 0,
    rating_sum       BIGINT NOT NULL DEFAULT 0,
    avg_rating       NUMERIC(3,2),  -- pre-computed, updated on write
    created_at       TIMESTAMPTZ DEFAULT NOW()
);
```

Users:
```sql
CREATE TABLE users (
    user_id          UUID PRIMARY KEY,
    username         TEXT UNIQUE NOT NULL,
    email            TEXT UNIQUE NOT NULL,
    display_name     TEXT,
    bio              TEXT,
    avatar_url       TEXT,
    follower_count   INT DEFAULT 0,  -- denormalised
    following_count  INT DEFAULT 0,  -- denormalised
    created_at       TIMESTAMPTZ DEFAULT NOW()
);
```

Reviews:
```sql
CREATE TABLE reviews (
    review_id          UUID PRIMARY KEY,
    book_id            UUID NOT NULL REFERENCES books(book_id),
    user_id            UUID NOT NULL REFERENCES users(user_id),
    rating             SMALLINT NOT NULL CHECK (rating BETWEEN 1 AND 5),
    review_text        TEXT,           -- null if rating-only
    helpful_count      INT DEFAULT 0,  -- denormalised
    not_helpful_count  INT DEFAULT 0,
    status             TEXT DEFAULT 'active',  -- active, flagged, removed
    created_at         TIMESTAMPTZ DEFAULT NOW(),
    updated_at         TIMESTAMPTZ DEFAULT NOW(),
    UNIQUE (book_id, user_id)  -- one review per user per book
);
```

Social graph and reading shelf:
```sql
-- Follow relationships
CREATE TABLE follows (
    follower_id  UUID REFERENCES users(user_id),
    followee_id  UUID REFERENCES users(user_id),
    created_at   TIMESTAMPTZ DEFAULT NOW(),
    PRIMARY KEY (follower_id, followee_id)
);

-- Reading shelf
CREATE TABLE shelf_entries (
    user_id     UUID REFERENCES users(user_id),
    book_id     UUID REFERENCES books(book_id),
    status      TEXT NOT NULL CHECK (status IN ('read', 'reading', 'want_to_read')),
    updated_at  TIMESTAMPTZ DEFAULT NOW(),
    PRIMARY KEY (user_id, book_id)
);
```

Rating Aggregation: The Critical Design Decision
This is the first thing an experienced interviewer will probe, and it's where most candidates stumble.
Interviewer: A popular book like Harry Potter has 5 million ratings. When someone loads the book page, how do you show the average rating?
Candidate: You cannot run

```sql
SELECT AVG(rating) FROM reviews WHERE book_id = X
```

on every page load. That's a 5-million-row aggregation on the critical read path. At 1,160 book page reads per second, that's over 100 million aggregation queries per day, each scanning millions of rows for a popular book.
The solution is denormalised counters. The `books` table has `rating_count` and `rating_sum` columns. When a new review is submitted — or an existing review is updated or deleted — the counters are updated atomically:
```sql
-- On new review submission
UPDATE books
SET rating_count = rating_count + 1,
    rating_sum   = rating_sum + :new_rating,
    avg_rating   = (rating_sum + :new_rating)::NUMERIC / (rating_count + 1)
WHERE book_id = :book_id;
```

The book page reads `avg_rating` directly — it's a pre-computed field, a single column read. No aggregation at read time.
Interviewer: What if the user edits their existing review and changes their rating from 3 to 5?
Candidate: Adjust by the delta rather than recomputing from scratch:

```sql
UPDATE books
SET rating_sum = rating_sum + (:new_rating - :old_rating),
    avg_rating = (rating_sum + (:new_rating - :old_rating))::NUMERIC / rating_count
WHERE book_id = :book_id;
```

And if they delete their review:
```sql
UPDATE books
SET rating_count = rating_count - 1,
    rating_sum   = rating_sum - :deleted_rating,
    avg_rating   = CASE
        WHEN rating_count - 1 = 0 THEN NULL
        ELSE (rating_sum - :deleted_rating)::NUMERIC / (rating_count - 1)
    END
WHERE book_id = :book_id;
```

These updates must happen in a transaction with the review insert/update/delete. If the review write succeeds but the counter update fails, the displayed rating drifts from reality.
The Bayesian Average Problem
Interviewer: A brand-new book has exactly one 5-star review. Should it rank above a book with 50,000 reviews and a 4.9 average?
Candidate: No — and this is the problem with a raw arithmetic average. One 5-star rating and 50,000 5-star ratings are not equivalent signals, but a simple average treats them identically.
The solution is a Bayesian average (also called a weighted rating). It pulls the raw average toward a global prior — the average rating across all books — proportionally to how few ratings a book has. The formula:
```
Bayesian average = (C × m + R × v) / (C + v)
```

Where:

- R = the book's raw average rating
- v = the book's number of ratings
- m = the global mean rating across all books (e.g., 3.8)
- C = a confidence threshold (e.g., 100 — the minimum ratings before we "trust" the raw average)

A book with 1 review and a 5.0 average gets a Bayesian score of roughly 3.81 — barely above the prior. A book with 10,000 reviews and a 4.9 average gets a score very close to 4.9. This is exactly how IMDb computes its Top 250 rankings, and how Goodreads uses weighted averages to power its bestseller rankings.
The Bayesian average is computed periodically as a batch job — not on every review submission. It lives in a separate `ranking_score` column used specifically for sorting and discovery, while `avg_rating` remains the raw arithmetic mean displayed to users.
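The formula is small enough to verify directly. A sketch in Python, with the prior m = 3.8 and threshold C = 100 from the example (defaults are illustrative):

```python
def bayesian_average(raw_avg, num_ratings, global_mean=3.8, confidence=100):
    """Weighted rating: pull the raw average toward the global mean
    when a book has few ratings. Implements (C*m + R*v) / (C + v)."""
    return (confidence * global_mean + raw_avg * num_ratings) / (confidence + num_ratings)

# One 5-star review barely moves off the prior (~3.81);
# 50,000 reviews at 4.9 stay essentially at 4.9.
```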
The Bayesian average is one of those answers that either makes an interviewer's eyes light up or reveals a gap in preparation — most candidates either know it cold or haven't heard of it. If that explanation doesn't feel natural yet, Mockingly.ai includes rating system design prompts where ranking and aggregation are standard follow-ups.
Book Search with Elasticsearch
Interviewer: How does book search work? Users search for "harry potter" and expect to find the series.
Candidate: Search is an Elasticsearch problem. The book catalog is stored canonically in PostgreSQL, but Elasticsearch is the search index — a read-optimised replica for queries.
The Elasticsearch document for a book:

```json
{
  "book_id": "uuid",
  "title": "Harry Potter and the Sorcerer's Stone",
  "title_suggest": "Harry Potter and the Sorcerer's Stone",
  "authors": ["J.K. Rowling"],
  "genres": ["Fantasy", "Young Adult"],
  "description": "...",
  "isbn_13": "9780439708180",
  "avg_rating": 4.47,
  "rating_count": 8200000,
  "published_year": 1997,
  "language": "en"
}
```

When a user searches "harry potter", Elasticsearch runs a multi-field `match` query against `title`, `authors`, and `description`, returning results ranked by a combination of text relevance and `rating_count`. Popular books with high engagement naturally surface above obscure ones.
For autocomplete as the user types, we use ES's completion suggester on the `title_suggest` field — prefix-based in-memory lookups that return in under 5ms.
Keeping ES in sync: when a book is created or updated in PostgreSQL, a Kafka event (`book.upserted`) triggers an ES indexer worker. The index is eventually consistent — typically within seconds. This is acceptable; a book metadata update doesn't need to reflect in search results instantaneously.
Interviewer: How do you handle misspelled queries — "harey poter"?
Candidate: Elasticsearch's `fuzzy` matching computes edit distance between the query and indexed terms. With `fuzziness: "AUTO"`, ES allows 1 edit for terms of 3–5 characters and 2 edits for longer terms. "harey" matches "harry" with 1 edit; "poter" matches "potter" with 1 edit. The result still surfaces the correct book.
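Putting the relevance query and fuzziness together, a sketch of the request body, assuming the Python Elasticsearch client and an index named `books` (both illustrative):

```python
def build_fuzzy_search(text: str) -> dict:
    """Multi-field match with AUTO fuzziness, boosting title over
    authors over description. Field names match the ES document above."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "fields": ["title^3", "authors^2", "description"],
                "fuzziness": "AUTO",  # edit-distance tolerance scales with term length
            }
        }
    }

# Usage (client setup omitted):
# results = es.search(index="books", body=build_fuzzy_search("harey poter"))
```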
Activity Feed: Friends' Recent Reviews
The social feed — "your friend Alice just rated The Midnight Library 4 stars" — is a fan-out problem similar to a Twitter-style feed, but with a much lower write throughput.
Interviewer: How do you build the friends' activity feed?
Candidate: I'd use a hybrid fan-out model — fan-out-on-write for users with a normal following count, fan-out-on-read for users following many accounts (power users like book bloggers with thousands of followers).
When a user submits a review, the system:

1. Persists the review to PostgreSQL
2. Publishes a `review.created` event to Kafka
3. A Fan-Out Worker consumes the event:
   - Fetches the reviewer's followers from PostgreSQL (`follows` table)
   - For each follower, prepends the review's `activity_id` to their feed list in Redis:

   ```
   LPUSH feed:{follower_user_id} {activity_id}
   LTRIM feed:{follower_user_id} 0 999   -- keep latest 1000 entries
   ```

4. When a follower loads their feed:
   - `LRANGE feed:{follower_user_id} 0 49` — fetch the first 50 activity IDs
   - Batch-fetch the activity details from an Activity Store
   - Return the assembled feed items

The Activity Store holds lightweight activity records: `{activity_id, actor_user_id, activity_type, book_id, review_id, created_at}`. Loading the feed means fetching 50 activity IDs from Redis, then one batch read from the Activity Store — fast and bounded.
Interviewer: What if a popular book blogger has 500,000 followers and writes a review? That's 500,000 Redis writes.
Candidate: That's the celebrity fan-out problem. For users above a threshold — say, 10,000 followers — we skip the fan-out-on-write. Their activity is not pre-loaded into followers' Redis feeds. Instead, when a follower loads their feed, the Feed Service detects which of their followees are "high-follow" accounts, fetches those accounts' recent activity directly, and merges it with the pre-computed feed from Redis.
The merge adds a bounded number of reads — one per high-follow account the user follows — and those reads are cached separately. Most users don't follow many high-follow accounts, so the overhead is small.
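Here is a sketch of the read-time merge, assuming activity IDs are time-sortable (e.g., snowflake-style increasing IDs). `get_recent_activity` and `HIGH_FOLLOW_THRESHOLD` are stand-ins for this design, not real APIs:

```python
HIGH_FOLLOW_THRESHOLD = 10_000  # accounts above this skip fan-out-on-write

def load_feed(redis_client, user_id, followees, follower_counts,
              get_recent_activity, page_size=50):
    """Merge the precomputed Redis feed with high-follow accounts' activity."""
    # 1. Pre-built portion: activity IDs fanned out at write time
    base = redis_client.lrange(f"feed:{user_id}", 0, page_size - 1)

    # 2. High-follow followees were skipped at write time; fetch directly
    celeb_ids = []
    for account in followees:
        if follower_counts.get(account, 0) >= HIGH_FOLLOW_THRESHOLD:
            celeb_ids.extend(get_recent_activity(account))

    # 3. Merge by recency (IDs assumed monotonically increasing with time)
    merged = sorted(set(base) | set(celeb_ids), reverse=True)
    return merged[:page_size]
```

In production the step-2 reads would hit a per-account cache rather than the database, and the merged IDs would then be batch-resolved against the Activity Store as in the normal path.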
This is one of those concepts that's easy to understand on paper but harder to explain cleanly under interview pressure. If that's a gap you want to close before a real interview, Mockingly.ai has feed design problems specifically in its simulation library, which makes it a useful place to practice.
Recommendation Engine
Interviewer: How do you recommend books to users?
Candidate: There are three progressively sophisticated approaches, and a mature system uses all three.
Tier 1: Editorial / Popularity-Based
For users with no history, recommend bestsellers in their declared genre preferences. Popularity lists are pre-computed batch jobs: top-rated books by genre, trending this week, staff picks. These are static lists cached in Redis, refreshed every few hours.
This solves the cold start problem for new users — they see meaningful content immediately, not blank recommendations, without any personalisation data.
Tier 2: User-Based Collaborative Filtering
Once a user has rated several books, we can find users with similar taste and recommend what those users loved. The collaborative filtering approach:

1. Build a user-book rating matrix (sparse — most users haven't rated most books)
2. Compute similarity between users using cosine similarity or Pearson correlation
3. For user A, find the K most similar users (K-nearest neighbours)
4. Recommend books those users rated highly that user A hasn't read yet

This runs as a batch job (nightly or every few hours) in a data warehouse (e.g., Spark on EMR, or BigQuery). The output — the top 20 recommendations per user — is written to Redis: `SETEX recs:{user_id} 86400 [book_id_1, book_id_2, ...]`.

The recommendation pipeline reads from Kafka events (`review.created`, `shelf.updated`) to keep the user-book interaction matrix fresh.
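A toy version of the user-based CF steps, using cosine similarity over sparse rating dicts. The function names and the max-score aggregation are illustrative simplifications of what the batch job would do at scale:

```python
import math

def cosine_similarity(a, b):
    """Similarity between two users' sparse rating dicts {book_id: rating}."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[k] * b[k] for k in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

def recommend(target, others, k=2, top_n=5):
    """K-nearest-neighbour recommendation over the sparse matrix:
    take the K most similar users, surface books they rated that
    the target user hasn't read."""
    neighbours = sorted(others, key=lambda u: cosine_similarity(target, u),
                        reverse=True)[:k]
    scores = {}
    for u in neighbours:
        for book, rating in u.items():
            if book not in target:
                scores[book] = max(scores.get(book, 0), rating)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```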
Tier 3: Item-Based Collaborative Filtering (More Scalable)
User-based CF has a scaling problem: with 150 million users, computing pairwise similarity between all users is expensive. Item-based CF inverts the problem — compute similarity between books instead. "Users who read book A also read book B" relationships are more stable over time and computed offline.
Each book gets a precomputed list of similar books. When a user loads a book page, the "readers also enjoyed" section is a simple Redis lookup — no online computation needed.
Interviewer: What's the cold start problem and how do you handle it for new users?
Candidate: Cold start is the problem of recommending to users (or new books) with no interaction history. Without ratings, collaborative filtering has no signal.
For new users, the solution is progressive profiling. During onboarding, ask the user to pick 3–5 genres they enjoy and rate a handful of well-known books from each genre. Even 5 explicit ratings give collaborative filtering enough signal to find similar users. Until those ratings exist, fall back to popularity-based recommendations.
For new books (items cold start), collaborative filtering can't use user interactions because there are none yet. Content-based filtering fills the gap here: use the book's metadata — genre, author, description keywords — to find structurally similar books. "This new fantasy novel by Brandon Sanderson is similar to his other work and to other epic fantasy" is a reasonable recommendation even before any user has reviewed it.
The three-tier recommendation architecture — popularity, user-based CF, item-based CF — plus the cold start handling for both new users and new books is a complete answer that covers every scenario the interviewer might probe. Getting through all of it clearly, in sequence, under time pressure takes practice. Mockingly.ai has recommendation system follow-ups built into its book review simulations.
Caching Strategy
With 70:1 reads-to-writes, cache performance is what makes this system economically viable at scale.
Interviewer: How do you handle the read load at 10,000 peak reads per second?
Candidate: Three layers, each serving a different access pattern.
CDN for book cover images and static assets. Book covers are immutable once uploaded. Cache-Control: public, max-age=86400. CDN absorbs all image traffic — the origin never sees it after initial caching.
Redis for book metadata and ratings. When a user loads a book page, the primary data is the book's title, author, cover, description, and rating. This is a single Redis key per book:

```
book:{book_id} → { title, author_names, avg_rating, rating_count, cover_url, ... }
TTL: 5 minutes
```

A cache hit returns the book metadata in under 1ms. A cache miss fetches from PostgreSQL and repopulates. When a review is submitted and the `rating_count`/`avg_rating` counters are updated, the Redis key is deleted (cache-aside invalidation). The next read rebuilds from the database.
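The cache-aside read and the delete-on-write invalidation can be sketched as follows; `db_fetch` stands in for the PostgreSQL query, and the client methods mirror redis-py's `get`/`setex`/`delete`:

```python
import json

def get_book(redis_client, db_fetch, book_id, ttl=300):
    """Cache-aside read: serve from Redis on a hit, otherwise fetch
    from the database and repopulate with a 5-minute TTL."""
    key = f"book:{book_id}"
    cached = redis_client.get(key)
    if cached is not None:
        return json.loads(cached)
    book = db_fetch(book_id)
    redis_client.setex(key, ttl, json.dumps(book))
    return book

def on_review_submitted(redis_client, book_id):
    """Invalidation: delete the key; the next read rebuilds it."""
    redis_client.delete(f"book:{book_id}")
```

Deleting rather than updating the key avoids writing stale data when two invalidations race; the database remains the source of truth.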
Redis for the review list. A book's most recent and most-helpful reviews are cached as a sorted set — scored by helpful_count for the "top reviews" tab, and by created_at for the "newest" tab. New reviews are appended to the sorted set on write. The TTL is shorter here (1 minute) because review lists change more frequently.
Review Helpfulness and Moderation
A review system without quality signals becomes a wasteland of one-sentence spam. Two mechanisms keep it useful.
Helpful/Not Helpful votes. Users can vote on whether a review is helpful. The helpful_count and not_helpful_count are denormalised counters on the reviews table — same pattern as the book rating counters. Reviews with high helpful_count float to the top of the default sort order.
The ranking formula balances the helpful-vote proportion against vote volume, similar to Reddit's Wilson score confidence interval — a review with 100 helpful votes out of 102 total ranks higher than one with 50 out of 60: the larger sample and higher proportion both increase confidence in the score.
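A sketch of the Wilson score lower bound (95% confidence) that such a sort could use; the function name and default z value are assumptions, not part of the original design:

```python
import math

def wilson_lower_bound(helpful, total, z=1.96):
    """Lower bound of the Wilson confidence interval for the
    helpful-vote proportion. Rewards both proportion and volume:
    many votes at a high ratio beat few votes at the same ratio."""
    if total == 0:
        return 0.0
    p = helpful / total
    denom = 1 + z * z / total
    centre = p + z * z / (2 * total)
    margin = z * math.sqrt((p * (1 - p) + z * z / (4 * total)) / total)
    return (centre - margin) / denom
```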
Content moderation. Users can flag reviews as spam or inappropriate. A review that exceeds a flag threshold is automatically hidden and queued for human review. For a production system at Goodreads scale, automated content filtering (ML model trained on previously moderated reviews) pre-screens new reviews before they go live on high-visibility books.
API Design
Interviewer: What does the API look like?
Candidate: REST-based, resource-oriented.
GET /books/{book_id}
Response: book metadata, avg_rating, rating_count, cover_url
GET /books/{book_id}/reviews?sort=helpful|newest&page_token=...
Response: paginated list of reviews with author info
POST /books/{book_id}/reviews
Body: { rating, review_text }
Auth required
Response: created review with review_id
PATCH /books/{book_id}/reviews/{review_id}
Body: { rating, review_text }
Auth required (owner only)
Response: updated review
POST /books/{book_id}/reviews/{review_id}/votes
Body: { vote: "helpful" | "not_helpful" }
Auth required
GET /search/books?q=harry+potter&genre=fantasy&sort=relevance|rating
Response: paginated search results from Elasticsearch
GET /users/{user_id}/feed?page_token=...
Auth required
Response: paginated activity feed of followees' recent activity
GET /users/{user_id}/recommendations
Auth required
Response: personalised book recommendations from recommendation engine
PUT /users/{user_id}/shelf/{book_id}
Body: { status: "read" | "reading" | "want_to_read" }
Auth required

Review pagination uses cursor-based tokens (encoding the last-seen review_id and sort column value) to remain stable as new reviews arrive. Offset pagination breaks when new reviews are added between page fetches.
Scaling Discussion
Interviewer: How does this scale to a sudden spike — a book goes viral on social media overnight?
Candidate: A viral book creates two spikes at once: a read spike on the book's page and a write spike as thousands of people rate and review it simultaneously.
Read spike: the CDN absorbs image traffic. The Redis book metadata cache absorbs the majority of page reads. The PostgreSQL primary only sees traffic on cache misses — with a 5-minute TTL and heavy read traffic, the cache hit rate approaches 100% within seconds of warming. The book page becomes effectively static for the duration of the spike.
Write spike on rating counters: with thousands of simultaneous `UPDATE books SET rating_count = rating_count + 1` statements, row-level lock contention on that single hot `books` row becomes the bottleneck. The solution is to decouple counter updates from review writes.
Instead of updating counters synchronously, write reviews to PostgreSQL immediately and publish a `review.created` event to Kafka. A dedicated Counter Update Service consumes the events and updates `rating_count`/`rating_sum` in batches — say, every 5 seconds. Under a viral spike, thousands of reviews collapse into a handful of database updates. The displayed count lags by a few seconds — entirely acceptable for a counter that says "8,200,005" vs "8,203,471".
Common Interview Follow-ups
"How would you implement the 'reading challenge' feature — a user sets a goal of 50 books in a year?"
Reading challenge progress is computed from the shelf_entries table. When a user marks a book as "read" with a date in the current year, a counter increments. Store the current year's count as a denormalised field on the user's reading challenge record. The progress bar reads this single field — no aggregation query needed. At year-end, archive the result.
"What if a popular author has thousands of books and a user wants to see all of them sorted by rating?"
This is an Elasticsearch query with a filter on author_id and a sort by avg_rating desc. It's the same query infrastructure as general search. For the author's profile page, the top 10 books by rating are pre-cached in Redis at author_id level with a short TTL. The full list is served from Elasticsearch on demand.
"How do you handle book editions — the same book has a UK edition, a US edition, a hardback, a paperback, and an audiobook?"
Group editions under a canonical "work" concept. A works table represents the abstract book; multiple editions belong to one work. Reviews and ratings are aggregated at the work level by default (users are rating the story, not the paper stock). Individual editions have their own pages for edition-specific metadata (page count, cover, narrator for audiobooks), but the review scores roll up to the work. This is exactly how Goodreads and LibraryThing model it.
"How do you prevent review bombing — coordinated fake negative reviews targeting a book?"
Multiple signals. Rate limit reviews per IP per book per day. Require a minimum account age before a review is published on high-visibility books. Flag anomalous review velocity — if 500 1-star reviews arrive in 10 minutes for a book that had 3 reviews yesterday, that's automatically escalated. Weigh votes from accounts with review history more heavily than brand-new accounts. These are defence-in-depth heuristics, not perfect solutions — but they raise the cost of coordinated abuse significantly.
Quick Interview Checklist
- ✅ Clarified scope — social features, recommendations, catalog source, Goodreads-like
- ✅ Back-of-the-envelope — 70:1 read-to-write, popular books have millions of ratings
- ✅ PostgreSQL schema — books, users, reviews, shelf_entries, follows with correct constraints
- ✅ Denormalised `rating_count` and `rating_sum` — no aggregation on read path
- ✅ Atomic counter updates in transaction with review writes
- ✅ Delta update on review edit: `rating_sum += (new_rating - old_rating)`
- ✅ Bayesian average for ranking — distinguishes 1 five-star from 50,000 five-stars
- ✅ Elasticsearch for book search — multi-field match, fuzzy for misspellings, completion suggester for autocomplete
- ✅ Kafka event to keep ES in sync with PostgreSQL writes
- ✅ Hybrid fan-out for activity feed — pre-built Redis lists for normal users, merge-at-read for high-follow accounts
- ✅ Feed as Redis `LPUSH`/`LRANGE` over activity IDs, not full activity objects
- ✅ Three-tier recommendations: popularity → collaborative filtering → item similarity
- ✅ Cold start handled by onboarding genre selection and content-based filtering for new books
- ✅ CDN for book covers, Redis for metadata (5-min TTL) and review lists
- ✅ Cache-aside invalidation on review submit (delete key, not update)
- ✅ Kafka-batched counter updates under viral traffic spikes
- ✅ Cursor-based pagination for review lists
- ✅ Wilson score for review helpfulness ranking
Conclusion
A book review system looks like a CRUD app until you start asking the hard questions. How do you compute a 5-million-row average on every page load? (You don't — you maintain running counters.) How do you rank a brand-new 5-star book against a proven classic? (Bayesian average.) How do you build a social feed without re-querying every friend's review history on every load? (Fan-out to Redis, merge at read-time for celebrity accounts.)
These are the questions that turn a medium-difficulty interview problem into a genuinely interesting distributed systems conversation — and the ones Amazon, Google, and Goodreads interviewers care about.
The design pillars:
- Denormalised rating counters — `rating_count` and `rating_sum` updated atomically on write; the average is a pre-computed column, never a runtime aggregation
- Bayesian average for ranking — a book with 1 rating and a 5.0 should not outrank a book with 50,000 ratings and a 4.9
- Elasticsearch for search — multi-field relevance with fuzzy matching; eventually consistent via Kafka sync
- Hybrid fan-out for the social feed — Redis LPUSH for normal users; merge-at-read for high-follow accounts
- Three-tier recommendations — popularity for cold start, collaborative filtering once history exists, item-based for scale
- Redis as the read layer — book metadata, top reviews, and recommendation lists all served from Redis, not PostgreSQL, on the hot path
- Kafka-batched counter updates under spikes — decouple the review write from the counter update to survive viral traffic
Frequently Asked Questions
How do you compute average ratings at scale without running SELECT AVG() on every page load?
Denormalised counters — two extra columns on the books table: rating_count (integer) and rating_sum (bigint). The average is pre-computed and stored directly; it is never aggregated at read time.
Why SELECT AVG() fails at scale:
- A popular book has 5 million ratings — `AVG()` scans 5 million rows per request
- At 1,160 book page reads per second, that's billions of row scans per second for a popular title
- PostgreSQL cannot sustain this; query latency spikes and the database saturates
How denormalised counters work:
- On new review: `rating_count + 1`, `rating_sum + new_rating`, `avg_rating = rating_sum / rating_count`
- On rating edit: `rating_sum += (new_rating - old_rating)` — no change to `rating_count`
- On review delete: `rating_count - 1`, `rating_sum - deleted_rating`
- All three operations run in a transaction with the review write — counter drift on failure is prevented
The book page reads avg_rating as a single column — one indexed key lookup, sub-millisecond, regardless of how many ratings exist.
What is a Bayesian average and why is it better for book rankings?
A Bayesian average pulls a book's raw rating toward the global mean proportionally to how few ratings the book has — preventing a single 5-star review from outranking a book with 50,000 reviews.
The problem with raw arithmetic average:
- Book A: 1 review, 5.0 stars → raw average = 5.0
- Book B: 50,000 reviews, 4.9 stars → raw average = 4.9
- Book A ranks higher — but it has almost no signal. This is wrong.
The Bayesian average formula:
```
Bayesian score = (C × m + R × v) / (C + v)

R = book's raw average rating
v = book's number of ratings
m = global mean rating across all books (e.g. 3.8)
C = confidence threshold (e.g. 100 — minimum ratings to "trust" the raw average)
```

Applied to the example:

- Book A (1 review, 5.0): `(100 × 3.8 + 5.0 × 1) / (100 + 1)` ≈ 3.81 — barely above the prior
- Book B (50,000 reviews, 4.9): `(100 × 3.8 + 4.9 × 50,000) / (100 + 50,000)` ≈ 4.90 — near the raw average
IMDb uses this formula for its Top 250 rankings. Goodreads uses weighted averages for its bestseller lists. The Bayesian score lives in a ranking_score column — used for sorting and discovery — while avg_rating remains the raw arithmetic mean shown to users.
How does Elasticsearch power book search with fuzzy matching and autocomplete?
Elasticsearch handles three distinct search capabilities that PostgreSQL's LIKE queries cannot: full-text relevance ranking, fuzzy matching for misspellings, and prefix-based autocomplete.
How the three capabilities work:
1. Full-text relevance search:
```json
{
  "multi_match": {
    "query": "harry potter",
    "fields": ["title^3", "authors^2", "description"],
    "type": "best_fields"
  }
}
```

Results are ranked by a combination of BM25 text relevance and `rating_count` — popular books with many reviews naturally surface above obscure ones.
2. Fuzzy matching for misspellings:
With "fuzziness": "AUTO", Elasticsearch computes edit distance between query terms and indexed terms. "harey poter" matches "harry potter" — "harey" → "harry" is 1 edit, "poter" → "potter" is 1 edit.
3. Autocomplete as the user types:
The title_suggest field uses ES's completion suggester — a prefix-based in-memory structure. Typing "har" returns "Harry Potter..." suggestions in under 5ms.
Keeping ES in sync with PostgreSQL: when a book is created or updated, a Kafka event (book.upserted) triggers an indexer worker that updates the ES document. The index is eventually consistent — typically seconds behind PostgreSQL. This is acceptable; a book metadata update does not need to be instantly searchable.
How does the friends' activity feed work — fan-out explained?
The activity feed uses a hybrid fan-out model: pre-push activity references into followers' Redis lists for normal users, merge celebrity activity at read time for high-follower accounts.
Fan-out on write (normal users):
- User submits a review → persisted to PostgreSQL
- `review.created` event published to Kafka
- Fan-Out Worker fetches the reviewer's follower list
- For each follower: `LPUSH feed:{follower_id} {activity_id}` + `LTRIM feed:{follower_id} 0 999`
- When the follower loads their feed: `LRANGE feed:{follower_id} 0 49` → 50 activity IDs → batch-fetch details
Fan-out on read (celebrity users with 10,000+ followers):
- Skip the Redis fan-out on write — 500,000 writes per celebrity post is too expensive
- When a follower loads their feed: detect which followees are high-follow accounts
- Fetch those accounts' recent activity directly (cached separately)
- Merge with the pre-computed Redis feed and return the combined result
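The merge-at-read step reduces to a k-way merge of newest-first streams. A sketch, assuming each item is a (timestamp, activity_id) pair and both inputs arrive sorted newest-first:

```python
import heapq
from itertools import islice

def assemble_feed(precomputed, celebrity_recent, limit=50):
    """Merge two newest-first streams of (timestamp, activity_id) pairs."""
    merged = heapq.merge(precomputed, celebrity_recent, reverse=True)
    return [activity_id for _, activity_id in islice(merged, limit)]

feed = assemble_feed(
    [(105, "rev-9"), (100, "rev-4")],     # pre-pushed Redis feed
    [(103, "celeb-2"), (99, "celeb-1")],  # celebrity activity, fetched at read time
)
# feed == ["rev-9", "celeb-2", "rev-4", "celeb-1"]
```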
Why store activity IDs in Redis, not full activity objects:
If the activity details change (review edited, book metadata updated), the Redis list would contain stale data. Storing only IDs means the Activity Store is always the authoritative source — the feed assembly fetches fresh details on each load.
How do book recommendations work and what is the cold start problem?
Book recommendations use three progressively sophisticated approaches. A mature system like Goodreads uses all three simultaneously, routing users to the appropriate tier based on how much interaction history they have.
Tier 1: Popularity-based (new users)
Pre-computed bestseller lists by genre, refreshed every few hours, stored in Redis. New users with no history see the most popular books in their declared genre preferences — meaningful content immediately, no personalisation data required.
Tier 2: User-based collaborative filtering (users with history)
- Build a sparse user-book rating matrix
- Compute cosine similarity between users
- For each user, find the K most similar users (K-nearest neighbours)
- Recommend books those users rated highly that the target user hasn't read
Runs as a batch job (nightly) in a data warehouse. Output stored in Redis: SETEX recs:{user_id} 86400 [book_ids].
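A toy version of the four steps above, on an invented three-user matrix (names, books, and ratings are illustrative only):

```python
import math
from collections import defaultdict

# Sparse user-book rating matrix (toy data)
ratings = {
    "alice": {"dune": 5, "hobbit": 4, "emma": 2},
    "bob":   {"dune": 5, "hobbit": 5, "lotr": 5},
    "carol": {"emma": 5, "dune": 1},
}

def cosine(u, v):
    """Cosine similarity between two sparse rating vectors (dicts)."""
    dot = sum(u[b] * v[b] for b in u.keys() & v.keys())
    norm = math.hypot(*u.values()) * math.hypot(*v.values())
    return dot / norm if norm else 0.0

def recommend(user, k=1):
    """K-nearest-neighbour CF: score unseen books by similarity-weighted ratings."""
    neighbours = sorted(
        ((cosine(ratings[user], ratings[o]), o) for o in ratings if o != user),
        reverse=True,
    )[:k]
    scores = defaultdict(float)
    for sim, other in neighbours:
        for book, stars in ratings[other].items():
            if book not in ratings[user]:
                scores[book] += sim * stars
    return sorted(scores, key=scores.get, reverse=True)

recommend("alice")  # bob is alice's nearest neighbour, so his unread pick surfaces
```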
Tier 3: Item-based collaborative filtering (scale)
User-based CF at 150M users requires computing pairwise similarity between all users — expensive. Item-based CF instead computes "users who read book A also read book B" relationships, which are more stable over time. Each book gets a precomputed similar-books list. The "readers also enjoyed" section on any book page is a simple Redis lookup.
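The co-occurrence precomputation can be sketched by counting book pairs across users' read shelves (toy data; the real batch job runs over the full interaction history):

```python
from collections import Counter, defaultdict
from itertools import combinations

# Toy "read shelves": the set of books each user has finished
shelves = [
    {"dune", "hyperion", "foundation"},
    {"dune", "foundation"},
    {"dune", "hyperion"},
]

# Batch step: count how often each pair of books shares a reader
co_read = defaultdict(Counter)
for books in shelves:
    for a, b in combinations(sorted(books), 2):
        co_read[a][b] += 1
        co_read[b][a] += 1

def readers_also_enjoyed(book, n=10):
    """Serve step: the precomputed similar-books list is a single lookup."""
    return [other for other, _ in co_read[book].most_common(n)]
```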
The cold start problem — two cases:
- New user: no interaction history → fall back to popularity. During onboarding, ask users to pick 3–5 genres and rate a handful of familiar books. Even 5 explicit ratings give collaborative filtering enough signal
- New book: no user interactions yet → use content-based filtering. Match on genre, author, description keywords to find structurally similar books
What is the Wilson score and how does it rank helpful reviews?
Wilson score is a statistical lower bound on the "true" helpfulness rate. It prevents a review with only a handful of votes, all of them positive, from outranking a review with far more votes and a nearly identical positive proportion.
The problem with raw helpful counts:
- Review A: 100 helpful votes out of 102 total (98% positive)
- Review B: 50 helpful votes out of 51 total (98% positive)
- Review A should rank higher — it has the same positive rate with far more signal — but a naive sort by positive percentage treats them as identical, and a one-vote, 100%-positive review would beat both
Wilson score solves this by computing a confidence interval around the observed helpfulness rate:
Wilson score = (p + z²/2n - z × √(p(1-p)/n + z²/4n²)) / (1 + z²/n)
where:
- p = proportion of positive votes (helpful / total)
- n = total number of votes
- z = 1.96 for a 95% confidence interval

With few votes, the score is pulled down toward 0 (uncertainty). With many votes, the score converges to the true positive rate. This is the same formula Reddit used for comment ranking, and the one many review systems use for surfacing quality content.
The Wilson score is pre-computed as a column on the reviews table and updated when helpful/not-helpful votes change.
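The formula translates directly into the function that would populate that column (z defaults to 1.96, the 95% confidence value above):

```python
import math

def wilson_lower_bound(helpful: int, total: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for the helpful-vote rate."""
    if total == 0:
        return 0.0
    p, n = helpful / total, total
    return (p + z*z/(2*n) - z * math.sqrt(p*(1 - p)/n + z*z/(4*n*n))) / (1 + z*z/n)

# Review A (100/102) outscores Review B (50/51): same rate, more evidence.
# A single all-positive vote (1/1) scores far below both.
```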
How do you prevent review bombing on a book review system?
Review bombing is coordinated submission of fake negative (or positive) reviews to manipulate a book's rating. No single mechanism stops it — defence in depth raises the cost of abuse high enough to deter most attempts.
The five-layer defence:
- Rate limiting — maximum N reviews per IP per book per 24 hours. New accounts are rate-limited more aggressively than established ones
- Account age requirements — reviews on high-visibility books require a minimum account age (e.g., 7 days) before they are published. Brand-new accounts cannot immediately manipulate ratings
- Velocity detection — if a book receives more than X one-star reviews in Y minutes (far above its baseline review rate), the new reviews are held for moderation rather than published immediately
- Vote weighting — helpful/not-helpful votes from accounts with established review history carry more weight than votes from new accounts. This reduces the impact of sockpuppet accounts
- ML content screening — a classifier trained on previously moderated reviews flags suspicious content before it goes live on high-visibility books
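The velocity-detection layer can be sketched as a sliding-window counter per book. WINDOW_SECONDS and THRESHOLD are placeholder values, not tuned numbers:

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 600  # Y: assumed 10-minute window
THRESHOLD = 20        # X: assumed max one-star reviews per window

one_star_times = defaultdict(deque)  # book_id -> timestamps of recent 1-star reviews

def should_hold_for_moderation(book_id, now):
    """True once one-star velocity for a book exceeds the window threshold."""
    window = one_star_times[book_id]
    window.append(now)
    while window and window[0] <= now - WINDOW_SECONDS:
        window.popleft()  # drop timestamps that fell out of the window
    return len(window) > THRESHOLD

# 25 one-star reviews within seconds: the first 20 publish, the rest are held
results = [should_hold_for_moderation(1, float(t)) for t in range(25)]
```

In production this state would live in Redis (a sorted set per book with ZREMRANGEBYSCORE pruning) rather than process memory, so every API node sees the same counts.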
Review bombing is an ongoing cat-and-mouse problem rather than a solved one. These layers together raise the cost of a successful attack significantly without blocking legitimate new users.
Which companies ask the online book review system design question?
Amazon, Google, Goodreads, Audible, Scribd, and Dropbox ask variants of this question for senior software engineer roles.
Why it is a popular interview question despite looking simple:
- Multiple non-obvious depth traps — rating aggregation at scale, Bayesian ranking, fan-out for the social feed, and the cold start problem all appear in a single question
- Directly relevant to Amazon — Amazon owns Goodreads and Audible. Interviewers from those teams have a specific interest in whether candidates understand content discovery at scale
- Medium difficulty with real depth — accessible enough to start quickly without excessive scope-setting, hard enough to distinguish senior candidates who know the right abstractions
What interviewers specifically listen for:
- Denormalised counters — not SELECT AVG(), and the delta update pattern for edits
- Bayesian average — named and explained, not just "use a weighted average"
- Elasticsearch, not PostgreSQL LIKE — with fuzzy matching and the Kafka sync mechanism
- Hybrid fan-out — not just "use a fan-out queue" but the celebrity threshold and merge-at-read approach
- Cold start handled — both new users (onboarding ratings) and new books (content-based filtering)
If any of those five feel uncertain when explaining them live, the best preparation isn't reading more articles — it's rehearsing your answers out loud under time pressure. What trips people up isn't the happy path, it's the follow-ups: how do you rank a book with one 5-star rating, how do you handle review bombing, how do you avoid a fan-out storm when a famous author posts a review. Mockingly.ai runs realistic book review and social platform system design simulations — with follow-ups on rating aggregation, recommendation architecture, and fan-out design — built for engineers preparing for senior roles at Amazon, Google, Audible, and Scribd.
