WP-DE-002 Data Engineering Beginner

Revenue Analytics Data Models

Build a three-layer SQL analytics model for B2B fintech revenue reporting.

1

The Scenario

You are a junior analytics engineer at Paylo, a B2B SaaS fintech that processes payments on behalf of online businesses. Paylo charges a per-transaction fee and a monthly platform fee. The business team currently runs all revenue reporting off raw transactional tables — the queries are slow, inconsistent across analysts, and the CFO keeps getting different numbers from different people. Your analytics lead, Jerome, has asked you to fix this: "The raw tables are a mess for reporting. I need you to build a dimensional model on top of them — a clean facts and dimensions layer — and then write the key revenue metrics queries. Every analyst and the CFO should be able to run the same query and get the same number. Model first, metrics second."

2

The Task

Build a three-layer SQL analytics model for Paylo — staging, dimensional, and metrics — that transforms raw transactional data into reliable, consistent revenue reporting.

3

Requirements

3A

Layer 1 — Staging Models

  • stg_transactions: Clean raw.transactions — standardise currency, cast txn_date to DATE, convert paise to rupees, deduplicate on txn_id (keep latest), filter status = success.
  • stg_merchants: Clean raw.merchants — standardise text casing, cast signup_date to DATE, remove null merchant_id.
3B

Layer 2 — Dimensional Models

  • dim_merchant: One row per merchant; include is_enterprise (boolean). Surrogate key required.
  • dim_date: Date dimension 2022–2025. Generate via recursive CTE or generate_series — not manual lookup.
  • fact_transactions: One row per successful transaction; FKs to dim_merchant and dim_date; include net_amount_inr.
3C

Layer 3 — Five Metric Queries

  • Monthly Recurring Revenue (MRR) — fee_inr per month, last 12 months.
  • Top 10 Merchants by Revenue — current year, payments only.
  • Refund Rate by Month — refund amount / payment amount as percentage.
  • Revenue by Merchant Plan — fee_inr by plan for last quarter.
  • Month-over-Month Revenue Growth — MRR with LAG() and % change.
4

Constraints

  • Pure SQL — PostgreSQL dialect. No Python, no dbt required. CTEs for multi-step logic.
  • Organise into /staging, /dimensional, /metrics folders.
  • dim_date must be generated via SQL — not manually inserted.
  • Do not modify raw source tables.
5

Mock Data Schema

raw.transactions: txn_id, merchant_id, txn_date, amount, currency, txn_type, status, fee_charged
raw.merchants: merchant_id, business_name, category, signup_date, plan, country
Minimum: 50 merchants, 2,000 transactions. Include dirty data: 20 duplicate txn_ids, mixed-case currency, string dates.

6

What to Submit

  • SQL model files in /staging, /dimensional, /metrics folders
  • setup.sql — creates raw tables, inserts sample data
  • sample_outputs/ — 5 CSV files, one per metric query
  • README: run order, database version, schema diagram
  • DECISIONS.md (200–400 words): fact grain, deduplication strategy, window function choice
7

Hints

Two non-prescriptive hints. Read only if you're stuck — don't skip the thinking.

Hint 1

Deduplicate with ROW_NUMBER() OVER (PARTITION BY txn_id ORDER BY txn_date DESC), then WHERE row_num = 1.

Hint 2

generate_series('2022-01-01'::date, '2025-12-31'::date, '1 day'::interval) for dim_date.

Wooble  ·  Project #WP-DE-002  ·  Revenue Analytics Data Models  ·  Data Engineering Track  ·  v1.0