Revenue Analytics Data Models
Build a three-layer SQL analytics model for B2B fintech revenue reporting.
The Scenario
You are a junior analytics engineer at Paylo, a B2B SaaS fintech that processes payments on behalf of online businesses. Paylo charges a per-transaction fee and a monthly platform fee. The business team currently runs all revenue reporting off raw transactional tables — the queries are slow, inconsistent across analysts, and the CFO keeps getting different numbers from different people. Your analytics lead, Jerome, has asked you to fix this: "The raw tables are a mess for reporting. I need you to build a dimensional model on top of them — a clean facts and dimensions layer — and then write the key revenue metrics queries. Every analyst and the CFO should be able to run the same query and get the same number. Model first, metrics second."
The Task
Build a three-layer SQL analytics model for Paylo — staging, dimensional, and metrics — that transforms raw transactional data into reliable, consistent revenue reporting.
Requirements
Layer 1 — Staging Models
- stg_transactions: Clean raw.transactions — standardise currency casing, cast txn_date to DATE, convert amounts from paise to rupees, deduplicate on txn_id (keep the latest row), and keep only rows where status = 'success'.
- stg_merchants: Clean raw.merchants — standardise text casing, cast signup_date to DATE, remove null merchant_id.
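A staging model along these lines would cover the stg_transactions requirements (a sketch, not a mandated implementation — the schema name and output column names such as amount_inr and fee_inr are assumptions):

```sql
-- Sketch of stg_transactions, assuming a `staging` schema exists and
-- raw.transactions has the columns listed under "Mock Data Schema".
CREATE VIEW staging.stg_transactions AS
WITH deduped AS (
    SELECT
        txn_id,
        merchant_id,
        CAST(txn_date AS DATE)  AS txn_date,        -- string dates -> DATE
        UPPER(TRIM(currency))   AS currency,        -- 'inr' / 'Inr' -> 'INR'
        amount / 100.0          AS amount_inr,      -- paise -> rupees
        fee_charged / 100.0     AS fee_inr,
        LOWER(txn_type)         AS txn_type,
        LOWER(status)           AS status,
        ROW_NUMBER() OVER (
            PARTITION BY txn_id
            ORDER BY CAST(txn_date AS DATE) DESC    -- keep the latest duplicate
        ) AS row_num
    FROM raw.transactions
)
SELECT txn_id, merchant_id, txn_date, currency,
       amount_inr, fee_inr, txn_type, status
FROM deduped
WHERE row_num = 1
  AND status = 'success';
```

stg_merchants would follow the same pattern: TRIM/INITCAP the text fields, cast signup_date, and drop null merchant_id rows.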
Layer 2 — Dimensional Models
- dim_merchant: One row per merchant; include is_enterprise (boolean). Surrogate key required.
- dim_date: Date dimension 2022–2025. Generate via recursive CTE or generate_series — not manual lookup.
- fact_transactions: One row per successful transaction; FKs to dim_merchant and dim_date; include net_amount_inr.
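One way to wire the dimension and fact together is sketched below; the surrogate-key approach and every column name here are illustrative assumptions, and the staging column names (amount_inr, fee_inr) must match whatever your Layer 1 models actually emit:

```sql
-- Sketch: dim_merchant with a surrogate key, and fact_transactions
-- carrying FKs to dim_merchant and dim_date (names are illustrative).
CREATE TABLE dimensional.dim_merchant AS
SELECT
    ROW_NUMBER() OVER (ORDER BY merchant_id) AS merchant_key,  -- surrogate key
    merchant_id,
    business_name,
    category,
    plan,
    country,
    signup_date,
    (plan = 'enterprise')                    AS is_enterprise  -- assumes a plan value 'enterprise'
FROM staging.stg_merchants;

CREATE TABLE dimensional.fact_transactions AS
SELECT
    t.txn_id,
    m.merchant_key,                          -- FK to dim_merchant
    t.txn_date       AS date_key,            -- FK to dim_date (joined on the calendar date)
    t.txn_type,
    t.amount_inr     AS net_amount_inr,
    t.fee_inr
FROM staging.stg_transactions AS t
JOIN dimensional.dim_merchant AS m USING (merchant_id);
```

An inner join here silently drops transactions whose merchant_id has no dimension row; whether that is acceptable (versus a LEFT JOIN to an "unknown merchant" row) is worth a line in DECISIONS.md.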
Layer 3 — Five Metric Queries
- Monthly Recurring Revenue (MRR) — fee_inr per month, last 12 months.
- Top 10 Merchants by Revenue — current year, payments only.
- Refund Rate by Month — refund amount / payment amount as percentage.
- Revenue by Merchant Plan — fee_inr by plan for last quarter.
- Month-over-Month Revenue Growth — MRR with LAG() and % change.
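The last metric is the most window-function-heavy, so here is one hedged sketch of it, assuming fact_transactions exposes fee_inr and a date_key joinable to a dim_date with date_day and year_month columns (all assumed names):

```sql
-- Sketch of Month-over-Month Revenue Growth: monthly MRR, previous
-- month via LAG(), and percentage change (NULLIF guards divide-by-zero).
WITH monthly AS (
    SELECT
        d.year_month,
        SUM(f.fee_inr) AS mrr_inr
    FROM dimensional.fact_transactions AS f
    JOIN dimensional.dim_date AS d
      ON f.date_key = d.date_day
    GROUP BY d.year_month
)
SELECT
    year_month,
    mrr_inr,
    LAG(mrr_inr) OVER (ORDER BY year_month) AS prev_mrr_inr,
    ROUND(
        100.0 * (mrr_inr - LAG(mrr_inr) OVER (ORDER BY year_month))
        / NULLIF(LAG(mrr_inr) OVER (ORDER BY year_month), 0),
        2
    ) AS mom_growth_pct
FROM monthly
ORDER BY year_month;
```

The other four metrics are simpler variants of the same shape: group by a dim_date attribute or dim_merchant attribute, aggregate over the fact, and filter the date range.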
Constraints
- Pure SQL — PostgreSQL dialect. No Python, no dbt required. CTEs for multi-step logic.
- Organise into /staging, /dimensional, /metrics folders.
- dim_date must be generated via SQL — not manually inserted.
- Do not modify raw source tables.
Mock Data Schema
raw.transactions: txn_id, merchant_id, txn_date, amount, currency, txn_type, status, fee_charged
raw.merchants: merchant_id, business_name, category, signup_date, plan, country
Minimum: 50 merchants, 2,000 transactions. Include dirty data: 20 duplicate txn_ids, mixed-case currency, string dates.
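In setup.sql, the dirty rows can be seeded explicitly so the staging layer has something to clean. A sketch with purely illustrative sample values (IDs, amounts, and date formats are assumptions):

```sql
-- Sketch: seed rows exercising each dirty-data case the staging layer
-- must handle — a duplicate txn_id, mixed-case currency, a string date.
INSERT INTO raw.transactions
    (txn_id, merchant_id, txn_date, amount, currency, txn_type, status, fee_charged)
VALUES
    ('TXN-0001', 'M-001', '2024-03-01', 150000, 'INR', 'payment', 'success', 3000),
    ('TXN-0001', 'M-001', '2024-03-02', 150000, 'inr', 'payment', 'success', 3000), -- duplicate txn_id; staging keeps the later row
    ('TXN-0002', 'M-002', '2024-04-03', 98000,  'Inr', 'refund',  'success', 0),    -- mixed-case currency
    ('TXN-0003', 'M-003', '2024-04-07', 52000,  'INR', 'payment', 'failed',  1040); -- filtered out by status
```

Storing txn_date as a text column in the raw table (and casting in staging) is one way to force the "string dates" requirement to actually matter.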
What to Submit
- SQL model files in /staging, /dimensional, /metrics folders
- setup.sql — creates raw tables, inserts sample data
- sample_outputs/ — 5 CSV files, one per metric query
- README: run order, database version, schema diagram
- DECISIONS.md (200–400 words): fact grain, deduplication strategy, window function choice
Hints
Two hints follow. Read them only if you're stuck — don't skip the thinking.
Hint 1
Deduplicate in a CTE with `ROW_NUMBER() OVER (PARTITION BY txn_id ORDER BY txn_date DESC) AS row_num`, then filter the outer query with `WHERE row_num = 1`.
Hint 2
Build the calendar with `generate_series('2022-01-01'::date, '2025-12-31'::date, '1 day'::interval)`, then derive the date attributes (year, month, quarter) with `EXTRACT`.