Microsimulation
Microsimulation flips the modelling paradigm from aggregate to individual. Instead of working with zone-level averages, you synthesize a virtual population where each person has demographics, income, preferences, and location. Then you simulate their behaviour.
Why Microsimulation?
The aggregate models (Gravity, Huff, Regression) work with averages. But averages hide crucial variation:
- A zone with median income contains both very low and very high earners
- Age distribution matters: students, young professionals, and retirees eat differently
- Household composition affects spending: families vs. singles vs. couples
Microsimulation models individual heterogeneity — the fact that people in the same zone behave very differently.
“Microsimulation is a technique that focuses on the characteristics and behaviour of individuals, rather than the groups that are used by conventional spatial interaction models.” — Birkin & Clarke, Ch. 10
Data Requirements
| Dataset | What We Use |
|---|---|
| Census Population | Synthetic population base |
| Household Income | Agent income distribution |
| Household Size | Household composition for agent generation |
| Age Distribution | Age-stratified agent attributes |
| Employment by District | Agent employment status and commute patterns |
| Consumer Price Index | Agent spending budgets calibrated to real prices |
| Restaurant Receipts | Validate simulated spending against real receipts |
The Pipeline
- Build a synthetic population: merge census demographics with market research data to create individual-level records
- Assign product ownership / preferences — what does each synthetic person consume? Based on their demographics + location
- Generate behaviour — where do they go? How often? How much do they spend?
- Simulate — run the population through spatial interaction, accounting for accessibility, competition, and individual preferences
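The first pipeline step can be sketched as follows. This is a minimal illustration, assuming each zone supplies banded shares for age and income. The `Agent` fields, band labels, share values, and the independent sampling of age and income are all hypothetical; a production synthesis would normally fit joint distributions (for example with iterative proportional fitting) rather than sample marginals independently.

```python
import random
from dataclasses import dataclass


@dataclass
class Agent:
    zone: str
    age_band: str
    income_band: str


def synthesize_zone(zone, n_agents, age_shares, income_shares, rng):
    """Draw n_agents synthetic people for one zone from banded marginal shares."""
    # Assumption: age and income are sampled independently of each other.
    ages = rng.choices(list(age_shares), weights=list(age_shares.values()), k=n_agents)
    incomes = rng.choices(list(income_shares), weights=list(income_shares.values()), k=n_agents)
    return [Agent(zone, age, income) for age, income in zip(ages, incomes)]


# Hypothetical zone with made-up shares, for illustration only.
rng = random.Random(42)
agents = synthesize_zone(
    "zone_A",
    1000,
    age_shares={"0-17": 0.12, "18-64": 0.68, "65+": 0.20},
    income_shares={"low": 0.30, "mid": 0.50, "high": 0.20},
    rng=rng,
)
print(len(agents), agents[0])
```

Seeding the generator keeps the synthetic population reproducible between runs, which makes later changes to the behaviour rules easier to compare.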
Technical Structure: EC-Sim Example
Birkin & Clarke describe a 4-step microsimulation for financial services (EC-Sim), which translates directly to retail:
Step 1: Build micro-population sharing census demographics → Census data (age, income, household type) × geographic zones
Step 2: Add consumption patterns → Merge with market research / survey data
Step 3: Generate behaviour preferences → Not just demographics — also accessibility to services, which varies by location
Step 4: Simulate channel usage → Include physical provision (store locations, opening hours), brand, demographics
Simulation Logic
For each synthetic individual i in the target area:

    P(visit_restaurant_j) = f(
        distance(i, j),          // walking time from home/office
        cuisine_match(i, j),     // does j serve what i likes?
        price_match(i, j),       // is j in i's budget?
        attractiveness(j),       // size, reviews, brand
        time_of_day,             // lunch crowd vs. dinner
        competition_nearby(j)    // alternatives within 200m
    )

Running this for all synthetic individuals across all restaurants in the district produces a predicted visit count and revenue for any restaurant at a given address.
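The text leaves the function f unspecified. One possible form, shown here purely as an assumption, multiplies the factors into a score with exponential distance decay and normalises across restaurants plus an outside option (eating at home or elsewhere). The parameter `beta`, the match weights (0.3, 0.1), and the dictionary keys are hypothetical, and `time_of_day` and `competition_nearby` are left out to keep the sketch short.

```python
import math


def visit_probabilities(liked_cuisines, budget, restaurants, beta=0.15, outside_option=1.0):
    """Return P(visit j) for one agent across candidate restaurants."""
    scores = {}
    for r in restaurants:
        # Assumed weights: a mismatch reduces the score but does not rule the visit out.
        cuisine_match = 1.0 if r["cuisine"] in liked_cuisines else 0.3
        price_match = 1.0 if r["avg_price"] <= budget else 0.1
        distance_decay = math.exp(-beta * r["walk_minutes"])
        scores[r["name"]] = r["attractiveness"] * cuisine_match * price_match * distance_decay
    # The outside option stands for eating at home or leaving the district.
    total = outside_option + sum(scores.values())
    return {name: score / total for name, score in scores.items()}


# Hypothetical restaurants as seen by a single agent.
restaurants = [
    {"name": "near_noodles", "cuisine": "noodles", "avg_price": 60, "walk_minutes": 3, "attractiveness": 1.0},
    {"name": "far_noodles", "cuisine": "noodles", "avg_price": 60, "walk_minutes": 12, "attractiveness": 1.0},
    {"name": "pricey_sushi", "cuisine": "sushi", "avg_price": 300, "walk_minutes": 3, "attractiveness": 1.5},
]
probs = visit_probabilities({"noodles"}, budget=100, restaurants=restaurants)
print(probs)
```

Summing each restaurant's probability over all synthetic individuals gives its expected visit count under these assumptions.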
Data Requirements for HK
| Data | Availability | Quality |
|---|---|---|
| Census demographics | ✅ Census 2021, TPU level | Excellent |
| Household income distribution | ✅ Census 2021 | Good (banded) |
| Restaurant locations | ✅ FEHD licenses | Complete |
| Consumer preferences | ⚠️ No public data | Need survey or proxy |
| Actual spending patterns | ⚠️ No public data | Need Octopus/credit card data |
Microsimulation vs. Agent-Based Modelling
| | Microsimulation | ABM |
|---|---|---|
| Unit | Synthetic individual | Autonomous agent |
| Behaviour | Rule-based from data | Emergent from interactions |
| Interactions | Individual → environment | Agent ↔ agent ↔ environment |
| Dynamics | Static snapshot or step-wise | Continuous time evolution |
| Data needs | Heavy (census + surveys) | Lighter (rules + parameters) |
| Best for | Demand estimation | Scenario testing |
The LLM Agent Simulation uses LLM-powered agents (Claude Opus) that combine microsimulation’s individual-level detail with ABM’s emergent behaviour — each agent has a synthetic persona AND can reason about complex tradeoffs.
Computational Reality
“In the late 1980s, two of the present authors developed a microsimulation approach using a synthetic sample of 50,000 households… programs were run overnight in batch mode on a mainframe computer costing about £1.5 million. In Chapter 10, we reported on an application using a sample of one million households, and can be run in a few seconds real time on a personal computer costing around £1,000.” — Birkin & Clarke, Ch. 12
In 2026, we can run microsimulation for all of Hong Kong (~2.7M households) on a laptop in minutes. The bottleneck is data, not compute.
Implementation Notes
Current Implementation (2026-03-25)
Synthetic population: totalAgents = round(pop / 100), clamped to the range 500–5000. The agent pool scales with the actual catchment population instead of using a fixed size.
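The agent-pool rule can be sketched in a few lines. The function name `total_agents` is illustrative. The source does not say which rounding mode `round` uses, so this sketch assumes round-half-up; Python's built-in `round` rounds halves to the nearest even number, which is why it is avoided here.

```python
import math


def total_agents(pop: int) -> int:
    """Scale the agent pool to 1% of catchment population, clamped to 500..5000."""
    # Assumption: round-half-up, since the source only says "round".
    scaled = math.floor(pop / 100 + 0.5)
    return max(500, min(5000, scaled))


for pop in (20_000, 123_456, 500_000, 2_000_000):
    print(pop, total_agents(pop))
```

Under this rule any catchment below 50,000 people gets the minimum pool of 500 agents, and any catchment above 500,000 gets the maximum of 5000.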
Weekly meal rates (base, before modifiers):
- Breakfast: 8% of agents dine out per day (×7 days)
- Lunch: 25% per day × 5 workdays only
- Dinner: 18% per day (×7 days)
- Late night: 5% per day × 4 nights
Rates are further adjusted by price band, the age18to64 fraction, and the family household rate. A targetMatch multiplier of 1.1 applies if targetCustomers is non-empty.
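The base rates translate into weekly dine-out occasions as sketched below. The function and dictionary names are illustrative. The price band, age, and household modifiers are not specified in the text, so they are omitted, and applying the 1.1 targetMatch multiplier uniformly to every meal period is an assumption.

```python
# (daily share of agents dining out, active days per week), taken from the list above.
BASE_RATES = {
    "breakfast": (0.08, 7),
    "lunch": (0.25, 5),
    "dinner": (0.18, 7),
    "late_night": (0.05, 4),
}


def weekly_meals(total_agents, target_customers=()):
    """Weekly dine-out occasions per meal period, before the unspecified modifiers."""
    # Assumption: the targetMatch multiplier scales every meal period equally.
    target_match = 1.1 if target_customers else 1.0
    return {
        meal: total_agents * rate * days * target_match
        for meal, (rate, days) in BASE_RATES.items()
    }


meals = weekly_meals(1000)
print(meals, sum(meals.values()))
```

For a pool of 1,000 agents the base rates add up to roughly 3,270 dine-out occasions per week, with lunch and dinner contributing about the same amount.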
Capture rate (logarithmic, not density-tiered):

    logShare = competitors ≤ 1 ? 0.15 : 0.15 / (1 + 0.3 × ln(competitors))
    densityBoost = density > 15000 ? 0.9 : density > 5000 ? 1.0 : 1.2
    captureRate = min(0.15, logShare × densityBoost)

Examples with densityBoost = 1.0: 1 competitor → 15%, 10 competitors → ~8.9%, 30 competitors → ~7.4%, 80 competitors → ~6.5%, 200 competitors → ~5.8%.
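A direct Python transcription of the capture-rate formula, offered as a sketch rather than the project's actual code. The function name `capture_rate` is illustrative, and the unit of `density` is not given in the text, so it is passed through unchanged.

```python
import math


def capture_rate(competitors: int, density: float) -> float:
    """Share of demand captured, falling logarithmically with competitor count."""
    if competitors <= 1:
        log_share = 0.15
    else:
        log_share = 0.15 / (1 + 0.3 * math.log(competitors))  # natural log
    # Denser areas get a lower multiplier, sparser areas a higher one.
    if density > 15000:
        density_boost = 0.9
    elif density > 5000:
        density_boost = 1.0
    else:
        density_boost = 1.2
    return min(0.15, log_share * density_boost)


for n in (1, 10, 30, 80, 200):
    print(n, round(capture_rate(n, density=10_000), 4))
```

The outer `min` matters only in low-density areas: with a single competitor the 1.2 boost would otherwise push the rate above the 15% ceiling.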
Physical capacity cap:
- Dine-in seats: `floorArea / sqftPerSeat` (12–25 sqft/seat depending on price band)
- Delivery: `floorArea × 0.3` orders/day (kitchen throughput)
- Takeaway: `seats × turns × 2`
- Turns: High-end 1.5, Premium 2.0, Mid 2.5, Budget 3.0
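The capacity rules can be sketched as below. The text gives only a 12–25 sqft range for seat spacing, so `sqft_per_seat` is left as a caller-supplied parameter, and because the text does not say how the three figures combine into one cap, the sketch returns them separately. Function and key names are illustrative.

```python
# Table turns per price band, as listed above.
TURNS = {"high_end": 1.5, "premium": 2.0, "mid": 2.5, "budget": 3.0}


def daily_capacity(floor_area_sqft, price_band, sqft_per_seat):
    """Physical capacity figures for one restaurant."""
    seats = floor_area_sqft / sqft_per_seat
    turns = TURNS[price_band]
    return {
        "seats": seats,
        "delivery_orders": floor_area_sqft * 0.3,  # kitchen throughput per day
        "takeaway": seats * turns * 2,
    }


# Hypothetical 600 sqft mid-priced restaurant at 20 sqft per seat.
capacity = daily_capacity(600, "mid", sqft_per_seat=20)
print(capacity)
```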
Revenue distribution (monthly = weeklyBase × 4.3 × captureRate, capped by capacity):
- P10 = median × 0.55
- P25 = median × 0.75
- P50 = median (base estimate)
- P75 = median × 1.30
- P90 = median × 1.65
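The revenue bands follow mechanically from the median, as this sketch shows. It assumes the capacity cap is expressed as a monthly revenue ceiling and is applied to the median before the percentile multipliers; the text does not state either point. Names are illustrative.

```python
# Percentile multipliers relative to the median, as listed above.
PERCENTILE_MULTIPLIERS = {"P10": 0.55, "P25": 0.75, "P50": 1.0, "P75": 1.30, "P90": 1.65}


def monthly_revenue_bands(weekly_base, capture_rate, capacity_cap=None):
    """Monthly revenue percentiles from weekly base demand and capture rate."""
    median = weekly_base * 4.3 * capture_rate  # 4.3 weeks per month
    if capacity_cap is not None:
        # Assumption: the cap limits the median, and the bands scale from it.
        median = min(median, capacity_cap)
    return {label: median * mult for label, mult in PERCENTILE_MULTIPLIERS.items()}


# Hypothetical inputs: 1,000,000 weekly base and a 10% capture rate.
bands = monthly_revenue_bands(1_000_000, 0.10)
print(bands)
```

The multipliers are asymmetric around the median (0.55 below versus 1.65 above), so the resulting distribution is right-skewed.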
Changelog
| Date | Change | Why |
|---|---|---|
| 2026-03-25 | Capture rate changed from density-tiered (3%/6%/12%) to logarithmic: 0.15/(1+0.3×ln(comp)) × density boost | Stepped tiers created cliff-edges; logarithmic is smoother and more empirically grounded |
| 2026-03-25 | totalAgents formula changed from fixed 1000 to pop/100, capped 500–5000 | A fixed pool of 1000 agents gave a 50K-population zone and a 500K-population zone identical agent pools |
| 2026-03-25 | Delivery capacity multiplier reduced from 0.5 to 0.3 | 200 sqft kitchen × 0.5 = 100 orders/day was unrealistic; now 200 sqft → 60 orders |
| 2026-03-24 | Added physical capacity cap | 1000 agents × unconstrained rates produced physically impossible revenue |
| 2026-03-24 | Added density-tiered capture rates (3%/6%/12%) | Flat capture rate ignored competitive density differences |
Source
📖 Birkin, M. & Clarke, G. (2023). Retail Geography. Chapter 10: Microsimulation — EC-Sim Channel Model. Chapter 12: Computational advances in microsimulation.