Microsimulation

Microsimulation flips the modelling paradigm from aggregate to individual. Instead of working with zone-level averages, you synthesize a virtual population where each person has demographics, income, preferences, and location. Then you simulate their behaviour.

Why Microsimulation?

The aggregate models (Gravity, Huff, Regression) work with averages. But averages hide crucial variation:

A zone with a mid-range median income can contain both very low and very high earners
Age distribution matters: students, young professionals, and retirees eat differently
Household composition affects spending: families vs. singles vs. couples

Microsimulation models individual heterogeneity — the fact that people in the same zone behave very differently.

“Microsimulation is a technique that focuses on the characteristics and behaviour of individuals, rather than the groups that are used by conventional spatial interaction models.” — Birkin & Clarke, Ch. 10

Data Requirements

Dataset	What We Use
Census Population	Synthetic population base
Household Income	Agent income distribution
Household Size	Household composition for agent generation
Age Distribution	Age-stratified agent attributes
Employment by District	Agent employment status and commute patterns
Consumer Price Index	Agent spending budgets calibrated to real prices
Restaurant Receipts	Validate simulated spending against real receipts

The Pipeline

Build a synthetic population: merge census demographics with market research data to create individual-level records
Assign product ownership / preferences: what does each synthetic person consume, given their demographics and location?
Generate behaviour: where do they go, how often, and how much do they spend?
Simulate: run the population through spatial interaction, accounting for accessibility, competition, and individual preferences
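A minimal sketch of the first pipeline step, assuming each zone supplies separate marginal shares for age band, income band, and household type. It draws every attribute independently, so it ignores correlations (for example between age and income) that production synthesis methods such as iterative proportional fitting are designed to preserve. The types, function names, seeded generator, and the shares in the usage lines are all hypothetical and illustrative, not census figures.

```typescript
// Zone-level marginal shares: category -> share of residents (need not sum exactly to 1).
type Marginal = Record<string, number>;

interface ZoneMarginals {
  zone: string;
  population: number;
  age: Marginal;
  income: Marginal;
  household: Marginal;
}

interface SyntheticPerson {
  id: number;
  zone: string;
  age: string;
  income: string;
  household: string;
}

// Seeded linear congruential generator so runs are reproducible.
// Intermediate products stay below 2^53, so double arithmetic is exact.
function lcg(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296; // in [0, 1)
  };
}

// Draw one category with probability proportional to its share.
function drawCategory(dist: Marginal, rand: () => number): string {
  const cats = Object.keys(dist);
  let total = 0;
  for (const c of cats) total += dist[c];
  let r = rand() * total;
  for (const c of cats) {
    r -= dist[c];
    if (r < 0) return c;
  }
  return cats[cats.length - 1]; // guard against floating-point shortfall
}

// Naive synthesis: one person per `perPerson` residents, attributes drawn independently.
function synthesize(zones: ZoneMarginals[], perPerson: number, seed = 42): SyntheticPerson[] {
  const rand = lcg(seed);
  const people: SyntheticPerson[] = [];
  for (const z of zones) {
    const n = Math.round(z.population / perPerson);
    for (let k = 0; k < n; k++) {
      people.push({
        id: people.length,
        zone: z.zone,
        age: drawCategory(z.age, rand),
        income: drawCategory(z.income, rand),
        household: drawCategory(z.household, rand),
      });
    }
  }
  return people;
}

// Usage with illustrative shares (not census figures).
const demo: ZoneMarginals = {
  zone: "Z1",
  population: 10_000,
  age: { "18-34": 0.4, "35-64": 0.4, "65+": 0.2 },
  income: { low: 0.3, mid: 0.5, high: 0.2 },
  household: { single: 0.3, couple: 0.3, family: 0.4 },
};
console.log(synthesize([demo], 100).length); // 100
```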

Technical Structure: EC-Sim Example

Birkin & Clarke describe a 4-step microsimulation for financial services (EC-Sim), which translates directly to retail:

Step 1: Build a micro-population that reproduces census demographics → Census data (age, income, household type) × geographic zones

Step 2: Add consumption patterns → Merge with market research / survey data

Step 3: Generate behaviour preferences → Not just demographics — also accessibility to services, which varies by location

Step 4: Simulate channel usage → Include physical provision (store locations, opening hours), brand, demographics

Simulation Logic

For each synthetic individual i in the target area:

P(visit_restaurant_j) = f(
  distance(i, j),           // walking time from home/office
  cuisine_match(i, j),      // does j serve what i likes?
  price_match(i, j),        // is j in i's budget?
  attractiveness(j),        // size, reviews, brand
  time_of_day,              // lunch crowd vs. dinner
  competition_nearby(j)     // alternatives within 200m
)

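Summing each restaurant's visit probabilities over all synthetic individuals in the district gives its expected visit count; multiplying each visit by the visitor's expected spend gives predicted revenue. Because the inputs are individual-level, the same run can score a restaurant at any candidate address.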
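The text leaves f unspecified. One common concrete form, used here purely as an assumption, is a multinomial logit: each restaurant gets a utility that is a weighted sum of the listed factors, and visit probabilities are the normalised exponentials of those utilities, with an extra "outside" option for eating somewhere not in the list (or at home). The weights in `W`, the 0 to 1 scaling of the factors, and every name below are hypothetical; time_of_day is represented by a per-period `periodFit` score, on the assumption that the model is run once per meal period.

```typescript
// Factors for one (person i, restaurant j) pair, matching the list above.
interface PairFactors {
  walkMinutes: number;       // distance(i, j) as walking time
  cuisineMatch: number;      // 0..1: does j serve what i likes?
  priceMatch: number;        // 0..1: is j within i's budget?
  attractiveness: number;    // 0..1: size, reviews, brand
  periodFit: number;         // 0..1: fit to the meal period being simulated (time_of_day)
  competitorsNearby: number; // alternatives within 200 m of j
}

// Hypothetical weights; real values would come from calibration.
const W = { walk: -0.15, cuisine: 1.5, price: 1.2, attract: 1.0, period: 0.8, competition: -0.02 };

function utility(f: PairFactors): number {
  return (
    W.walk * f.walkMinutes +
    W.cuisine * f.cuisineMatch +
    W.price * f.priceMatch +
    W.attract * f.attractiveness +
    W.period * f.periodFit +
    W.competition * f.competitorsNearby
  );
}

// Multinomial logit: P(j | i) = exp(U_ij) / (exp(U_0) + sum_k exp(U_ik)),
// where U_0 is the utility of the outside option (eating elsewhere or at home).
function visitProbabilities(row: PairFactors[], outsideUtility = 0): number[] {
  const utils = row.map(utility);
  const maxU = utils.reduce((m, u) => Math.max(m, u), outsideUtility); // for numerical stability
  const expU = utils.map((u) => Math.exp(u - maxU));
  const denom = Math.exp(outsideUtility - maxU) + expU.reduce((s, e) => s + e, 0);
  return expU.map((e) => e / denom);
}

// Expected visits per restaurant for one meal period:
// sum each restaurant's probability over all synthetic individuals (one row per person).
function expectedVisits(people: PairFactors[][], outsideUtility = 0): number[] {
  const totals: number[] = [];
  for (const row of people) {
    visitProbabilities(row, outsideUtility).forEach((p, j) => {
      totals[j] = (totals[j] || 0) + p;
    });
  }
  return totals;
}
```

Because every listed restaurant appears in the denominator, a new competitor already lowers the others' probabilities; the explicit `competitorsNearby` term adds a further local penalty on top of that, which is worth keeping in mind when calibrating its weight.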

Data Requirements for HK

Data	Availability	Quality
Census demographics	✅ Census 2021, TPU level	Excellent
Household income distribution	✅ Census 2021	Good (banded)
Restaurant locations	✅ FEHD licenses	Complete
Consumer preferences	⚠️ No public data	Need survey or proxy
Actual spending patterns	⚠️ No public data	Need Octopus/credit card data

Microsimulation vs. Agent-Based Modelling

Aspect	Microsimulation	ABM
Unit	Synthetic individual	Autonomous agent
Behaviour	Rule-based from data	Emergent from interactions
Interactions	Individual → environment	Agent ↔ agent ↔ environment
Dynamics	Static snapshot or step-wise	Continuous time evolution
Data needs	Heavy (census + surveys)	Lighter (rules + parameters)
Best for	Demand estimation	Scenario testing

The LLM Agent Simulation uses LLM-powered agents (Claude Opus) that combine microsimulation’s individual-level detail with ABM’s emergent behaviour — each agent has a synthetic persona AND can reason about complex tradeoffs.

Computational Reality

“In the late 1980s, two of the present authors developed a microsimulation approach using a synthetic sample of 50,000 households… programs were run overnight in batch mode on a mainframe computer costing about £1.5 million. In Chapter 10, we reported on an application using a sample of one million households, and can be run in a few seconds real time on a personal computer costing around £1,000.” — Birkin & Clarke, Ch. 12

In 2026, we can run microsimulation for all of Hong Kong (~2.7M households) on a laptop in minutes. The bottleneck is data, not compute.

Implementation Notes

Current Implementation (2026-03-25)

Synthetic population: totalAgents = round(pop / 100), clamped to a minimum of 500 and a maximum of 5000. The agent count therefore scales with the actual catchment population instead of drawing on a fixed agent pool.
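The agent-count rule reduces to a clamp. A minimal TypeScript rendering, with the function name taken from the identifier in the text and the usage values chosen for illustration:

```typescript
// totalAgents = round(pop / 100), clamped to the range [500, 5000].
function totalAgents(pop: number): number {
  return Math.min(5000, Math.max(500, Math.round(pop / 100)));
}

console.log(totalAgents(50_000));  // 500
console.log(totalAgents(120_000)); // 1200
console.log(totalAgents(800_000)); // 5000
```

Above 500,000 residents the count stays at 5,000, so each agent stands for more than 100 people; below 50,000 the floor of 500 means each agent stands for fewer.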

Weekly meal rates (base, before modifiers):

Breakfast: 8% of agents dine out per day × 7 days
Lunch: 25% per day × 5 days (workdays only)
Dinner: 18% per day × 7 days
Late night: 5% per day × 4 nights

Rates are further adjusted by price band, the age18to64 fraction, and the family household rate. A targetMatch multiplier of 1.1 applies if targetCustomers is non-empty.
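The base rates above imply about 3.27 dine-out occasions per agent per week (0.08 × 7 + 0.25 × 5 + 0.18 × 7 + 0.05 × 4). The sketch below computes that and applies the targetMatch multiplier. The price-band, age18to64, and family-household adjustments are not specified in this section, so they are collapsed into one hypothetical `otherModifiers` factor; the function and type names are also illustrative.

```typescript
type Meal = "breakfast" | "lunch" | "dinner" | "lateNight";

// Base share of agents dining out per day, and how many days per week each applies.
const MEAL_RATES: Record<Meal, { perDay: number; daysPerWeek: number }> = {
  breakfast: { perDay: 0.08, daysPerWeek: 7 },
  lunch: { perDay: 0.25, daysPerWeek: 5 }, // workdays only
  dinner: { perDay: 0.18, daysPerWeek: 7 },
  lateNight: { perDay: 0.05, daysPerWeek: 4 },
};

// Base weekly dine-out occasions per agent, before modifiers (about 3.27).
function baseWeeklyMealsPerAgent(): number {
  let total = 0;
  for (const meal of Object.keys(MEAL_RATES) as Meal[]) {
    total += MEAL_RATES[meal].perDay * MEAL_RATES[meal].daysPerWeek;
  }
  return total;
}

// Weekly occasions for the whole agent pool. `otherModifiers` is a hypothetical
// stand-in for the price-band, age18to64 and family-household adjustments.
function weeklyMealOccasions(agents: number, targetCustomers: string[], otherModifiers = 1): number {
  const targetMatch = targetCustomers.length > 0 ? 1.1 : 1.0;
  return agents * baseWeeklyMealsPerAgent() * otherModifiers * targetMatch;
}

console.log(baseWeeklyMealsPerAgent().toFixed(2)); // 3.27
```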

Capture rate (logarithmic, not density-tiered):

logShare = competitors ≤ 1 ? 0.15 : 0.15 / (1 + 0.3 × ln(competitors))
densityBoost = density > 15000 ? 0.9 : density > 5000 ? 1.0 : 1.2
captureRate = min(0.15, logShare × densityBoost)

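Examples at densityBoost = 1.0 (5,000 < density ≤ 15,000): 1 competitor → 15%, 10 competitors → ~8.9%, 30 competitors → ~7.4%, 80 competitors → ~6.5%, 200 competitors → ~5.8%. In the densest areas (densityBoost = 0.9) the same counts give 13.5%, ~8.0%, ~6.7%, ~5.8%, and ~5.2%.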
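The capture-rate formula translates line for line into TypeScript (the function and argument names are assumptions); the loop prints the densityBoost = 1.0 case.

```typescript
// Logarithmic capture rate with a density adjustment, as in the three formula lines above.
// `competitors`: number of competing restaurants; `density`: population density
// in whatever units the implementation uses (thresholds 5,000 and 15,000).
function captureRate(competitors: number, density: number): number {
  const logShare = competitors <= 1 ? 0.15 : 0.15 / (1 + 0.3 * Math.log(competitors)); // natural log
  const densityBoost = density > 15000 ? 0.9 : density > 5000 ? 1.0 : 1.2;
  return Math.min(0.15, logShare * densityBoost);
}

for (const n of [1, 10, 30, 80, 200]) {
  console.log(n, (captureRate(n, 10000) * 100).toFixed(1) + "%");
}
// 1 15.0%, 10 8.9%, 30 7.4%, 80 6.5%, 200 5.8%
```

The `min(0.15, ...)` cap only binds in low-density areas: a single competitor there gives 0.15 × 1.2 = 18%, which is capped back to 15%, while two competitors already drop to about 14.9%.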

Physical capacity cap:

Dine-in seats: floorArea / sqftPerSeat (12–25 sqft/seat depending on price band)
Delivery: floorArea × 0.3 orders/day (kitchen throughput)
Takeaway: seats × turns × 2
Turns: High-end 1.5, Premium 2.0, Mid 2.5, Budget 3.0
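A sketch of the capacity limits above. It assumes dine-in covers per day are seats × turns (the text gives the turns but not this product explicitly), rounds seats down to whole seats, and takes sqftPerSeat as an input because the exact mapping from price band to the 12–25 sqft range is not stated. Names are hypothetical.

```typescript
type PriceBand = "High-end" | "Premium" | "Mid" | "Budget";

// Table turns by price band, as listed above.
const TURNS: Record<PriceBand, number> = { "High-end": 1.5, Premium: 2.0, Mid: 2.5, Budget: 3.0 };

interface Capacity {
  seats: number;
  dineInCovers: number;   // assumption: seats × turns
  takeawayOrders: number; // seats × turns × 2
  deliveryPerDay: number; // floorArea × 0.3 (kitchen throughput)
}

// sqftPerSeat (12–25 depending on price band) is supplied by the caller,
// since the band-to-value mapping is not given here.
function physicalCapacity(floorArea: number, sqftPerSeat: number, band: PriceBand): Capacity {
  const seats = Math.floor(floorArea / sqftPerSeat); // assumption: whole seats only
  const turns = TURNS[band];
  return {
    seats,
    dineInCovers: seats * turns,
    takeawayOrders: seats * turns * 2,
    deliveryPerDay: floorArea * 0.3,
  };
}

console.log(physicalCapacity(200, 20, "Mid"));
```

With a 200 sqft floor area, delivery capacity comes out at 60 orders/day, consistent with the changelog entry for the 0.3 multiplier.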

Revenue distribution (monthly = weeklyBase × 4.3 × captureRate, where 4.3 ≈ average weeks per month, capped by physical capacity):

P10 = median × 0.55
P25 = median × 0.75
P50 = median (base estimate)
P75 = median × 1.30
P90 = median × 1.65
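A sketch of the revenue bands, assuming `monthlyCapacityCap` has already been converted into the same units as weeklyBase × captureRate (turning covers and orders into revenue needs an average ticket size, which this section does not give). Function and constant names are hypothetical.

```typescript
// Percentile multipliers relative to the median, as listed above.
const PERCENTILE_MULTIPLIERS = { P10: 0.55, P25: 0.75, P50: 1.0, P75: 1.3, P90: 1.65 } as const;

type Percentile = keyof typeof PERCENTILE_MULTIPLIERS;

// Median monthly revenue = weeklyBase × 4.3 × captureRate, capped by capacity;
// the other percentiles are fixed multiples of that median.
function revenueBands(weeklyBase: number, captureRate: number, monthlyCapacityCap: number): Record<Percentile, number> {
  const median = Math.min(weeklyBase * 4.3 * captureRate, monthlyCapacityCap);
  const bands = {} as Record<Percentile, number>;
  for (const k of Object.keys(PERCENTILE_MULTIPLIERS) as Percentile[]) {
    bands[k] = median * PERCENTILE_MULTIPLIERS[k];
  }
  return bands;
}

console.log(revenueBands(1_000_000, 0.1, 300_000));
```

As written, only the median is capped, so P75 and P90 can still exceed physical capacity; whether the implementation also caps the upper bands is not stated here.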

Changelog

Date	Change	Why
2026-03-25	Capture rate changed from density-tiered (3%/6%/12%) to logarithmic: `0.15/(1+0.3×ln(comp))` × density boost	Stepped tiers created cliff-edges; logarithmic is smoother and more empirically grounded
2026-03-25	`totalAgents` formula changed from fixed 1000 to `pop/100`, capped 500–5000	Fixed 1000 agents in a 50K population zone vs 500K zone produced identical agent pools
2026-03-25	Delivery capacity multiplier reduced from 0.5 to 0.3	200 sqft kitchen × 0.5 = 100 orders/day was unrealistic; now 200 sqft → 60 orders
2026-03-24	Added physical capacity cap	1000 agents × unconstrained rates produced physically impossible revenue
2026-03-24	Added density-tiered capture rates (3%/6%/12%)	Flat capture rate ignored competitive density differences

Source

📖 Birkin, M. & Clarke, G. (2023). Retail Geography. Chapter 10: Microsimulation — EC-Sim Channel Model. Chapter 12: Computational advances in microsimulation.