
Microsimulation

Microsimulation flips the modelling paradigm from aggregate to individual. Instead of working with zone-level averages, you synthesize a virtual population where each person has demographics, income, preferences, and location. Then you simulate their behaviour.

The aggregate models (Gravity, Huff, Regression) work with averages. But averages hide crucial variation:

  • A zone's median income can hide both very low and very high earners
  • Age distribution matters: students, young professionals, and retirees eat differently
  • Household composition affects spending: families vs. singles vs. couples

Microsimulation models individual heterogeneity — the fact that people in the same zone behave very differently.

“Microsimulation is a technique that focuses on the characteristics and behaviour of individuals, rather than the groups that are used by conventional spatial interaction models.” — Birkin & Clarke, Ch. 10

| Dataset | What We Use |
|---|---|
| Census Population | Synthetic population base |
| Household Income | Agent income distribution |
| Household Size | Household composition for agent generation |
| Age Distribution | Age-stratified agent attributes |
| Employment by District | Agent employment status and commute patterns |
| Consumer Price Index | Agent spending budgets calibrated to real prices |
| Restaurant Receipts | Validate simulated spending against real receipts |

  1. Build a synthetic population — merge census demographics with market research data to create individual-level records
  2. Assign product ownership / preferences: what does each synthetic person consume, given their demographics and location?
  3. Generate behaviour — where do they go? How often? How much do they spend?
  4. Simulate — run the population through spatial interaction, accounting for accessibility, competition, and individual preferences

Birkin & Clarke describe a 4-step microsimulation for financial services (EC-Sim), which translates directly to retail:

Step 1: Build a micro-population that matches census demographics → Census data (age, income, household type) × geographic zones

Step 2: Add consumption patterns → Merge with market research / survey data

Step 3: Generate behaviour preferences → Not just demographics — also accessibility to services, which varies by location

Step 4: Simulate channel usage → Include physical provision (store locations, opening hours), brand, demographics
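
Step 1 can be sketched as sampling agents from one zone's census marginals. This is a minimal illustration, not the EC-Sim procedure: it assumes the age, income, and household-type shares for a zone are available as dictionaries (the band labels and zone name below are made up), and it samples each attribute independently, which ignores correlations between them. Iterative proportional fitting (IPF) against census cross-tabulations is a commonly used way to preserve those joint distributions.

```python
import random
from dataclasses import dataclass


@dataclass
class Agent:
    zone: str
    age_band: str
    income_band: str
    household_type: str


def sample_band(shares: dict[str, float], rng: random.Random) -> str:
    """Draw one category with probability proportional to its census share."""
    bands = list(shares)
    return rng.choices(bands, weights=[shares[b] for b in bands], k=1)[0]


def synthesize_zone(zone: str, n_agents: int, age: dict[str, float],
                    income: dict[str, float], household: dict[str, float],
                    seed: int = 0) -> list[Agent]:
    """Sample n_agents for one zone; attributes are drawn independently (a simplification)."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    return [
        Agent(zone, sample_band(age, rng), sample_band(income, rng), sample_band(household, rng))
        for _ in range(n_agents)
    ]


# Hypothetical adult-only marginals for one zone.
age_shares = {"18-34": 0.30, "35-64": 0.50, "65+": 0.20}
income_shares = {"low": 0.35, "mid": 0.45, "high": 0.20}
household_shares = {"single": 0.25, "couple": 0.30, "family": 0.45}

population = synthesize_zone("zone-A", 1000, age_shares, income_shares, household_shares)
```

Steps 2 to 4 would then attach preferences and behaviour to each `Agent` record.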

For each synthetic individual i in the target area:

P(visit_restaurant_j) = f(
    distance(i, j),          // walking time from home/office
    cuisine_match(i, j),     // does j serve what i likes?
    price_match(i, j),       // is j in i's budget?
    attractiveness(j),       // size, reviews, brand
    time_of_day,             // lunch crowd vs. dinner
    competition_nearby(j)    // alternatives within 200m
)

Running this for all synthetic individuals across all restaurants in the district produces a predicted visit count and revenue for any restaurant at a given address.
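
The pseudocode leaves f unspecified. One common way to turn such factors into probabilities is a multinomial logit, the form behind Huff-style shares: score each restaurant with a weighted utility, then normalise with a softmax over the agent's choice set. The sketch below assumes that form. The weights are hypothetical and uncalibrated, attractiveness enters through its logarithm, time_of_day is assumed to be handled by running the model once per meal period (not shown), and the average ticket per restaurant is an assumed input for converting visits to revenue.

```python
import math

# Hypothetical, uncalibrated weights for each factor in f(...).
BETA_WALK = -0.15         # per minute of walking time, distance(i, j)
BETA_CUISINE = 1.0        # cuisine_match(i, j), scored 0..1
BETA_PRICE = 1.0          # price_match(i, j), scored 0..1
BETA_ATTRACT = 1.0        # applied to ln(attractiveness(j)); attractiveness > 0
BETA_COMPETITION = -0.02  # per competitor within 200 m, competition_nearby(j)


def utility(walk_min: float, cuisine: float, price: float,
            attractiveness: float, competitors: int) -> float:
    return (BETA_WALK * walk_min + BETA_CUISINE * cuisine + BETA_PRICE * price
            + BETA_ATTRACT * math.log(attractiveness)
            + BETA_COMPETITION * competitors)


def choice_probabilities(utilities: list[float]) -> list[float]:
    """Multinomial logit: P_ij = exp(U_ij) / sum_k exp(U_ik)."""
    top = max(utilities)  # subtract the max so exp() cannot overflow
    weights = [math.exp(u - top) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def expected_visits(agents, n_restaurants: int) -> list[float]:
    """agents: iterable of (meals_out_per_week, [feature tuple for each restaurant])."""
    visits = [0.0] * n_restaurants
    for meals, features in agents:
        probs = choice_probabilities([utility(*f) for f in features])
        for j, p in enumerate(probs):
            visits[j] += meals * p
    return visits


def expected_revenue(visits: list[float], avg_ticket: list[float]) -> list[float]:
    """Weekly revenue per restaurant = expected visits x assumed average ticket."""
    return [v * t for v, t in zip(visits, avg_ticket)]


# Two hypothetical agents choosing between two restaurants.
agents = [
    (3.0, [(5, 1.0, 1.0, 4.0, 10), (12, 1.0, 1.0, 4.0, 10)]),
    (2.0, [(8, 0.2, 1.0, 4.0, 10), (8, 0.9, 1.0, 4.0, 10)]),
]
visits = expected_visits(agents, 2)
revenue = expected_revenue(visits, [80.0, 120.0])
```

Because the probabilities sum to 1 over the choice set, every simulated meal out lands at one of the listed restaurants. Adding an outside "eat elsewhere" alternative with a fixed utility would relax that.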

| Data | Availability | Quality |
|---|---|---|
| Census demographics | ✅ Census 2021, TPU level | Excellent |
| Household income distribution | ✅ Census 2021 | Good (banded) |
| Restaurant locations | ✅ FEHD licenses | Complete |
| Consumer preferences | ⚠️ No public data | Need survey or proxy |
| Actual spending patterns | ⚠️ No public data | Need Octopus/credit card data |

| | Microsimulation | ABM (agent-based model) |
|---|---|---|
| Unit | Synthetic individual | Autonomous agent |
| Behaviour | Rule-based from data | Emergent from interactions |
| Interactions | Individual → environment | Agent ↔ agent ↔ environment |
| Dynamics | Static snapshot or step-wise | Continuous time evolution |
| Data needs | Heavy (census + surveys) | Lighter (rules + parameters) |
| Best for | Demand estimation | Scenario testing |

The LLM Agent Simulation uses LLM-powered agents (Claude Opus) that combine microsimulation’s individual-level detail with ABM’s emergent behaviour — each agent has a synthetic persona AND can reason about complex tradeoffs.

“In the late 1980s, two of the present authors developed a microsimulation approach using a synthetic sample of 50,000 households… programs were run overnight in batch mode on a mainframe computer costing about £1.5 million. In Chapter 10, we reported on an application using a sample of one million households, and can be run in a few seconds real time on a personal computer costing around £1,000.” — Birkin & Clarke, Ch. 12

In 2026, we can run microsimulation for all of Hong Kong (~2.7M households) on a laptop in minutes. The bottleneck is data, not compute.

Synthetic population: totalAgents = round(pop / 100), clamped to the range 500–5000. The agent pool therefore scales with the actual catchment population instead of being fixed.
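
The agent-count rule is a clamp, sketched below. Two assumptions: round() is taken as round-half-up (Python's built-in round() rounds halves to even, which only differs at exact .5 values), and `agent_weight` is a hypothetical helper showing how many residents each agent would stand for if results are scaled back up to the population.

```python
import math


def total_agents(pop: int) -> int:
    """totalAgents = round(pop / 100), clamped to [500, 5000]."""
    raw = math.floor(pop / 100 + 0.5)  # round half up (assumed)
    return max(500, min(5000, raw))


def agent_weight(pop: int) -> float:
    """Hypothetical helper: residents represented by each synthetic agent."""
    return pop / total_agents(pop)
```

For example, a 200,000-person catchment gets 2,000 agents, each standing in for 100 residents, while a 2,000,000-person catchment hits the 5,000 cap and each agent represents 400 residents.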

Weekly meal rates (base, before modifiers):

  • Breakfast: 8% of agents dine out per day (×7 days)
  • Lunch: 25% per day × 5 workdays only
  • Dinner: 18% per day (×7 days)
  • Late night: 5% per day × 4 nights

Rates are further adjusted by price band, the age18to64 fraction, and the family-household rate. A targetMatch multiplier of 1.1 applies if targetCustomers is non-empty.
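
The weekly meal rates translate into expected meals out per agent per week by multiplying each daily rate by the days it applies. In this sketch only the base rates and the 1.1 targetMatch multiplier come from the text; the price-band, age18to64, and family-household adjustments are folded into a single hypothetical `other_modifier` because their exact form is not given here.

```python
# Base daily dine-out rate and number of days per week each meal applies.
BASE_MEAL_RATES = {
    "breakfast": (0.08, 7),
    "lunch": (0.25, 5),       # workdays only
    "dinner": (0.18, 7),
    "late_night": (0.05, 4),
}
TARGET_MATCH = 1.1  # applied when targetCustomers is non-empty


def weekly_meals_out(target_customers: list[str],
                     other_modifier: float = 1.0) -> dict[str, float]:
    """Expected meals out per agent per week, by meal period.

    other_modifier is a hypothetical stand-in for the price-band,
    age18to64 and family-household adjustments.
    """
    factor = other_modifier * (TARGET_MATCH if target_customers else 1.0)
    return {meal: rate * days * factor
            for meal, (rate, days) in BASE_MEAL_RATES.items()}
```

With no modifiers, one agent eats out about 3.27 times a week (0.56 breakfasts, 1.25 lunches, 1.26 dinners, 0.20 late-night meals); a non-empty targetCustomers list raises that to about 3.6.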

Capture rate (logarithmic, not density-tiered):

logShare = competitors ≤ 1 ? 0.15 : 0.15 / (1 + 0.3 × ln(competitors))
densityBoost = density > 15000 ? 0.9 : density > 5000 ? 1.0 : 1.2
captureRate = min(0.15, logShare × densityBoost)

Examples (densityBoost = 1.0, i.e. 5,000 < density ≤ 15,000): 1 competitor → 15%, 10 competitors → ~8.9%, 30 competitors → ~7.4%, 80 competitors → ~6.5%, 200 competitors → ~5.8%.
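
The capture-rate rule translates directly into a function. `math.log` is the natural logarithm, matching ln; the density thresholds and units are taken as given.

```python
import math


def capture_rate(competitors: int, density: float) -> float:
    """captureRate = min(0.15, logShare x densityBoost)."""
    # The <= 1 guard also avoids ln(0) when there are no competitors.
    if competitors <= 1:
        log_share = 0.15
    else:
        log_share = 0.15 / (1 + 0.3 * math.log(competitors))

    if density > 15_000:
        density_boost = 0.9
    elif density > 5_000:
        density_boost = 1.0
    else:
        density_boost = 1.2

    return min(0.15, log_share * density_boost)
```

At densityBoost = 1.0 this gives about 8.9% for 10 competitors, 7.4% for 30, 6.5% for 80, and 5.8% for 200. Because the 0.15 cap applies after the boost, the 1.2 low-density boost only has an effect once competition has pulled logShare below 0.125.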

Physical capacity cap:

  • Dine-in seats: floorArea / sqftPerSeat (12–25 sqft/seat depending on price band)
  • Delivery: floorArea × 0.3 orders/day (kitchen throughput)
  • Takeaway: seats × turns × 2
  • Turns: High-end 1.5, Premium 2.0, Mid 2.5, Budget 3.0
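
The capacity cap can be sketched as a daily-throughput calculation. The turns per price band and the 0.3 delivery factor come from the list above; the specific sqft-per-seat value for each band is a hypothetical choice within the stated 12–25 range, seats are assumed to be whole numbers, and turns are assumed to be per day.

```python
import math

TURNS = {"High-end": 1.5, "Premium": 2.0, "Mid": 2.5, "Budget": 3.0}
# Hypothetical sqft-per-seat values inside the stated 12-25 range.
SQFT_PER_SEAT = {"High-end": 25, "Premium": 20, "Mid": 15, "Budget": 12}
DELIVERY_ORDERS_PER_SQFT = 0.3  # orders/day per sqft (kitchen throughput)


def daily_capacity(floor_area_sqft: float, price_band: str) -> dict[str, float]:
    """Upper bounds on daily dine-in covers, takeaway and delivery orders."""
    seats = math.floor(floor_area_sqft / SQFT_PER_SEAT[price_band])  # whole seats (assumed)
    turns = TURNS[price_band]
    return {
        "seats": seats,
        "dine_in_covers": seats * turns,  # assumes turns are per day
        "takeaway": seats * turns * 2,
        "delivery": floor_area_sqft * DELIVERY_ORDERS_PER_SQFT,
    }
```

Under these assumed values, a 600 sqft mid-range unit gets 40 seats, 100 dine-in covers, 200 takeaway orders, and 180 delivery orders per day.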

Revenue distribution (monthly = weeklyBase × 4.3 × captureRate, capped by capacity):

  • P10 = median × 0.55
  • P25 = median × 0.75
  • P50 = median (base estimate)
  • P75 = median × 1.30
  • P90 = median × 1.65
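
The revenue bands follow from the median and fixed multipliers. In this sketch `weekly_base` is assumed to be the weekly spending pool in the catchment before capture, and `monthly_capacity_cap` is a hypothetical input (for example, daily capacity × days open × average ticket). Only the median is capped here; whether the upper bands should also be capped is left open.

```python
WEEKS_PER_MONTH = 4.3
PERCENTILE_MULTIPLIERS = {"P10": 0.55, "P25": 0.75, "P50": 1.0, "P75": 1.30, "P90": 1.65}


def monthly_revenue_bands(weekly_base: float, capture_rate: float,
                          monthly_capacity_cap: float) -> dict[str, float]:
    """Median = weeklyBase x 4.3 x captureRate, capped by capacity; other bands scale the median."""
    median = min(weekly_base * WEEKS_PER_MONTH * capture_rate, monthly_capacity_cap)
    return {band: median * m for band, m in PERCENTILE_MULTIPLIERS.items()}
```

With a weekly base of 100,000 and a 10% capture rate, the median is 43,000 a month, with P10 at 23,650 and P90 at 70,950.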

| Date | Change | Why |
|---|---|---|
| 2026-03-25 | Capture rate changed from density-tiered (3%/6%/12%) to logarithmic: 0.15/(1+0.3×ln(comp)) × density boost | Stepped tiers created cliff-edges; logarithmic is smoother and more empirically grounded |
| 2026-03-25 | totalAgents formula changed from fixed 1000 to pop/100, capped 500–5000 | Fixed 1000 agents in a 50K population zone vs 500K zone produced identical agent pools |
| 2026-03-25 | Delivery capacity multiplier reduced from 0.5 to 0.3 | 200 sqft kitchen × 0.5 = 100 orders/day was unrealistic; now 200 sqft → 60 orders/day |
| 2026-03-24 | Added physical capacity cap | 1000 agents × unconstrained rates produced physically impossible revenue |
| 2026-03-24 | Added density-tiered capture rates (3%/6%/12%) | Flat capture rate ignored competitive density differences |

📖 Birkin, M. & Clarke, G. (2023). Retail Geography. Chapter 10: Microsimulation — EC-Sim Channel Model. Chapter 12: Computational advances in microsimulation.