Multiple Regression Model

The multiple regression model is the most straightforward way to predict a store’s revenue from measurable site characteristics. It’s a statistical workhorse: you feed it data from existing stores, and it tells you what drives sales.

Y = a + b₁X₁ + b₂X₂ + b₃X₃ + b₄X₄

Where:

  • Y = predicted annual turnover
  • X₁ = store size (sq ft)
  • X₂ = competition index (nearby competing stores)
  • X₃ = market size (population within catchment)
  • X₄ = affluence index (household income)
  • a, b₁…b₄ = coefficients fitted from existing store data
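
As a minimal sketch of how the equation is evaluated for one site, the snippet below codes the formula directly. The `Site` class, the `predict_turnover` function and every number in the example are hypothetical, chosen only to show the arithmetic; they are not fitted values from the book.

```python
from dataclasses import dataclass


@dataclass
class Site:
    store_size_sqft: float    # X1
    competition_index: float  # X2
    market_size: float        # X3
    affluence_index: float    # X4


def predict_turnover(a: float, b: tuple[float, float, float, float], site: Site) -> float:
    """Evaluate Y = a + b1*X1 + b2*X2 + b3*X3 + b4*X4 for one site."""
    x = (site.store_size_sqft, site.competition_index,
         site.market_size, site.affluence_index)
    return a + sum(b_i * x_i for b_i, x_i in zip(b, x))


# Hypothetical coefficients and site values, chosen only to show the arithmetic.
example = Site(store_size_sqft=2_000, competition_index=3,
               market_size=10_000, affluence_index=50)
print(predict_turnover(1_000.0, (10.0, -500.0, 2.0, 100.0), example))
```

In practice, `a` and `b` would come from a regression fitted on existing stores rather than being typed in by hand.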

“Multiple regression is used to combine different attributes into the same regression equation… The model is usually built up sequentially (‘stepwise’), starting with the most important variable.” — Birkin & Clarke, Retail Geography, Ch. 7

  1. Collect data from existing stores: turnover, size, competition, market size, affluence
  2. Run stepwise regression — each variable enters the model in order of explanatory power
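  3. Read the output: each coefficient is the change in turnover per unit of that variable, and each variable's contribution to variance shows which factors matter most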
  4. Apply to new sites — plug in the new location’s attributes, get a turnover prediction
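
The steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the procedure from the book: the data is synthetic, the helper names (`solve`, `fit_ols`, `forward_stepwise`) are made up for this example, and "order of explanatory power" is interpreted as "largest gain in R² at each step", whereas statistical packages usually apply a significance test when deciding which variable enters.

```python
import random


def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_ols(rows, y):
    """Ordinary least squares with an intercept. Returns (coefficients, R^2)."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y_i for x, y_i in zip(X, y)) for i in range(k)]
    coef = solve(xtx, xty)
    pred = [sum(c * v for c, v in zip(coef, x)) for x in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((y_i - p) ** 2 for y_i, p in zip(y, pred))
    ss_tot = sum((y_i - mean_y) ** 2 for y_i in y)
    return coef, 1.0 - ss_res / ss_tot


def forward_stepwise(data, y, names):
    """Enter variables one at a time, each time picking the largest R^2 gain."""
    chosen, steps, prev_r2 = [], [], 0.0
    remaining = list(range(len(names)))

    def r2_with(cols):
        return fit_ols([[row[c] for c in cols] for row in data], y)[1]

    while remaining:
        best = max(remaining, key=lambda j: r2_with(chosen + [j]))
        chosen.append(best)
        remaining.remove(best)
        r2 = r2_with(chosen)
        steps.append((names[best], r2 - prev_r2))
        prev_r2 = r2
    return steps


# Synthetic stores (not real data): market size is built to dominate turnover.
random.seed(0)
names = ["store_size", "competition", "market_size", "affluence"]
data = [[random.uniform(1, 10), random.uniform(0, 10),
         random.uniform(5, 50), random.uniform(20, 60)] for _ in range(40)]
turnover = [50 + 0.5 * s - 2.0 * c + 5.0 * m + 0.1 * i + random.gauss(0, 1)
            for s, c, m, i in data]

steps = forward_stepwise(data, turnover, names)
for name, gain in steps:
    print(f"{name:12s} adds {gain:.3f} to R^2")
```

The solver is hand-rolled only to keep the sketch dependency-free; a real analysis would more likely use a statistics library. The per-step gains sum to the R² of the full model, which is how a "contribution to variance" column can be read.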

Real Example: Birkin & Clarke’s Store Network

From the book (Table 7.3), a UK retailer’s regression output:

| Variable | Slope | Contribution to Variance |
|---|---|---|
| Market size | 5.8 | 0.33 |
| Store size | 125 | 0.20 |
| Competition | 116,448 | 0.18 |
| Income | 17,956 | 0.09 |
| Total R² | | 0.80 |

Market size alone explains 33% of turnover variation. Store size adds 20%. Together with competition and income, the model explains 80% of all variance.
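
The running total behind those percentages can be checked with a few lines of arithmetic. The sketch assumes the table rows are listed in the order the variables entered the model, which the text does not state explicitly; the variable names `contributions`, `cumulative` and `unexplained` are just labels for this example.

```python
# (variable, contribution to variance) as listed in the table above
contributions = [
    ("Market size", 0.33),
    ("Store size", 0.20),
    ("Competition", 0.18),
    ("Income", 0.09),
]

cumulative = []
running = 0.0
for name, share in contributions:
    running += share
    cumulative.append((name, round(running, 2)))

unexplained = round(1.0 - running, 2)
for name, r2 in cumulative:
    print(f"after adding {name}: R^2 = {r2:.2f}")
print(f"unexplained: {unexplained:.2f}")
```

The four contributions sum to the reported total of 0.80, leaving 0.20 of the variance in turnover unexplained by these variables.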

For our Sheung Wan case study, the regression inputs would be:

| Variable | Data Source | Value |
|---|---|---|
| Market size (X₃) | Census population within 500m | ~28,000 residents |
| Competition (X₂) | FEHD restaurant licenses in C&W | 208 licensed restaurants |
| Affluence (X₄) | Median household income | HK$30,000–40,000/mo |
| Store size (X₁) | Site-specific | Depends on lease |

Strengths:

  • Simple, transparent, explainable to stakeholders
  • Quick to compute — runs in milliseconds
  • Good for portfolio benchmarking (which stores underperform their attributes?)
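
One way to picture the benchmarking use is to compare each store's actual turnover with what the model predicts for its attributes. The sketch below is an assumption about how that comparison might be set up: the `benchmark` function, the store names and all figures are hypothetical, and the predicted values stand in for output from a fitted regression.

```python
def benchmark(stores):
    """Rank stores by actual/predicted turnover, weakest first.

    stores: iterable of (name, actual_turnover, predicted_turnover).
    """
    ranked = [(name, actual / predicted, actual - predicted)
              for name, actual, predicted in stores]
    return sorted(ranked, key=lambda row: row[1])


# Hypothetical stores and figures, for illustration only.
portfolio = [
    ("Store A", 900_000, 1_000_000),
    ("Store B", 1_200_000, 1_000_000),
    ("Store C", 500_000, 800_000),
]

for name, ratio, gap in benchmark(portfolio):
    flag = "under" if ratio < 1 else "over"
    print(f"{name}: {ratio:.2f} of predicted ({flag}, gap {gap:+,})")
```

A ratio below 1 marks a store earning less than its size, market, competition and affluence would suggest, which makes it a candidate for closer review.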

Limitations:

  • Requires training data (actual revenue from comparable stores)
  • Assumes linear relationships (diminishing returns not captured)
  • Doesn’t model consumer choice — treats each site independently
  • Can’t simulate “what-if” scenarios (new competitor, road closure)

The regression model is best as a first filter — screening many potential sites down to a shortlist. For the shortlist, you’d graduate to the Spatial Interaction Model or Agent Simulation to model actual consumer behaviour.

“Two important applications of the model are performance assessment and evaluation of potential.” — Birkin & Clarke, Ch. 7

📖 Birkin, M. & Clarke, G. (2023). Retail Geography. Chapter 7: Store Performance Modelling.