---
title: "Measles: Building the He et al. Model"
draft: true
execute:
  echo: false
---

```{python}
#| output: false
import subprocess, io, os, sys, re
import polars as pl
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
from IPython.display import HTML

sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname("__file__"), "..")))
from _lib.cli import run_cli

GREY = "#888888"
mpl.rcParams.update({
    "figure.facecolor": "none", "axes.facecolor": "none",
    "savefig.facecolor": "none", "figure.dpi": 150, "font.size": 11,
    "text.color": GREY, "axes.labelcolor": GREY,
    "xtick.color": GREY, "ytick.color": GREY,
    "axes.edgecolor": GREY, "legend.edgecolor": GREY,
})

VDIR = "he2010"

def run_sim(model, params=None, seed=42, extra_args=None, cwd=VDIR):
    cmd = ["camdl", "simulate", model, "--seed", str(seed)]
    if params:
        if isinstance(params, dict):
            for k, v in params.items():
                cmd.extend(["--param", f"{k}={v}"])
        else:
            cmd.extend(["--params", params])
    if extra_args:
        cmd.extend(extra_args)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True, cwd=cwd)
    lines = result.stdout.split("\n")
    tsv = "\n".join(l for l in lines if not l.startswith("#"))
    return pl.read_csv(io.StringIO(tsv), separator="\t")
```

The [previous chapters](../guide/fitting-data/report.html) fit an
SIR model to 14 days of a boarding-school influenza outbreak — a closed
population, two free parameters, a single epidemic wave. That was the
simplest interesting inference problem.

Measles in London is a different beast. The 1978 boarding school had 763
boys and lasted two weeks. London in the 1950s had 3.5 million people and
measles came back every year or two for decades. To model that, we need
machinery the boarding-school SIR doesn't have: births that refill the
susceptible pool, a latent period, seasonal forcing from the school
calendar, and an observation model that accounts for incomplete reporting.

This chapter builds the He et al. (2010) London measles SEIR step by step.
Each section adds one mechanism and shows what it does to the dynamics.
By the end, every line of the full model is recognizable.


## From outbreak to endemic: why demography matters

The boarding-school SIR is a **closed** population — no one enters or
leaves. The epidemic burns through the susceptible pool and dies out. That's
fine for a two-week flu. But measles in a city persists for decades because
new children are born susceptible.

Adding births and deaths to the SIR is the single biggest conceptual step
in this chapter: it turns an outbreak model into an **endemic** model.

```{python}
#| output: asis
print("```camdl")
with open(os.path.join(VDIR, "models/sir_endemic.camdl")) as f:
    print(f.read().rstrip())
print("```")
```

Three lines changed from the boarding-school SIR:

- `birth : --> S @ mu * N0` — new susceptibles enter at rate μ × N
- `death_S : S --> @ mu * S` — individuals leave each compartment at
  the same rate
- The population stays roughly constant: births ≈ deaths when both
  happen at rate μ.

The death rate μ = 0.02/yr (≈ 0.0000548/day) is tiny on any given day,
but over 50 years it completely replaces the population. For measles with
R₀ ≈ 15–20, the susceptible pool drains in weeks but refills over years —
and that interplay is what generates recurrent epidemics.

```{python}
#| label: fig-sir-endemic
#| fig-cap: "SIR with demography at two R₀ values over 50 years (N = 500,000). Susceptibles (blue) drain during each epidemic, then slowly refill via births. Higher R₀ (right) produces shorter inter-epidemic periods because the susceptible threshold for a new outbreak is reached sooner."
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(13, 4), sharey=True)

for ax, R0, title in [(ax1, 8.0, "R₀ = 8"), (ax2, 16.0, "R₀ = 16")]:
    N0 = 500000
    gamma = 1/12.0  # 12-day infectious period
    mu = 0.02/365.25
    beta = R0 * (gamma + mu)
    traj = run_sim("models/sir_endemic.camdl",
                   params={"beta": beta, "gamma": gamma,
                           "mu": mu, "N0": N0},
                   seed=7)
    t_yr = traj["t"].to_numpy() / 365.25
    ax.plot(t_yr, traj["S"].to_numpy(), color="#3366cc", linewidth=0.5,
            alpha=0.8, label="S")
    ax.plot(t_yr, traj["I"].to_numpy(), color="#cc3333", linewidth=0.5,
            alpha=0.8, label="I")
    ax.set_xlabel("Years")
    ax.set_title(title, fontsize=11, color=GREY)
    ax.spines[["top", "right"]].set_visible(False)
    ax.legend(frameon=False, fontsize=9)

ax1.set_ylabel("Count")
plt.tight_layout()
plt.show()
```

Compare this to the boarding-school SIR: one epidemic, done. Here,
susceptibles accumulate via births until the epidemic threshold is crossed,
the epidemic fires, and the cycle repeats. The inter-epidemic period
depends on R₀ — higher R₀ means fewer susceptibles needed to trigger an
outbreak, so epidemics recur faster.

This is already a useful model of endemic disease. But it's missing
features that matter for measles specifically.


## Adding a latent period: from SIR to SEIR

Measles has an incubation period of 8–13 days — you're infected but not yet
infectious. The SIR model doesn't capture this: infection and
infectiousness happen simultaneously. The **SEIR** model adds an
**Exposed** (E) compartment where individuals incubate before moving to I.

```{python}
#| output: asis
print("```camdl")
with open(os.path.join(VDIR, "models/seir_endemic.camdl")) as f:
    for line in f:
        if line.strip().startswith("#") and "SEIR" in line:
            print(line.rstrip())
        elif "compartments" in line or "latency" in line or "sigma" in line:
            print(line.rstrip())
print("```")
```

The new pieces: `compartments { S, E, I, R }`, a `sigma : rate` parameter
(1/σ = mean incubation period), and the transition `latency : E --> I @
sigma * E`. Everything else is identical to the SIR.

```{python}
#| label: fig-seir-vs-sir
#| fig-cap: "SEIR (left) vs SIR (right) with demography. Same R₀, same population. The latent period delays and broadens each epidemic wave — exposed individuals incubate for ~12 days before becoming infectious, which smooths the dynamics."
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(13, 4), sharey=True)

N0 = 500000
gamma = 1/12.0
sigma = 1/12.0   # 12-day incubation
mu = 0.02/365.25
R0 = 12.0
beta = R0 * (gamma + mu)

# SEIR
traj_seir = run_sim("models/seir_endemic.camdl",
                    params={"beta": beta, "gamma": gamma, "sigma": sigma,
                            "mu": mu, "N0": N0},
                    seed=7)
t_yr = traj_seir["t"].to_numpy() / 365.25
ax1.plot(t_yr, traj_seir["I"].to_numpy(), color="#cc3333", linewidth=0.5, alpha=0.8)
ax1.plot(t_yr, traj_seir["E"].to_numpy(), color="#e67e22", linewidth=0.5, alpha=0.4, label="E")
ax1.set_title("SEIR (σ = 1/12 day⁻¹)", fontsize=11, color=GREY)
ax1.set_xlabel("Years")
ax1.set_ylabel("Count")
ax1.spines[["top", "right"]].set_visible(False)

# SIR
traj_sir = run_sim("models/sir_endemic.camdl",
                   params={"beta": beta, "gamma": gamma,
                           "mu": mu, "N0": N0},
                   seed=7)
ax2.plot(traj_sir["t"].to_numpy() / 365.25, traj_sir["I"].to_numpy(),
         color="#cc3333", linewidth=0.5, alpha=0.8)
ax2.set_title("SIR (no latent period)", fontsize=11, color=GREY)
ax2.set_xlabel("Years")
ax2.spines[["top", "right"]].set_visible(False)

plt.tight_layout()
plt.show()
```

The latent period acts as a low-pass filter on the dynamics: each epidemic
wave is broader and lower-peaked than the SIR equivalent, because the
exposed-to-infectious delay smears the surge over time. For inference, this
means σ and γ interact — a longer latent period paired with a shorter
infectious period can produce similar epidemic curves, creating a
compensation ridge in the likelihood surface (a theme we'll return to in
the fitting chapter).


## School-term forcing: where biennial epidemics come from

The endemic SEIR produces recurrent epidemics, but their timing is
irregular — determined by when the susceptible pool happens to cross the
epidemic threshold. Real measles in pre-vaccination London showed a
strikingly regular **biennial** pattern: large epidemics every other year,
with smaller waves in between.

The mechanism is **school-term forcing** (Fine & Clarkson 1982, Schenzle
1984). Children mix intensely during school terms and barely at all during
holidays. The transmission rate β(t) oscillates between a high value
(term-time) and a low value (holidays) with the school calendar.

```{python}
#| label: fig-school-forcing
#| fig-cap: "UK school calendar as a forcing function. The transmission rate scales by 1 + a during term (grey bands) and 1 - a × (277/88) during holidays, where a is the amplitude parameter. With a = 0.554 (the He et al. MLE), term-time transmission is ~50% higher than average."
days = np.arange(0, 366)
# UK school terms: days 7–100, 115–199, 252–300, 308–356
school = np.zeros_like(days, dtype=float)
for lo, hi in [(7,100), (115,199), (252,300), (308,356)]:
    school[(days >= lo) & (days <= hi)] = 1.0

amplitude = 0.554
seas = 1.0 - amplitude + amplitude * (1.0 + 0.2411/0.7589) * school

fig, ax = plt.subplots(figsize=(10, 3))
ax.fill_between(days, 0, 1, where=school > 0, alpha=0.08, color="#3366cc",
                transform=ax.get_xaxis_transform())
ax.plot(days, seas, color="#cc3333", linewidth=2)
ax.axhline(1.0, color=GREY, linewidth=0.5, linestyle=":")
ax.set_xlabel("Day of year")
ax.set_ylabel("β(t) / β̄")
ax.set_xlim(0, 365)
ax.spines[["top", "right"]].set_visible(False)

# Label holidays
for x, label in [(53, "Spring\nterm"), (157, "Summer\nterm"),
                  (276, "Autumn\nterm"), (332, "Winter\nterm")]:
    ax.text(x, 0.3, label, ha="center", fontsize=8, color="#3366cc", alpha=0.6)
for x, label in [(107, "Easter"), (225, "Summer"), (304, "Half\nterm")]:
    ax.text(x, 1.55, label, ha="center", fontsize=8, color=GREY, alpha=0.7)

plt.tight_layout()
plt.show()
```

In camdl, school-term forcing uses a `periodic` block in the `forcing {}`
section:

```camdl
forcing {
  school : periodic {
    period = 365.25 'days
    step   = 1 'days
    on     = [7:100, 115:199, 252:300, 308:356]
  }
}
```

The `school(t)` function returns 1 during term days and 0 during holidays.
The transmission rate then modulates as `β_base × seas` where
`seas = 1 - a + a × (1 + 0.2411/0.7589) × school(t)`, ensuring the
time-averaged forcing equals 1.

```{python}
#| label: fig-seir-seasonal
#| fig-cap: "SEIR with school-term forcing (amplitude = 0.554) over 30 years, N = 3,000,000. The interaction between seasonal forcing and susceptible depletion produces the characteristic biennial pattern: a large epidemic depletes susceptibles below the winter threshold, so the following year's outbreak is smaller, allowing susceptibles to accumulate for a large epidemic the year after."
N0 = 3000000
traj = run_sim("models/seir_seasonal.camdl",
               params={"R0": 17.0, "sigma": 1/12.0, "gamma": 1/12.0,
                        "mu": 0.02/365.25, "amplitude": 0.554, "N0": N0},
               seed=3)

fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 6), sharex=True,
                                gridspec_kw={"height_ratios": [1, 2]})

t_yr = traj["t"].to_numpy() / 365.25
mask = t_yr < 30

ax1.plot(t_yr[mask], traj["S"].to_numpy()[mask] / N0 * 100,
         color="#3366cc", linewidth=0.8)
ax1.set_ylabel("Susceptible (%)")
ax1.spines[["top", "right"]].set_visible(False)

ax2.plot(t_yr[mask], traj["I"].to_numpy()[mask],
         color="#cc3333", linewidth=0.8)
ax2.set_xlabel("Years")
ax2.set_ylabel("Infectious (I)")
ax2.spines[["top", "right"]].set_visible(False)

plt.tight_layout()
plt.show()
```

The biennial pattern emerges from the **interaction** between two clocks:
the annual school calendar and the multi-year susceptible replenishment
cycle. A large epidemic in year 1 depletes susceptibles so far below the
threshold that the next winter's forcing can't trigger a full outbreak —
only a small wave. That leaves more susceptibles for year 3, producing a
large epidemic again. The period depends on R₀, the amplitude of forcing,
and the birth rate — and it's not always biennial. At different parameter
values the model produces annual, biennial, or chaotic dynamics, which is
part of what makes inference hard.


## The London measles data

Now we can see what we're actually trying to fit. The data are weekly
notified measles cases in London, 1944–1965 — 1096 observations over
21 years.

```{python}
#| label: fig-london-data
#| fig-cap: "Weekly measles notifications in London, 1944–1965. The biennial pattern is clear: large epidemics in odd years with smaller inter-epidemic waves. Peak weeks reach 4,000+ cases; troughs drop below 100. This is the dataset He et al. (2010) fitted their SEIR model to."
cases = pl.read_csv(os.path.join(VDIR, "data/he2010_london_cases.tsv"),
                    separator="\t")

fig, ax = plt.subplots(figsize=(12, 4))
t_yr = cases["time"].to_numpy() / 365.25 + 1944.0  # t=0 is 1944-01-07
ax.fill_between(t_yr, 0, cases["cases"].to_numpy(),
                color="#cc3333", alpha=0.3, linewidth=0)
ax.plot(t_yr, cases["cases"].to_numpy(), color="#cc3333", linewidth=0.5)
ax.set_xlabel("Year")
ax.set_ylabel("Weekly cases")
ax.spines[["top", "right"]].set_visible(False)
plt.tight_layout()
plt.show()
```


## Can the simple SEIR explain this data?

Before adding more machinery, it's worth asking: how far does the seasonal
SEIR already get? We can simulate the seasonal model at roughly
measles-like parameters (R₀ ≈ 17, 12-day latent and infectious periods,
amplitude ≈ 0.55) and overlay on the observed data. The model doesn't have
reporting (ρ), so we compare the shape and timing rather than the absolute
scale.

```{python}
#| label: fig-seir-vs-data
#| fig-cap: "Seasonal SEIR simulation (blue, right axis) overlaid on London measles data (red, left axis). The model produces biennial epidemics with roughly the right timing, but there are clear gaps: the simulated peaks are too regular, the amplitude doesn't vary across decades, and the absolute scale requires a reporting fraction to match. These gaps motivate the full He et al. model."
N0 = 3500000
traj = run_sim("models/seir_seasonal.camdl",
               params={"R0": 17.0, "sigma": 1/12.0, "gamma": 1/12.0,
                        "mu": 0.02/365.25, "amplitude": 0.554, "N0": N0},
               seed=5)

# Aggregate model I to weekly for comparison
t_days = traj["t"].to_numpy()
I_vals = traj["I"].to_numpy()

fig, ax1 = plt.subplots(figsize=(12, 5))
ax2 = ax1.twinx()

# Data
t_data_yr = cases["time"].to_numpy() / 365.25 + 1944.0
ax1.fill_between(t_data_yr, 0, cases["cases"].to_numpy(),
                 color="#cc3333", alpha=0.2, linewidth=0)
ax1.plot(t_data_yr, cases["cases"].to_numpy(), color="#cc3333",
         linewidth=0.5, alpha=0.7, label="Observed cases")
ax1.set_ylabel("Weekly reported cases", color="#cc3333")
ax1.tick_params(axis="y", colors="#cc3333")

# Model — shift to align with data years (model starts at t=0, data at 1944)
t_model_yr = t_days / 365.25 + 1944.0
ax2.plot(t_model_yr, I_vals, color="#3366cc", linewidth=0.5, alpha=0.6,
         label="Simulated I(t)")
ax2.set_ylabel("Simulated infectious (I)", color="#3366cc")
ax2.tick_params(axis="y", colors="#3366cc")

ax1.set_xlabel("Year")
ax1.set_xlim(1944, 1965)
ax1.spines["top"].set_visible(False)
ax2.spines["top"].set_visible(False)

# Combined legend
lines1, labels1 = ax1.get_legend_handles_labels()
lines2, labels2 = ax2.get_legend_handles_labels()
ax1.legend(lines1 + lines2, labels1 + labels2, frameon=False, fontsize=9,
           loc="upper right")

plt.tight_layout()
plt.show()
```

The seasonal SEIR gets the basic biennial pattern right — epidemics recur
roughly every two years, peaking in winter. But the comparison reveals
exactly what's missing:

- **Scale.** The model simulates true infections; the data are *reported*
  cases. He et al. estimate ρ ≈ 0.49, meaning roughly half of infections
  go unreported. Without a reporting fraction, we can't compare counts
  directly.
- **Amplitude variation.** The model's epidemic peaks are roughly constant;
  the data shows a declining trend (the 1940s epidemics are much larger
  than the 1960s). This comes from London's changing population and birth
  rate — time-varying covariates the simple model doesn't have.
- **Stochastic variation.** Even within the biennial cycle, real epidemics
  vary in size and timing in ways that a single stochastic trajectory
  can't match consistently. The `overdispersed()` primitive adds
  day-to-day noise in the force of infection.
- **Irregular inter-epidemic troughs.** The data shows non-zero cases even
  between epidemics. The importation rate ι (a few cases/year from
  outside London) sustains low-level transmission during troughs.

The next sections add these features to reach the full He et al. model.


## Fitting the seasonal SEIR: prior predictive check

Before fitting, we should check what the model predicts *before seeing
data*. A prior predictive check draws parameters from the prior and
simulates — if the prior envelope doesn't overlap the data at all, the
fit will fail before it starts.

We use measles-informed ranges: R₀ ∈ [10, 60], amplitude ∈ [0.1, 0.9],
ρ ∈ [0.1, 0.9], s₀ ∈ [0.01, 0.12]. These aren't arbitrary — they
reflect the literature on measles dynamics in pre-vaccination populations.

```{python}
#| label: fig-prior-predictive
#| fig-cap: "Prior predictive check: 40 simulations at uniformly drawn parameters from the prior ranges, with weekly incidence scaled by a random ρ. Grey lines are individual trajectories; the red shaded area is the observed data. The prior envelope covers the data — the model can, in principle, generate epidemics of the right magnitude and timing."
fig, ax = plt.subplots(figsize=(12, 5))

# Plot data first
t_data_yr = cases["time"].to_numpy() / 365.25 + 1944.0
ax.fill_between(t_data_yr, 0, cases["cases"].to_numpy(),
                color="#cc3333", alpha=0.2, linewidth=0)
ax.plot(t_data_yr, cases["cases"].to_numpy(), color="#cc3333",
        linewidth=0.8, alpha=0.6, label="Observed cases", zorder=3)

rng = np.random.default_rng(42)
for i in range(40):
    R0_draw = rng.uniform(10, 60)
    amp_draw = rng.uniform(0.1, 0.9)
    rho_draw = rng.uniform(0.1, 0.9)
    s0_draw = rng.uniform(0.01, 0.12)
    try:
        traj = run_sim("models/seir_seasonal.camdl",
                       params={"R0": R0_draw, "sigma": 0.0791,
                               "gamma": 0.0832, "mu": 0.02/365.25,
                               "amplitude": amp_draw, "N0": 3500000},
                       seed=i+1)
        t_yr = traj["t"].to_numpy() / 365.25 + 1944.0
        # Weekly incidence proxy: difference in R scaled by rho
        R_vals = traj["R"].to_numpy()
        weekly_inc = np.diff(R_vals[::7]) * rho_draw
        t_weekly = t_yr[::7][1:]
        if len(t_weekly) > 0:
            ax.plot(t_weekly, weekly_inc, color="#555555", linewidth=0.5,
                    alpha=0.45, zorder=1)
    except Exception:
        pass

ax.set_xlabel("Year")
ax.set_ylabel("Weekly cases")
ax.set_xlim(1944, 1965)
ax.set_ylim(0, 15000)
ax.legend(frameon=False, fontsize=9)
ax.spines[["top", "right"]].set_visible(False)
plt.tight_layout()
plt.show()
```


## Fitting with IF2

We add a NegBin observation model to the seasonal SEIR (weekly reported
recoveries, reporting fraction ρ, overdispersion k) and fit to the
London data with IF2. Five parameters are estimated — R₀, amplitude, ρ,
k, and the initial susceptible fraction s₀ — with σ, γ, μ, and N₀ fixed
at literature values.

```{python}
#| echo: false
#| output: asis
with open(os.path.join(VDIR, "fit_seir_seasonal.toml")) as f:
    print(f"```toml\n{f.read().strip()}\n```")
```

```{python}
#| echo: false
display(run_cli(
    f"cd {VDIR} && camdl fit run fit_seir_seasonal.toml --seed 42",
    echo="camdl fit run fit_seir_seasonal.toml --seed 42",
    cwd=VDIR, max_lines=25
))
```


### Scout convergence

```{python}
#| label: fig-scout-traces
#| fig-cap: "IF2 scout traces for the seasonal SEIR on London measles. 16 chains × 1000 particles × 150 iterations. R₀ and k are multimodal (Â > 10) — chains occupy different basins. Amplitude converges; ρ is marginal (Â ≈ 1.3). The simple seasonal SEIR without importation or process noise has severe identifiability problems on 21 years of data."
import glob as _glob

scout_dir = sorted(_glob.glob(os.path.join(VDIR, "results/fits/fit_seir_seasonal-*/real/fit_42/scout")))[-1]
traces = []
for cdir in sorted(_glob.glob(os.path.join(scout_dir, "chain_*"))):
    if not os.path.isdir(cdir):
        continue
    # Skip comment lines before reading
    trace_path = os.path.join(cdir, "parameter_traces.tsv")
    with open(trace_path) as _f:
        tsv_lines = [l for l in _f if not l.startswith("#")]
    df = pl.read_csv(io.StringIO("".join(tsv_lines)),
                     separator="\t", infer_schema_length=10000)
    cid = int(os.path.basename(cdir).split("_")[1])
    df = df.with_columns(pl.lit(cid).alias("chain"))
    traces.append(df)
traces_df = pl.concat(traces)

params = ["R0", "amplitude", "rho", "k", "s0"]
fig, axes = plt.subplots(1, 5, figsize=(16, 3.5))
for ax, param in zip(axes, params):
    for chain in traces_df["chain"].unique().sort().to_list():
        sub = traces_df.filter(pl.col("chain") == chain)
        ax.plot(sub["iteration"], sub[param], alpha=0.35, linewidth=0.4,
                color="#3366cc")
    ax.set_ylabel(param)
    ax.set_xlabel("Iteration")
    ax.spines[["top", "right"]].set_visible(False)
plt.suptitle("Scout traces — seasonal SEIR on London measles", fontsize=11,
             color=GREY, y=1.02)
plt.tight_layout()
plt.show()
```

```{python}
#| echo: false
#| output: asis
# Read Â from fit_state.toml
import tomllib
state_path = os.path.join(scout_dir, "fit_state.toml")
if os.path.exists(state_path):
    with open(state_path, "rb") as f:
        state = tomllib.load(f)
    agreements = state.get("tail_chain_agreement", {})
    best_ll = state.get("best_loglik", None)
    if best_ll is not None:
        print(f"**Best scout log-likelihood:** {best_ll:.1f} (chain {state.get('best_chain', '?')})")
        print()
    print("| parameter | Â | status |")
    print("|---|---|---|")
    for param in ["R0", "amplitude", "rho", "k", "s0"]:
        agreement = agreements.get(param, 0)
        if agreement == 0:
            continue
        color = "#27ae60" if agreement < 1.05 else ("#e67e22" if agreement < 1.10 else "#c0392b")
        status = "✓" if agreement < 1.05 else ("~" if agreement < 1.10 else "✗")
        print(f"| `{param}` | "
              f"<span style='color:{color}'>{agreement:.3f}</span> | {status} |")
```


### What the diagnostics tell us

The scout found a best log-likelihood of about −11,300 — the model can
explain some of the data, but not well. The convergence diagnostics are
worse: R₀ and k have Â values above 10, meaning chains are exploring
completely different basins. This is the seasonal SEIR telling us it's
not the right model for this data.

The specific failures point to specific gaps:

- **R₀ multimodality** — without importation (ι), inter-epidemic troughs
  go to exactly zero infections, and the model can match the data equally
  well at very different R₀ values by adjusting the timing of extinction
  and re-seeding.
- **k wandering** — the NegBin overdispersion parameter absorbs variance
  that should come from process noise (`overdispersed()`) and time-varying
  covariates. Without those, k has to do too much work.
- **s₀ at lower bound** — the initial susceptible fraction is pinned at
  1%, suggesting the model needs a different initialization strategy
  (warm-up period or covariate-informed initial conditions).

These are not fitting failures — they're the model telling us what it
needs. Each gap maps to a specific feature in the full He et al. model.


## What's still needed: the full He et al. model

The seasonal SEIR gets the biennial pattern right but can't fit the data
quantitatively. The remaining mechanisms — time-varying covariates,
importation, process noise, cohort births, and the He et al. observation
model — address the specific failures we just diagnosed:

**Time-varying covariates.** `interpolated` forcing for pop(t) and
birthrate(t) replaces the fixed N₀ and μ:

```camdl
forcing {
  pop : interpolated {
    data = "data/he2010_london_covariates.tsv"
    time_col = t; value_col = pop; method = "linear"
  }
  birthrate : interpolated {
    data = "data/he2010_london_covariates.tsv"
    time_col = t; value_col = birthrate; method = "linear"
  }
}
```

**Importation** (ι ≈ 3 cases/year from outside London) sustains low-level
transmission during inter-epidemic troughs — without it, the model goes
to zero and can only restart via stochastic fluctuation.

**Process noise** via `overdispersed()` adds Gamma-distributed noise to
the force of infection. This is the He et al. "environmental stochasticity"
that we saw outperform NegBin observation noise in the
[model comparison chapter](../guide/fitting-data/model_comparison.html).

**Cohort births** via `events {}` — children enter school once per year,
creating an annual pulse of new susceptibles.

**The observation model** — weekly reported recoveries with reporting
fraction ρ and overdispersion ψ (He et al. eq. 6).

```{python}
#| output: asis
print("The complete `he2010_london.camdl`:")
print()
print("```camdl")
with open(os.path.join(VDIR, "models/he2010_london.camdl")) as f:
    text = f.read()
    lines = text.split("\n")
    start = next(i for i, l in enumerate(lines) if l.strip() and not l.strip().startswith("#"))
    print("\n".join(lines[start:]).rstrip())
print("```")
```

Twelve parameters, time-varying forcing from three sources, and a
non-trivial observation model. The fitting chapter will show what
happens when IF2 meets this full model — which parameters converge,
which form ridges, and what the diagnostics look like on a real-world
inference problem.