---
title: "Fitting to Data"
freeze: true
---

```{python}
#| echo: false
#| output: false
import subprocess, io, os, sys, tempfile, re, json, glob
import polars as pl
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
from IPython.display import Markdown, HTML

GREY = "#888888"
mpl.rcParams.update({
    "figure.facecolor": "none",
    "axes.facecolor": "none",
    "savefig.facecolor": "none",
    "figure.dpi": 150,
    "font.size": 11,
    "text.color": GREY,
    "axes.labelcolor": GREY,
    "xtick.color": GREY,
    "ytick.color": GREY,
    "axes.edgecolor": GREY,
    "legend.edgecolor": GREY,
})

# __file__ is not defined in executed notebooks, so resolve relative to the cwd
sys.path.insert(0, os.path.abspath(".."))
from _lib.cli import run_cli

FIT_DIR = "_archive/fitting_examples"

def run_sim(model, params_file, seed=42, scenario="baseline", extra_args=None, cwd=FIT_DIR):
    cmd = ["camdl", "simulate", model, "--params", params_file,
           "--scenario", scenario, "--seed", str(seed)]
    if extra_args:
        cmd.extend(extra_args)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True, cwd=cwd)
    lines = result.stdout.split("\n")
    tsv = "\n".join(l for l in lines if not l.startswith("#"))
    return pl.read_csv(io.StringIO(tsv), separator="\t")

def load_mle(path):
    params = {}
    with open(path) as f:
        for line in f:
            if line.startswith("#") or "=" not in line:
                continue
            k, v = line.split("=", 1)
            params[k.strip()] = float(v.strip())
    return params

def write_params(params, path):
    with open(path, "w") as f:
        for k, v in params.items():
            f.write(f"{k} = {v}\n")

def load_traces(results_dir):
    traces = []
    for i, f in enumerate(sorted(glob.glob(os.path.join(results_dir, "chain_*/parameter_traces.tsv")))):
        df = pl.read_csv(f, separator="\t", comment_prefix="#", infer_schema_length=10000)
        df = df.with_columns(pl.lit(i + 1).alias("chain"))
        traces.append(df)
    return pl.concat(traces)

# ── Locators for the content-addressable fit tree (post 2026-04-19) ─────
#
# Fit output directories are `output/fits/<stem>-<hash[:8]>/` where the
# hash changes whenever the fit.toml, model IR, or data files change.
# Book snippets can't hard-code the hash, so they resolve the directory
# by globbing on the stem and picking the most-recent match.
def find_fit_root(stem, root=None):
    root = root or os.path.join(FIT_DIR, "results")
    cands = sorted(glob.glob(os.path.join(root, "fits", f"{stem}-*")),
                   key=os.path.getmtime, reverse=True)
    if not cands:
        raise FileNotFoundError(f"no fit {stem!r} under {root!r}")
    return cands[0]

def find_fit_stage(stem, stage, seed=None, root=None):
    fit_root = find_fit_root(stem, root=root)
    seed_glob = f"fit_{seed}" if seed is not None else "fit_*"
    cands = sorted(glob.glob(os.path.join(fit_root, "real", seed_glob, stage)),
                   key=os.path.getmtime, reverse=True)
    if not cands:
        raise FileNotFoundError(f"no stage {stage!r} under {fit_root!r}")
    return cands[0]

# Load data
data = pl.read_csv(os.path.join(FIT_DIR, "data/boarding_school_sir.tsv"), separator="\t")
```

In the [previous chapters](experiments.qmd) we built a model, explored its
uncertainty, and tested its behavior under different scenarios. Now we turn
to the central question: **can the data constrain the model's parameters?**

This chapter introduces `camdl fit`, the inference pipeline that finds
parameter values consistent with observed data. We fit an SIR model to
the 1978 boarding school influenza outbreak and discover how much 14
daily observations can — and can't — tell us about the underlying
epidemic.


## The problem: latent states and intractable likelihoods

A compartmental model has **latent states** — the true number of susceptible,
infectious, and recovered individuals at each time point — that we never
observe directly. We observe noisy, incomplete proxies: daily bed counts,
weekly case reports. To find the parameters that best explain the data, we
need the likelihood $L(\theta \mid y_{1:T})$ — the probability of seeing
these observations given the parameters.

For a stochastic compartmental model, this likelihood is an integral over
all possible latent epidemic trajectories:

$$
L(\theta \mid y_{1:T}) = \int P(y_{1:T} \mid x_{0:T}, \theta) \; P(x_{0:T} \mid \theta) \; dx_{0:T}
$$

This integral is intractable — the space of possible trajectories is
enormous. The **particle filter** approximates it by simulating many
trajectories in parallel and weighting them by how well they explain the
data. **Iterated filtering (IF2)** [Ionides et al. 2015](https://doi.org/10.1073/pnas.1410597112)
builds on this: it perturbs the parameters across the particles,
runs the particle filter, and then cools the perturbations over
iterations to converge on the maximum likelihood estimate (MLE). Think
simulated annealing on an approximate likelihood surface. This is the
same algorithm as pomp's `mif2()`.
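The mechanics are easier to see in code. Below is a minimal bootstrap
particle filter for a stochastic SIR, estimating the log-likelihood of a
series of bed counts. This is a sketch for intuition, not camdl's
implementation — the tau-leaping step size, the negative-binomial
observation density, and every function name here are illustrative
assumptions:

```python
import numpy as np
from scipy import stats

def sir_step(rng, S, I, beta, gamma, N, dt=0.1, t_span=1.0):
    """Advance each particle one observation interval via tau-leaping."""
    for _ in range(round(t_span / dt)):
        inf = rng.binomial(S, 1 - np.exp(-beta * I / N * dt))  # new infections
        rec = rng.binomial(I, 1 - np.exp(-gamma * dt))         # new recoveries
        S, I = S - inf, I + inf - rec
    return S, I

def pf_loglik(y, beta, gamma, k, N=763, I0=5, J=500, seed=0):
    """Bootstrap particle filter log-likelihood for daily counts y."""
    rng = np.random.default_rng(seed)
    S = np.full(J, N - I0)
    I = np.full(J, I0)
    ll = 0.0
    for obs in y:
        S, I = sir_step(rng, S, I, beta, gamma, N)        # propagate particles
        mu = np.maximum(I, 1e-6)
        w = stats.nbinom.pmf(obs, n=k, p=k / (k + mu))    # weight by the data
        ll += np.log(w.mean() + 1e-300)                   # conditional likelihood
        if w.sum() > 0:                                   # resample in proportion
            idx = rng.choice(J, size=J, p=w / w.sum())
            S, I = S[idx], I[idx]
    return ll

ll = pf_loglik([3, 8, 25, 70, 100, 80, 40, 10], beta=2.0, gamma=0.5, k=10.0)
print(f"log-likelihood estimate: {ll:.1f}")
```

IF2 wraps this filter in an outer loop: each iteration perturbs (β, γ, k)
per particle, filters, and shrinks the perturbation scale.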


## The fit.toml config

A fit configuration separates three concerns: what to estimate, what to
fix, and how to run the inference. Here is the config for our boarding
school SIR:

```{python}
#| echo: false
#| output: asis
with open(os.path.join(FIT_DIR, "fit_sir.toml")) as f:
    print(f"```toml\n{f.read().strip()}\n```")
```

The `[estimate]` section lists the free parameters with their search
bounds: the transmission rate β, recovery rate γ, and observation
noise k. The `[fixed]` section pins the population size and initial
infected count. Note there is no reporting fraction — every infectious
boy is in bed, so the observation model is
`neg_binomial(mean = prevalence(I), r = k)` directly.
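The mean/dispersion parameterization used by `neg_binomial` differs from
SciPy's (n, p) convention. Assuming camdl's `r` is the usual dispersion
parameter (variance = mean + mean²/r, approaching Poisson as r → ∞), the
mapping is:

```python
import numpy as np
from scipy import stats

def nbinom_mean_disp(mean, r):
    """SciPy's nbinom takes (n, p); convert from (mean, dispersion r).

    With n = r and p = r / (r + mean):
      E[Y]   = mean
      Var[Y] = mean + mean**2 / r
    """
    return stats.nbinom(n=r, p=r / (r + mean))

obs = nbinom_mean_disp(mean=120.0, r=50.0)
print(obs.mean())  # ≈ 120.0
print(obs.var())   # ≈ 120 + 120²/50 = 408.0
```

Small `r` means bed counts scatter widely around the latent prevalence;
large `r` means they track it closely.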


## Validating the pipeline: simulation-based calibration

Before trusting results on real data, we check: **can the pipeline
recover known parameters from synthetic data?** This is
simulation-based calibration
([Talts et al. 2018](https://arxiv.org/abs/1804.06788)) — generate
many datasets at known parameters, fit each one, and check that the
MLEs are centered on truth.

We simulate 30 synthetic datasets from known parameters, fit each with
the same IF2 configuration, and compare the recovered parameters to the
generating values:
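The recovery check itself is simple arithmetic. A sketch with made-up MLE
values (`sbc_bias_report` is a hypothetical helper, not part of camdl):

```python
import numpy as np

def sbc_bias_report(mles, truth):
    """Summarize recovery: mean estimate, absolute and relative bias.

    mles:  dict of param -> list of MLEs (one per synthetic dataset)
    truth: dict of param -> generating value
    """
    report = {}
    for p, vals in mles.items():
        vals = np.asarray(vals, dtype=float)
        bias = vals.mean() - truth[p]
        report[p] = {
            "mean": vals.mean(),
            "bias": bias,
            "rel_bias_pct": 100 * bias / truth[p],
        }
    return report

# Three hypothetical fits that all land above the generating beta:
rep = sbc_bias_report({"beta": [2.9, 3.1, 3.2]}, {"beta": 2.1961})
print(f"{rep['beta']['rel_bias_pct']:+.0f}% relative bias")
```

A well-calibrated pipeline should give relative biases near zero across
all parameters; systematic offsets like the one above signal trouble.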

```{python}
#| code-fold: true
#| code-summary: "SBC analysis code"
#| label: fig-sbc
#| fig-cap: "Simulation-based calibration for the boarding school SIR. Each dot is the MLE from one synthetic dataset (30 total). Dashed red lines mark the true parameter values. β is systematically overestimated from 14 daily observations — the particle filter likelihood is noisy enough that higher-β parameter regions appear more favorable."
sbc = pl.read_csv(os.path.join(FIT_DIR, "results/sbc/summary.tsv"),
                  separator="\t", skip_rows=1, has_header=False,
                  new_columns=["dseed", "beta", "gamma", "k", "ll"])

true_vals = {"beta": 2.1961, "gamma": 0.5782, "k": 50.35}
params = ["beta", "gamma", "k"]

fig, axes = plt.subplots(1, 3, figsize=(11, 3.5))
for ax, param in zip(axes, params):
    vals = sbc[param].to_numpy()
    ax.scatter(range(len(vals)), vals, s=15, color="#3366cc", alpha=0.6)
    ax.axhline(true_vals[param], color="#cc3333", linestyle="--", linewidth=1.5,
               label=f"True = {true_vals[param]:.2f}")
    ax.axhline(vals.mean(), color="#3366cc", linestyle=":", linewidth=1,
               label=f"Mean = {vals.mean():.2f}")
    ax.set_xlabel("Synthetic dataset")
    ax.set_ylabel(param)
    ax.legend(frameon=False, fontsize=7)
    ax.spines[["top", "right"]].set_visible(False)
plt.tight_layout()
plt.show()
```

The bias propagates to R₀ = β/γ — the quantity decision-makers
actually need:

```{python}
#| code-fold: true
#| code-summary: "Plotting code"
#| label: fig-sbc-r0
#| fig-cap: "R₀ = β/γ across 30 synthetic datasets. The MLE overestimates R₀ by ~39% on average from 14 daily observations (mean 5.3 vs true 3.8)."
R0s = (sbc["beta"] / sbc["gamma"]).to_numpy()
true_R0 = 2.1961 / 0.5782

fig, ax = plt.subplots(figsize=(7, 3.5))
ax.scatter(range(len(R0s)), R0s, s=20, color="#3366cc", alpha=0.6)
ax.axhline(true_R0, color="#cc3333", linestyle="--", linewidth=1.5,
           label=f"True R₀ = {true_R0:.1f}")
ax.axhline(R0s.mean(), color="#3366cc", linestyle=":", linewidth=1,
           label=f"Mean = {R0s.mean():.1f} (+{100*(R0s.mean()-true_R0)/true_R0:.0f}%)")
ax.set_xlabel("Synthetic dataset")
ax.set_ylabel("R₀ = β/γ")
ax.legend(frameon=False, fontsize=9)
ax.spines[["top", "right"]].set_visible(False)
plt.tight_layout()
plt.show()
```

### Does more data fix the bias?

To confirm this is a small-data problem, we repeat the SBC at four
observation frequencies — same 14-day epidemic, same parameters, but
observing every 1 day (14 points), every 0.5 days (28 points), every
0.25 days (56 points), and every 0.1 days (140 points):

```{python}
#| code-fold: true
#| code-summary: "R₀ bias by observation density"
#| label: fig-r0-by-freq
#| fig-cap: "R₀ estimation bias as a function of observation density. With 14 daily observations, R₀ is overestimated by +39%. At 28 observations (twice-daily), the bias drops to -2%. The basic reproduction number requires sub-daily observation data for reliable estimation from a stochastic SIR on a 14-day outbreak."
freqs = [1.0, 0.5, 0.25, 0.1]
freq_labels = ["14\n(1/day)", "28\n(2/day)", "56\n(4/day)", "140\n(10/day)"]
true_R0_val = 2.1961 / 0.5782

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))

# Left: R0 scatter
for i, (freq, label) in enumerate(zip(freqs, freq_labels)):
    if freq == 1.0:
        path = os.path.join(FIT_DIR, "results/sbc/summary.tsv")
    else:
        path = os.path.join(FIT_DIR, f"results/sbc_freq_{freq}.tsv")
    cols = ["dseed", "beta", "gamma", "k", "ll"]
    if os.path.exists(path):
        df = pl.read_csv(path, separator="\t", skip_rows=1, has_header=False,
                         new_columns=cols)
        if len(df) > 0:
            R0s = (df["beta"] / df["gamma"]).to_numpy()
            jitter = np.random.default_rng(42).uniform(-0.15, 0.15, len(R0s))
            ax1.scatter(np.full_like(R0s, i) + jitter, R0s,
                      s=15, alpha=0.5, color="#3366cc", zorder=3)
            ax1.plot([i - 0.3, i + 0.3], [R0s.mean(), R0s.mean()],
                   color="#3366cc", linewidth=2, zorder=4)
            bias = 100 * (R0s.mean() - true_R0_val) / true_R0_val
            ax1.text(i, max(R0s) + 0.3, f"{bias:+.0f}%", ha="center",
                    fontsize=9, color=GREY)

ax1.axhline(true_R0_val, color="#cc3333", linestyle="--", linewidth=1.5,
           label=f"True R₀ = {true_R0_val:.1f}")
ax1.set_xticks(range(len(freq_labels)))
ax1.set_xticklabels(freq_labels)
ax1.set_xlabel("Observations (over 14 days)")
ax1.set_ylabel("R₀ = β/γ")
ax1.legend(frameon=False, fontsize=9)
ax1.spines[["top", "right"]].set_visible(False)

# Right: beta scatter
for i, (freq, label) in enumerate(zip(freqs, freq_labels)):
    if freq == 1.0:
        path = os.path.join(FIT_DIR, "results/sbc/summary.tsv")
    else:
        path = os.path.join(FIT_DIR, f"results/sbc_freq_{freq}.tsv")
    cols = ["dseed", "beta", "gamma", "k", "ll"]
    if os.path.exists(path):
        df = pl.read_csv(path, separator="\t", skip_rows=1, has_header=False,
                         new_columns=cols)
        if len(df) > 0:
            betas = df["beta"].to_numpy()
            jitter = np.random.default_rng(42).uniform(-0.15, 0.15, len(betas))
            ax2.scatter(np.full_like(betas, i) + jitter, betas,
                      s=15, alpha=0.5, color="#3366cc", zorder=3)
            ax2.plot([i - 0.3, i + 0.3], [betas.mean(), betas.mean()],
                   color="#3366cc", linewidth=2, zorder=4)

ax2.axhline(2.1961, color="#cc3333", linestyle="--", linewidth=1.5,
           label="True β = 2.20")
ax2.set_xticks(range(len(freq_labels)))
ax2.set_xticklabels(freq_labels)
ax2.set_xlabel("Observations (over 14 days)")
ax2.set_ylabel("β")
ax2.legend(frameon=False, fontsize=9)
ax2.spines[["top", "right"]].set_visible(False)

plt.tight_layout()
plt.show()
```

What does a fit look like with twice the observation density? Here's
one synthetic dataset observed every 0.5 days (28 points) alongside
the 14-point daily version:

```{python}
#| code-fold: true
#| code-summary: "Plotting code"
#| label: fig-freq-fit-comparison
#| fig-cap: "Fits to the same epidemic at two observation densities. **Left:** every 1 day (14 observations), MLE β=3.30. **Right:** every 0.5 days (28 observations), MLE β=2.35 — much closer to truth (2.20). More observations resolve the parameter bias."
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(11, 4), sharey=True)

for ax, freq, ds, title in [
    (ax1, "1.0", "fit_sbc_1", "Every 1 day (14 obs)"),
    (ax2, "0.5", "fit_sbc_f0.5_d1", "Every 0.5 days (28 obs)"),
]:
    mle_path = os.path.join(find_fit_stage(ds, "refine"), "mle_params.toml")
    mle = load_mle(mle_path)

    if freq == "1.0":
        obs_data = pl.read_csv(os.path.join(FIT_DIR, "data/sbc/syn_1.tsv"), separator="\t")
        model_file = "boarding_school_sir.camdl"
    else:
        obs_data = pl.read_csv(os.path.join(FIT_DIR, f"data/sbc_freq/freq_{freq}/syn_1.tsv"), separator="\t")
        model_file = f"boarding_school_sir_obs{freq}.camdl"

    write_params(mle, os.path.join(FIT_DIR, "_tmp_mle.toml"))
    traj = run_sim(model_file, "_tmp_mle.toml", extra_args=["--replicates", "200"])

    grouped = (
        traj.select(["t", "replicate", "I"])
        .group_by("t")
        .agg([
            pl.col("I").quantile(0.025).alias("q025"),
            pl.col("I").quantile(0.25).alias("q25"),
            pl.col("I").median().alias("median"),
            pl.col("I").quantile(0.75).alias("q75"),
            pl.col("I").quantile(0.975).alias("q975"),
        ])
        .sort("t")
    )
    t = grouped["t"].to_numpy()
    ax.fill_between(t, grouped["q025"], grouped["q975"],
                    alpha=0.15, color="#cc3333")
    ax.fill_between(t, grouped["q25"], grouped["q75"],
                    alpha=0.3, color="#cc3333")
    ax.plot(t, grouped["median"], color="#cc3333", linewidth=1.5,
            label=f"Model  β={mle['beta']:.2f} γ={mle['gamma']:.2f}")
    ax.scatter(obs_data["time"], obs_data["in_bed"], color="black", s=15,
               zorder=5, alpha=0.7, label="Synthetic data")
    ax.set_xlabel("Day")
    ax.set_title(title, fontsize=11, color=GREY)
    ax.legend(frameon=False, fontsize=8)
    ax.spines[["top", "right"]].set_visible(False)

ax1.set_ylabel("Boys in bed")
os.unlink(os.path.join(FIT_DIR, "_tmp_mle.toml"))
plt.tight_layout()
plt.show()
```


## Fitting the real data

With the pipeline validated, we fit the 1978 boarding school data:

```{python}
#| echo: false
display(run_cli(
    f"cd {FIT_DIR} && camdl fit run fit_sir.toml --seed 42 "
    f"--label \"boarding school, default priors\"",
    echo="camdl fit run fit_sir.toml --seed 42 \\\n"
         "    --label \"boarding school, default priors\"",
    cwd=FIT_DIR,
    max_lines=35
))
```

`--label "<short text>"` attaches a human-readable name to the fit,
visible in `camdl fit list` and `camdl fit table` output (see
[Tracking fit iterations: labels](#tracking-fit-iterations-labels)
at the end of this chapter). It's optional but cheap — 30 seconds
at run time saves "which fit was the default-priors one?" later.

The MLE parameters:

```{python}
#| echo: false
display(run_cli(
    f"cat {find_fit_stage('fit_sir', 'refine')}/mle_params.toml",
    echo="cat output/fits/fit_sir-<hash>/real/fit_<seed>/refine/mle_params.toml",
    collapse=False
))
```

```{python}
#| echo: false
mle_sir = load_mle(os.path.join(find_fit_stage("fit_sir", "refine"), "mle_params.toml"))
R0 = mle_sir["beta"] / mle_sir["gamma"]
```

### Convergence diagnostics

```{python}
#| code-fold: true
#| code-summary: "Plotting code"
#| label: fig-sir-traces
#| fig-cap: "IF2 scout traces for the boarding school SIR. Each line is one chain. β and γ converge to a narrow range; k (overdispersion) wanders — it is weakly identified from 14 daily observations."
traces_df = load_traces(find_fit_stage("fit_sir", "scout"))

params = ["beta", "gamma", "k"]
fig, axes = plt.subplots(1, 3, figsize=(11, 3.5))
for ax, param in zip(axes, params):
    for chain in traces_df["chain"].unique().sort().to_list():
        sub = traces_df.filter(pl.col("chain") == chain)
        ax.plot(sub["iteration"], sub[param], alpha=0.3, linewidth=0.5, color="#3366cc")
    ax.set_ylabel(param)
    ax.set_xlabel("IF2 iteration")
    ax.spines[["top", "right"]].set_visible(False)
plt.tight_layout()
plt.show()
```

### The fit — posterior predictive check

```{python}
#| code-fold: true
#| code-summary: "Plotting code"
#| label: fig-sir-fit
#| fig-cap: "SIR model fit to the boarding school data. The red ribbon shows process uncertainty (200 replicates at the MLE). Black dots are the observed daily bed counts."
n_reps = 200
traj_multi = run_sim("boarding_school_sir.camdl",
                     os.path.join(find_fit_stage("fit_sir", "refine"), "mle_params.toml"),
                     extra_args=["--replicates", str(n_reps)])

grouped = (
    traj_multi.select(["t", "replicate", "I"])
    .group_by("t")
    .agg([
        pl.col("I").quantile(0.025).alias("q025"),
        pl.col("I").quantile(0.25).alias("q25"),
        pl.col("I").median().alias("median"),
        pl.col("I").quantile(0.75).alias("q75"),
        pl.col("I").quantile(0.975).alias("q975"),
    ])
    .sort("t")
)

fig, ax = plt.subplots(figsize=(7, 4))
t = grouped["t"].to_numpy()
ax.fill_between(t, grouped["q025"], grouped["q975"],
                alpha=0.15, color="#cc3333")
ax.fill_between(t, grouped["q25"], grouped["q75"],
                alpha=0.3, color="#cc3333")
ax.plot(t, grouped["median"], color="#cc3333", linewidth=1.5,
        label=f"Model  β={mle_sir['beta']:.2f}  γ={mle_sir['gamma']:.2f}  R₀={R0:.1f}")
ax.scatter(data["time"], data["in_bed"], color="black", s=30, zorder=5,
           label="Data (in bed)")
ax.set_xlabel("Day")
ax.set_ylabel("Boys in bed")
ax.legend(frameon=False, fontsize=9)
ax.spines[["top", "right"]].set_visible(False)
plt.tight_layout()
plt.show()
```


## Bayesian inference with informed priors

The MLE is biased because the likelihood on 14 points peaks in the
wrong place. A Bayesian posterior won't fix this with vague priors —
the posterior is still mostly data-driven. But for influenza, we have
genuine external information: the infectious period 1/γ is known from
clinical studies to be roughly 2–3 days. A tight prior on γ is
scientifically justified, not a modeling hack — it encodes virology,
not epidemiological modeling results.

We run PGAS+NUTS with:

- **γ ~ LogNormal(log(0.4), 0.2)** — tight, centered on 2.5-day
  infectious period. This is clinical knowledge.
- **β ~ LogNormal(0.7, 0.5)** — weak, median ≈ 2. We are *not*
  encoding R₀ knowledge.
- **k ~ LogNormal(3.0, 0.8)** — weak, median ≈ 20.
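These three distributions can be sanity-checked with SciPy before running
the fit. For a LogNormal(μ, σ) specified on the log scale, `scipy.stats.lognorm`
takes `s=σ` and `scale=exp(μ)`, and the median is exp(μ):

```python
import numpy as np
from scipy import stats

# LogNormal(mu, sigma) on the log scale -> lognorm(s=sigma, scale=exp(mu))
priors = {
    "gamma": stats.lognorm(s=0.2, scale=np.exp(np.log(0.4))),  # tight: 1/gamma ~ 2.5 days
    "beta":  stats.lognorm(s=0.5, scale=np.exp(0.7)),          # weak: median ~ 2
    "k":     stats.lognorm(s=0.8, scale=np.exp(3.0)),          # weak: median ~ 20
}

for name, d in priors.items():
    lo, hi = d.ppf([0.025, 0.975])  # central 95% prior mass
    print(f"{name}: median {d.median():.2f}, 95% interval [{lo:.2f}, {hi:.2f}]")
```

Printing the 95% intervals before fitting is a cheap check that "tight"
and "weak" mean what you intend — the γ interval should stay inside
clinically plausible recovery rates, while β and k should span an order
of magnitude.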

```{python}
#| echo: false
display(run_cli(
    f"cd {FIT_DIR} && camdl fit run fit_sir_pgas_informed.toml --seed 42 "
    f"--label \"PGAS posterior, informed gamma prior\"",
    echo="camdl fit run fit_sir_pgas_informed.toml --seed 42 \\\n"
         "    --label \"PGAS posterior, informed gamma prior\"",
    cwd=FIT_DIR,
    max_lines=20
))
```

```{python}
#| code-fold: true
#| code-summary: "Diagnostics and posterior"
#| label: fig-pgas-diagnostics
#| fig-cap: "PGAS posterior with informed γ prior. **Top:** trace plots — γ (center) is well-mixed and centered on truth (dashed red). β (left) explores but stays above truth. k (right) is diffuse. **Bottom:** posterior histograms. The tight γ prior pulls the infectious period to the right value; β and k posteriors are wider, reflecting genuine uncertainty."
traces_pgas = []
for ch in range(1, 5):
    df = pl.read_csv(os.path.join(find_fit_stage("fit_sir_pgas_informed", "posterior"),
                                   f"chain_{ch}/trace.tsv"),
                     separator="\t")
    df = df.with_columns(pl.lit(ch).alias("chain"))
    traces_pgas.append(df)
all_pgas = pl.concat(traces_pgas)
post_pgas = all_pgas.filter(pl.col("sweep") >= 3000)

true_vals_pgas = {"beta": 2.196, "gamma": 0.578, "k": 50.35}

fig, axes = plt.subplots(2, 3, figsize=(13, 7))

for ax, param in zip(axes[0], ["beta", "gamma", "k"]):
    for ch in range(1, 5):
        sub = all_pgas.filter(pl.col("chain") == ch)
        ax.plot(sub["sweep"], sub[param], alpha=0.5, linewidth=0.3)
    ax.axhline(true_vals_pgas[param], color="#cc3333", linestyle="--", linewidth=1.5)
    ax.axvline(3000, color=GREY, linestyle=":", alpha=0.5)
    ax.set_ylabel(param)
    ax.set_xlabel("PGAS sweep")
    ax.spines[["top", "right"]].set_visible(False)

for ax, param in zip(axes[1], ["beta", "gamma", "k"]):
    vals = post_pgas[param].to_numpy()
    ax.hist(vals, bins=40, density=True, color="#3366cc", alpha=0.6, edgecolor="none")
    ax.axvline(true_vals_pgas[param], color="#cc3333", linestyle="--", linewidth=1.5,
               label=f"Truth = {true_vals_pgas[param]:.2f}")
    ax.axvline(vals.mean(), color="#3366cc", linestyle=":", linewidth=1.5,
               label=f"Mean = {vals.mean():.2f}")
    ax.set_xlabel(param)
    ax.set_ylabel("Density")
    ax.legend(frameon=False, fontsize=7)
    ax.spines[["top", "right"]].set_visible(False)

plt.tight_layout()
plt.show()
```

```{python}
#| code-fold: true
#| code-summary: "Posterior predictive"
#| label: fig-pgas-postpred
#| fig-cap: "Posterior predictive from PGAS with informed γ prior. Blue lines are trajectories simulated at 50 random posterior draws. Dashed red is the true-parameter trajectory. Black dots are the synthetic data. The posterior predictive covers the data, but the ensemble peaks slightly earlier than the true-parameter trajectory — the residual β bias shifts the timing."
syn_data = pl.read_csv(os.path.join(FIT_DIR, "data/sbc/syn_1.tsv"), separator="\t")

fig, ax = plt.subplots(figsize=(7, 4))

np.random.seed(42)
idx = np.random.choice(len(post_pgas), 50, replace=False)
for i in idx:
    row = post_pgas.row(i, named=True)
    try:
        traj = run_sim("boarding_school_sir.camdl", "/dev/null",
                       extra_args=["--param", f"beta={row['beta']}",
                                   "--param", f"gamma={row['gamma']}",
                                   "--param", f"k={row['k']}",
                                   "--param", "N0=763", "--param", "I0=5"],
                       seed=int(np.random.randint(1, 10000)))
        ax.plot(traj["t"], traj["I"], alpha=0.08, color="#3366cc", linewidth=0.8)
    except Exception:
        pass  # skip posterior draws where the simulator fails

traj_true = run_sim("boarding_school_sir.camdl", "/dev/null",
                    extra_args=["--param", "beta=2.196", "--param", "gamma=0.578",
                                "--param", "k=50.35", "--param", "N0=763",
                                "--param", "I0=5"],
                    seed=7)
ax.plot(traj_true["t"], traj_true["I"], color="#cc3333", linewidth=2,
        linestyle="--", label="True params", zorder=4)
ax.scatter(syn_data["time"], syn_data["in_bed"], color="black", s=30,
           zorder=5, label="Synthetic data")

ax.set_xlabel("Day")
ax.set_ylabel("Boys in bed")
ax.legend(frameon=False, fontsize=9)
ax.spines[["top", "right"]].set_visible(False)
plt.tight_layout()
plt.show()
```

The informed γ prior achieves what the MLE cannot: γ is correctly
recovered (posterior mean 0.59, truth 0.58), and k's 95% credible
interval now covers truth (7.7–70.5 includes 50.4). β remains biased
(posterior mean 3.5 vs truth 2.2) because the likelihood on this
dataset genuinely peaks higher — no prior short of "the answer is 2.2"
would fix this. But the posterior is *honest*: the wide β interval
reflects the uncertainty that the MLE point estimate hides.

### Prior vs posterior

How much did the data update our beliefs? Comparing the prior and
posterior distributions shows where the data is informative and where
the prior dominates:

```{python}
#| code-fold: true
#| code-summary: "Prior vs posterior"
#| label: fig-prior-posterior
#| fig-cap: "Prior (grey) vs posterior (blue) distributions. **γ:** the tight clinical prior (grey) barely moves — the data is consistent with the prior, and the posterior (blue) is almost identical. The prior dominates because 14 observations can't improve on clinical virology. **β:** the weak prior is substantially updated — the posterior concentrates around 3.0–4.5, far from the prior median of 2.0. The data is informative about β, just biased. **k:** the posterior shifts leftward from the prior, concentrating at 10–40, but with a long right tail."
from scipy import stats

fig, axes = plt.subplots(1, 3, figsize=(12, 3.5))

prior_specs = {
    "beta":  {"dist": stats.lognorm(s=0.5, scale=np.exp(0.7)), "label": "LN(0.7, 0.5)"},
    "gamma": {"dist": stats.lognorm(s=0.2, scale=np.exp(-0.916)), "label": "LN(-0.92, 0.2)"},
    "k":     {"dist": stats.lognorm(s=0.8, scale=np.exp(3.0)), "label": "LN(3.0, 0.8)"},
}

for ax, param in zip(axes, ["beta", "gamma", "k"]):
    post_vals = post_pgas[param].to_numpy()

    # Prior
    lo, hi = post_vals.min() * 0.5, post_vals.max() * 1.5
    if param == "k":
        hi = min(hi, 100)
    x = np.linspace(max(0.01, lo), hi, 200)
    prior_pdf = prior_specs[param]["dist"].pdf(x)
    ax.fill_between(x, prior_pdf, alpha=0.2, color=GREY, label="Prior")
    ax.plot(x, prior_pdf, color=GREY, linewidth=1)

    # Posterior
    ax.hist(post_vals, bins=40, density=True, color="#3366cc", alpha=0.5,
            edgecolor="none", label="Posterior")

    ax.axvline(true_vals_pgas[param], color="#cc3333", linestyle="--",
               linewidth=1.5, label=f"Truth = {true_vals_pgas[param]:.2f}")

    ax.set_xlabel(param)
    ax.set_ylabel("Density")
    ax.legend(frameon=False, fontsize=7)
    ax.spines[["top", "right"]].set_visible(False)

plt.tight_layout()
plt.show()
```

### Pairwise posterior correlations

The joint posterior reveals the correlation structure — which
parameters trade off against each other:

```{python}
#| code-fold: true
#| code-summary: "Pairplot"
#| label: fig-pairplot
#| fig-cap: "Pairwise posterior samples from PGAS. **β vs γ:** strong positive correlation — higher transmission paired with faster recovery produces similar epidemic curves. This is the R₀ ridge. **β vs k:** weak correlation — overdispersion is largely independent of transmission. **γ vs k:** weak negative correlation. Red crosshairs mark the true parameter values."
params_pair = ["beta", "gamma", "k"]
n_params = len(params_pair)

fig, axes = plt.subplots(n_params, n_params, figsize=(10, 10))

for i, pi in enumerate(params_pair):
    for j, pj in enumerate(params_pair):
        ax = axes[i, j]

        if i == j:
            # Diagonal: histogram
            vals = post_pgas[pi].to_numpy()
            ax.hist(vals, bins=40, density=True, color="#3366cc",
                    alpha=0.6, edgecolor="none")
            ax.axvline(true_vals_pgas[pi], color="#cc3333",
                       linestyle="--", linewidth=1.5)

        elif i > j:
            # Lower triangle: scatter
            xvals = post_pgas[pj].to_numpy()
            yvals = post_pgas[pi].to_numpy()
            # Thin for plotting
            step = max(1, len(xvals) // 500)
            ax.scatter(xvals[::step], yvals[::step], s=3, alpha=0.15,
                       color="#3366cc", edgecolors="none")
            ax.axvline(true_vals_pgas[pj], color="#cc3333",
                       linestyle="--", linewidth=0.8, alpha=0.5)
            ax.axhline(true_vals_pgas[pi], color="#cc3333",
                       linestyle="--", linewidth=0.8, alpha=0.5)

            # Correlation
            corr = np.corrcoef(xvals, yvals)[0, 1]
            ax.text(0.95, 0.95, f"r={corr:.2f}", transform=ax.transAxes,
                    ha="right", va="top", fontsize=8, color=GREY)
        else:
            # Upper triangle: empty
            ax.axis("off")

        if i == n_params - 1:
            ax.set_xlabel(pj)
        else:
            ax.set_xticklabels([])
        if j == 0:
            ax.set_ylabel(pi)
        else:
            ax.set_yticklabels([])
        ax.spines[["top", "right"]].set_visible(False)

plt.tight_layout()
plt.show()
```

::: {.callout-note collapse="false"}
## When is an informative prior justified?

The γ prior used here encodes clinical virology, not epidemiological
modeling results. The infectious period of influenza is measured
independently of any transmission model — it comes from cohort studies
tracking viral shedding and symptom duration. This is external
information that legitimately constrains the model.

A prior on β (the transmission rate) would be different — it encodes
knowledge from *other* modeling studies, which creates circularity.
A prior on R₀ is worse: it directly encodes the answer we're trying
to estimate.

The rule: priors on biological rates (infectious period, latent period,
incubation time) from clinical data are scientifically defensible.
Priors on epidemiological quantities (R₀, attack rate, reporting
fraction) from previous modeling are methodologically murkier — use them
only when you can cite the independent data source.
:::


## Bootstrap bias correction

The MLE is biased, but is it *correctable*? The parametric bootstrap
provides a simple recipe: estimate the bias by re-fitting synthetic
data generated at the MLE, then subtract it.

For each of our 30 synthetic datasets: (1) take the MLE $\hat\theta$,
(2) simulate $B = 20$ new datasets from $\hat\theta$, (3) fit each to
get bootstrap MLEs $\{\hat\theta^*_1, \ldots, \hat\theta^*_B\}$, (4)
estimate the bias as $\hat{b} = \overline{\hat\theta^*} - \hat\theta$,
(5) correct: $\hat\theta_{BC} = 2\hat\theta - \overline{\hat\theta^*}$.
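Steps (4) and (5) are two lines of arithmetic once the bootstrap refits
are done. A sketch with made-up numbers (the helper name is ours, not
camdl's):

```python
import numpy as np

def bias_correct(theta_hat, boot_mles):
    """Parametric-bootstrap bias correction for a scalar parameter.

    theta_hat: the original MLE
    boot_mles: B MLEs refit to datasets simulated at theta_hat
    Returns (bias_estimate, corrected_estimate).
    """
    boot_mean = np.mean(boot_mles)
    b_hat = boot_mean - theta_hat          # step (4): estimated bias
    theta_bc = 2 * theta_hat - boot_mean   # step (5): theta_hat - b_hat
    return b_hat, theta_bc

# Illustration: an MLE of 3.3 whose bootstrap refits average 4.1
b_hat, theta_bc = bias_correct(3.3, [3.9, 4.0, 4.2, 4.3])
print(f"estimated bias {b_hat:+.2f}, corrected estimate {theta_bc:.2f}")
```

The logic: if refitting data generated at θ̂ overshoots θ̂ by b̂, the
original fit to real data presumably overshot the truth by about the
same amount, so we subtract b̂ once.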

The trade-off: bootstrap correction removes $O(1/K)$ bias but inflates
variance by roughly $(1 + 1/B)$. Whether MSE improves depends on
whether bias or variance dominates.

```{python}
#| code-fold: true
#| code-summary: "Bootstrap results"
#| label: fig-bootstrap
#| fig-cap: "Bootstrap bias correction across three observation densities. **Left:** MLE (blue) vs bias-corrected (orange) estimates for β, with truth marked by dashed red line. **Right:** MSE comparison — the bias correction reduces MSE at K=14 where bias dominates, but may increase it at K=56 where variance dominates."
boot_results = {}
for freq in ["1.0", "0.5", "0.25"]:
    path = os.path.join(FIT_DIR, f"results/bootstrap/freq_{freq}_B20.tsv")
    if os.path.exists(path):
        df = pl.read_csv(path, separator="\t")
        if len(df) > 0:
            boot_results[freq] = df

if boot_results:
    true_vals_boot = {"beta": 2.1961, "gamma": 0.5782, "k": 50.35}
    freq_labels_boot = {"1.0": "14 obs", "0.5": "28 obs", "0.25": "56 obs"}

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))

    # Left: β estimates (MLE vs BC) by frequency
    for i, (freq, label) in enumerate(freq_labels_boot.items()):
        if freq not in boot_results:
            continue
        df = boot_results[freq]
        mle_vals = df["mle_beta"].to_numpy()
        bc_vals = df["bc_beta"].to_numpy()
        jitter = 0.12
        ax1.scatter(np.full_like(mle_vals, i) - jitter,
                   mle_vals, s=12, alpha=0.5, color="#3366cc", label="MLE" if i == 0 else None)
        ax1.scatter(np.full_like(bc_vals, i) + jitter,
                   bc_vals, s=12, alpha=0.5, color="#e67e22", label="BC" if i == 0 else None)
        ax1.plot([i - 0.3, i - 0.05], [mle_vals.mean(), mle_vals.mean()],
                color="#3366cc", linewidth=2)
        ax1.plot([i + 0.05, i + 0.3], [bc_vals.mean(), bc_vals.mean()],
                color="#e67e22", linewidth=2)

    ax1.axhline(true_vals_boot["beta"], color="#cc3333", linestyle="--", linewidth=1.5,
               label="Truth")
    ax1.set_xticks(range(len(freq_labels_boot)))
    ax1.set_xticklabels(freq_labels_boot.values())
    ax1.set_ylabel("β")
    ax1.set_xlabel("Observation density")
    ax1.legend(frameon=False, fontsize=9)
    ax1.spines[["top", "right"]].set_visible(False)

    # Right: MSE comparison for β
    width = 0.35
    truth = true_vals_boot["beta"]
    mle_mses, bc_mses, labels_used = [], [], []
    for freq, label in freq_labels_boot.items():
        if freq not in boot_results:
            continue
        df = boot_results[freq]
        mle_mses.append(np.mean((df["mle_beta"].to_numpy() - truth) ** 2))
        bc_mses.append(np.mean((df["bc_beta"].to_numpy() - truth) ** 2))
        labels_used.append(label)

    x = np.arange(len(labels_used))
    ax2.bar(x - width/2, mle_mses, width, label="MLE", color="#3366cc", alpha=0.7)
    ax2.bar(x + width/2, bc_mses, width, label="BC", color="#e67e22", alpha=0.7)
    ax2.set_xticks(x)
    ax2.set_xticklabels(labels_used)
    ax2.set_ylabel("MSE (β)")
    ax2.set_xlabel("Observation density")
    ax2.legend(frameon=False, fontsize=9)
    ax2.spines[["top", "right"]].set_visible(False)

    plt.tight_layout()
    plt.show()
else:
    print("Bootstrap results not yet available — run scripts/bootstrap_sbc.sh")
```

```{python}
#| code-fold: true
#| code-summary: "Posterior predictive: MLE vs bias-corrected"
#| label: fig-bootstrap-fit
#| fig-cap: "Predicted trajectories from the MLE (blue) and bias-corrected (orange) estimates for one synthetic dataset at K=14. The bias-corrected trajectory peaks later and lower, closer to the true-parameter trajectory (dashed red)."
if "1.0" in boot_results:
    df_boot = boot_results["1.0"]
    row = df_boot.row(0, named=True)  # first dataset

    syn_data_boot = pl.read_csv(os.path.join(FIT_DIR, "data/sbc/syn_1.tsv"), separator="\t")

    fig, ax = plt.subplots(figsize=(7, 4))

    for params, color, label in [
        ({"beta": row["mle_beta"], "gamma": row["mle_gamma"], "k": row["mle_k"],
          "N0": 763, "I0": 5}, "#3366cc", f"MLE (β={row['mle_beta']:.2f})"),
        # Clamp BC estimates so the simulator always gets valid inputs
        ({"beta": max(0.5, row["bc_beta"]), "gamma": max(0.1, row["bc_gamma"]),
          "k": max(1.0, row["bc_k"]), "N0": 763, "I0": 5},
         "#e67e22", f"BC (β={row['bc_beta']:.2f})"),
        ({"beta": 2.1961, "gamma": 0.5782, "k": 50.35, "N0": 763, "I0": 5},
         "#cc3333", "Truth (β=2.20)"),
    ]:
        write_params(params, os.path.join(FIT_DIR, "_tmp_boot.toml"))
        traj = run_sim("boarding_school_sir.camdl", "_tmp_boot.toml",
                       extra_args=["--replicates", "100"])
        grouped = (
            traj.select(["t", "replicate", "I"])
            .group_by("t")
            .agg(pl.col("I").median().alias("median"))
            .sort("t")
        )
        ls = "--" if "Truth" in label else "-"
        lw = 2 if "Truth" in label else 1.5
        ax.plot(grouped["t"], grouped["median"], color=color, linewidth=lw,
                linestyle=ls, label=label)

    ax.scatter(syn_data_boot["time"], syn_data_boot["in_bed"], color="black",
               s=30, zorder=5, label="Data")
    ax.set_xlabel("Day")
    ax.set_ylabel("Boys in bed")
    ax.legend(frameon=False, fontsize=8)
    ax.spines[["top", "right"]].set_visible(False)
    os.unlink(os.path.join(FIT_DIR, "_tmp_boot.toml"))
    plt.tight_layout()
    plt.show()
else:
    print("Bootstrap results not yet available — run scripts/bootstrap_sbc.sh")
```


## Observation model comparison: NegBin vs Poisson vs Binomial

The overdispersion parameter k couples to β and is hard to estimate
from 14 points. What if we remove it entirely? Two simpler alternatives:

```camdl
likelihood = poisson(rate = projected)              # variance = mean
likelihood = binomial(n = N0, p = projected / N0)   # variance = np(1-p)
```
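
The mean–variance relationships make the contrast concrete. A sketch, assuming the textbook negative-binomial parameterisation with variance μ + μ²/k (camdl's internal form may differ):

```python
def obs_variance(mu, model, k=50.0, N=763):
    """Observation variance at mean count mu under each noise model."""
    if model == "poisson":
        return mu                   # variance equals the mean
    if model == "binomial":
        p = mu / N
        return N * p * (1 - p)      # variance = Np(1-p), always below the mean
    if model == "negbin":
        return mu + mu**2 / k       # extra-Poisson spread controlled by k
    raise ValueError(model)

# Near the epidemic peak (~300 boys in bed), NegBin with k=50 permits
# roughly an order of magnitude more variance than Poisson or Binomial.
```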

Both have only two free parameters (β, γ). We fit all three observation
models to the real data and compare:

```{python}
#| code-fold: true
#| code-summary: "Plotting code"
#| label: fig-obs-model-comparison
#| fig-cap: "Three observation models on the boarding school data. **Left:** NegBin (3 params). **Center:** Poisson (2 params). **Right:** Binomial (2 params). The Poisson and Binomial fits are tighter (narrower ribbons) because their variance is determined by the mean — no free k parameter."
mle_negbin  = load_mle(os.path.join(find_fit_stage("fit_sir",          "refine"), "mle_params.toml"))
mle_poisson = load_mle(os.path.join(find_fit_stage("fit_sir_poisson",  "refine"), "mle_params.toml"))
mle_binom   = load_mle(os.path.join(find_fit_stage("fit_sir_binomial", "refine"), "mle_params.toml"))

fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(14, 4), sharey=True)

configs = [
    (ax1, "boarding_school_sir.camdl", mle_negbin,
     f"NegBin  β={mle_negbin['beta']:.2f} γ={mle_negbin['gamma']:.2f} k={mle_negbin.get('k',0):.0f}"),
    (ax2, "boarding_school_sir_poisson.camdl", mle_poisson,
     f"Poisson  β={mle_poisson['beta']:.2f} γ={mle_poisson['gamma']:.2f}"),
    (ax3, "boarding_school_sir_binomial.camdl", mle_binom,
     f"Binomial  β={mle_binom['beta']:.2f} γ={mle_binom['gamma']:.2f}"),
]

for ax, model, mle, title in configs:
    write_params(mle, os.path.join(FIT_DIR, "_tmp_mle.toml"))
    traj = run_sim(model, "_tmp_mle.toml", extra_args=["--replicates", "200"])

    grouped = (
        traj.select(["t", "replicate", "I"])
        .group_by("t")
        .agg([
            pl.col("I").quantile(0.025).alias("q025"),
            pl.col("I").quantile(0.25).alias("q25"),
            pl.col("I").median().alias("median"),
            pl.col("I").quantile(0.75).alias("q75"),
            pl.col("I").quantile(0.975).alias("q975"),
        ])
        .sort("t")
    )
    t = grouped["t"].to_numpy()
    ax.fill_between(t, grouped["q025"], grouped["q975"],
                    alpha=0.15, color="#cc3333")
    ax.fill_between(t, grouped["q25"], grouped["q75"],
                    alpha=0.3, color="#cc3333")
    ax.plot(t, grouped["median"], color="#cc3333", linewidth=1.5)
    ax.scatter(data["time"], data["in_bed"], color="black", s=30, zorder=5)
    ax.set_xlabel("Day")
    ax.set_title(title, fontsize=10, color=GREY)
    ax.spines[["top", "right"]].set_visible(False)

ax1.set_ylabel("Boys in bed")
os.unlink(os.path.join(FIT_DIR, "_tmp_mle.toml"))
plt.tight_layout()
plt.show()
```

```{python}
#| code-fold: true
#| code-summary: "SBC comparison"
#| label: fig-obs-model-sbc
#| fig-cap: "Parameter recovery across three observation models (20 synthetic datasets each, K=14). **Left:** β estimates. Poisson and Binomial recover β well (+3–7% bias); NegBin is biased +59%. **Right:** γ estimates. The pattern reverses — NegBin recovers γ better (+9%) while Poisson and Binomial overshoot (+41%). No observation model gives unbiased estimates of all parameters from 14 daily observations."
sbc_negbin = pl.read_csv(os.path.join(FIT_DIR, "results/sbc/summary.tsv"),
                         separator="\t", skip_rows=1, has_header=False,
                         new_columns=["dseed", "beta", "gamma", "k", "ll"])

sbc_binom = pl.read_csv(os.path.join(FIT_DIR, "results/sbc_binomial.tsv"),
                        separator="\t", skip_rows=1, has_header=False,
                        new_columns=["dseed", "beta", "gamma", "ll"])

sbc_poisson = pl.read_csv(os.path.join(FIT_DIR, "results/sbc_poisson.tsv"),
                          separator="\t", skip_rows=1, has_header=False,
                          new_columns=["dseed", "beta", "gamma", "ll"])

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

models = [
    ("NegBin", sbc_negbin, "#3366cc"),
    ("Poisson", sbc_poisson, "#2e8b2e"),
    ("Binomial", sbc_binom, "#e67e22"),
]

for ax, param, true_val in [(ax1, "beta", 2.1961), (ax2, "gamma", 0.5782)]:
    for i, (name, sbc_df, color) in enumerate(models):
        vals = sbc_df[param].to_numpy()[:20]
        jitter = np.random.default_rng(42 + i).uniform(-0.12, 0.12, len(vals))
        ax.scatter(np.full(len(vals), i) + jitter, vals,
                  s=15, alpha=0.5, color=color, label=name)
        ax.plot([i - 0.3, i + 0.3], [vals.mean(), vals.mean()],
               color=color, linewidth=2)

    ax.axhline(true_val, color="#cc3333", linestyle="--", linewidth=1.5,
               label=f"Truth = {true_val:.3f}")
    ax.set_xticks(range(len(models)))
    ax.set_xticklabels([m[0] for m in models])
    ax.set_ylabel(param)
    ax.legend(frameon=False, fontsize=7)
    ax.spines[["top", "right"]].set_visible(False)

plt.tight_layout()
plt.show()
```

The observation model choice changes *which* parameters are well-identified.
The NegBin model with free k absorbs noise into overdispersion, which
lets β drift high but gives γ room to converge. Poisson and Binomial
(both with fixed variance) pin β closer to truth but force γ to
compensate. The tradeoff is fundamental: with 14 observations, you
can't precisely estimate both the rate and the shape of the epidemic
curve simultaneously.


## Tracking fit iterations: labels

Real model-building iterates. You'll widen bounds after seeing scout
chains drift, free a parameter that the data turns out to identify
after all, swap a uniform prior for a log-normal once you accept it
isn't truly uninformative (see the chapter on priors). Each
variation produces a **separate fit directory**, content-addressed
by the model IR + fit.toml + data hashes:

```
results/fits/
  fit_sir-04ab12cd/    # default priors
  fit_sir-1f3c45ee/    # uniform priors
  fit_sir-2a8b7901/    # log-normal R0 prior
  ...
```
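
How such a content address might be computed — a hypothetical sketch (the actual camdl hashing scheme is not shown here):

```python
import hashlib

def fit_id(stem, *input_paths):
    """Derive a short content address from the fit's input files."""
    h = hashlib.sha256()
    for path in sorted(input_paths):  # sort so argument order doesn't matter
        with open(path, "rb") as f:
            h.update(hashlib.sha256(f.read()).digest())
    return f"{stem}-{h.hexdigest()[:8]}"

# fit_id("fit_sir", "model.ir", "fit_sir.toml", "data.tsv")
# Any change to the model, fit config, or data yields a new directory name.
```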

The 8-character hash makes each variation unique on disk, but hashes are
unreadable. **Attach a short label at fit time so future-you can tell
the variations apart at a glance.** Pass `--label "<short description>"`:

```bash
$ camdl fit run fit_sir.toml --seed 42 \
      --label "default priors, scout-only"

$ camdl fit run fit_sir_uniform.toml --seed 42 \
      --label "uniform priors on beta, gamma"

$ camdl fit run fit_sir_pgas_informed.toml --seed 42 \
      --label "log-normal R0 prior, PGAS posterior"
```

Labels are 1–64 characters drawn from letters, digits, spaces, commas,
dots, underscores, and hyphens. Two fits may share a label — it's an
annotation, not a key. The label is persisted in
`<fit_dir>/run.json` under `kind.label`.
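
Reading the label back in a script is then a two-line job. A sketch, assuming `run.json` carries the structure just described:

```python
import json, os

def fit_label(fit_dir):
    """Return the fit's label from run.json, or a placeholder if unset."""
    with open(os.path.join(fit_dir, "run.json")) as f:
        run = json.load(f)
    return run.get("kind", {}).get("label") or "<unlabelled>"
```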

`camdl fit table results/fits/` surfaces the label as the second
column, and `<unlabelled>` (dimmed) for any fit that didn't get
one:

```
fit_id    label                                stem      method  converged   best_ll    R0    age
04ab12cd  default priors, scout-only           fit_sir   if2     ✓          -19.2      1.51   3d
1f3c45ee  uniform priors on beta, gamma        fit_sir   if2     ✓          -19.4      1.49   3d
2a8b7901  log-normal R0 prior, PGAS posterior  fit_sir   pgas    R̂ ok       —          1.50   1d
```

If you forget to label a fit at run time, `camdl fit label <hash>
"<text>"` adds or updates the label post-hoc:

```bash
$ camdl fit label 04ab12cd "default priors, scout-only (rerun)"
```

Pass any hash prefix of eight or more characters; an ambiguous prefix
errors with a hint. Labels can also be filtered with glob patterns:
`fit table --label-pattern "default*"` shows only fits whose label
starts with "default". When the count of unlabelled fits exceeds a
threshold (default 5; configurable via the `CAMDL_UNLABELED_THRESHOLD`
env var), `fit list` and `fit table` print a nudge to stderr with the
commands needed to label the rest. Set `CAMDL_UNLABELED_THRESHOLD=0`
to disable the nudge.
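
The nudge reduces to a threshold check. A hypothetical sketch of the logic (function and argument names are illustrative; only the env var is from camdl):

```python
import os, sys

def maybe_nudge(labels, default_threshold=5):
    """Print a stderr reminder when too many fits lack labels."""
    threshold = int(os.environ.get("CAMDL_UNLABELED_THRESHOLD", default_threshold))
    if threshold == 0:  # 0 disables the nudge entirely
        return False
    unlabelled = sum(1 for label in labels if label is None)
    if unlabelled > threshold:
        print(f"{unlabelled} fits are unlabelled; run "
              '`camdl fit label <hash> "<text>"` to name them', file=sys.stderr)
        return True
    return False
```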

The pattern that pays off: label every fit, even the ones you think
are throwaway. The 30 seconds at run time saves an afternoon of
"which fit was the one with the tightened gamma bounds?" later.
See [`docs/dev/proposals/2026-04-28-fit-experiment-management.md`](https://github.com/vsbuffalo/camdl/blob/main/docs/dev/proposals/2026-04-28-fit-experiment-management.md) for the full design rationale.

## What's next

- Profile likelihoods for identifiability analysis
- Model comparison: SIBCR (bed count = post-infectious) and SEIR (latent period)
- Out-of-sample validation
- The [He et al. measles vignette](../vignettes/he2010.qmd) applies the
  same pipeline to a far more complex model
