Specialized Likelihoods
Table of Contents
- Zero-Inflated Models
- Hurdle Models
- Censored Data
- Truncated Distributions
- Ordinal Regression
- Robust Regression
Zero-Inflated Models
When to Use
Use zero-inflated models when data has more zeros than the base distribution predicts. Common in: - Count data with structural zeros (species never present at a site) - Medical data (many healthy patients with zero symptoms) - Insurance claims (many policies with no claims)
Zero-Inflated Poisson
import pymc as pm
with pm.Model() as zip_model:
# Zero-inflation probability
psi = pm.Beta("psi", alpha=2, beta=2) # P(structural zero)
# Poisson rate for non-zero process
mu = pm.Exponential("mu", lam=1)
# Zero-Inflated Poisson likelihood
y = pm.ZeroInflatedPoisson("y", psi=psi, mu=mu, observed=y_obs)Interpretation: - psi: Probability of a structural zero (zero from a separate process) - mu: Mean of the Poisson process for non-structural observations - Observed zeros come from both sources: structural (with prob psi) and sampling zeros from Poisson
Zero-Inflated Poisson Regression
with pm.Model() as zip_regression:
# Regression on zero-inflation (logit link)
alpha_psi = pm.Normal("alpha_psi", 0, 2)
beta_psi = pm.Normal("beta_psi", 0, 1, dims="features")
logit_psi = alpha_psi + pm.math.dot(X, beta_psi)
psi = pm.math.sigmoid(logit_psi)
# Regression on Poisson rate (log link)
alpha_mu = pm.Normal("alpha_mu", 0, 2)
beta_mu = pm.Normal("beta_mu", 0, 1, dims="features")
log_mu = alpha_mu + pm.math.dot(X, beta_mu)
mu = pm.math.exp(log_mu)
# Zero-Inflated Poisson
y = pm.ZeroInflatedPoisson("y", psi=psi, mu=mu, observed=y_obs)Zero-Inflated Negative Binomial
When data is both zero-inflated AND overdispersed:
with pm.Model() as zinb_model:
psi = pm.Beta("psi", alpha=2, beta=2)
mu = pm.Exponential("mu", lam=0.5)
alpha = pm.Exponential("alpha", lam=1) # overdispersion
y = pm.ZeroInflatedNegativeBinomial(
"y", psi=psi, mu=mu, alpha=alpha, observed=y_obs
)Zero-Inflated Binomial
For bounded count data with excess zeros:
with pm.Model() as zib_model:
psi = pm.Beta("psi", alpha=2, beta=2)
p = pm.Beta("p", alpha=2, beta=2) # success probability
y = pm.ZeroInflatedBinomial("y", psi=psi, n=n_trials, p=p, observed=y_obs)Hurdle Models
Hurdle vs Zero-Inflated
| Aspect | Zero-Inflated | Hurdle |
|---|---|---|
| Conceptual | Two sources of zeros | One process for zero/nonzero, another for positive counts |
| Zeros | From both processes | Only from the “hurdle” process |
| Positive values | From count process (which can also produce zeros) | From truncated count process |
Use hurdle when: - Zero represents “no event occurred” (binary decision) - Positive counts represent “how many given event occurred”
Use zero-inflated when: - Some zeros are “structural” (impossible for event to occur) - Other zeros are sampling zeros from a count process
Hurdle Poisson
import pytensor.tensor as pt
with pm.Model() as hurdle_poisson:
# Probability of crossing the hurdle (having any count)
theta = pm.Beta("theta", alpha=2, beta=2)
# Poisson rate for positive counts
mu = pm.Exponential("mu", lam=1)
# Custom likelihood for hurdle model
def hurdle_logp(y, theta, mu):
# P(y=0) = 1 - theta
# P(y=k | k>0) = theta * Poisson(k|mu) / (1 - Poisson(0|mu))
zero_logp = pt.log(1 - theta)
pos_logp = (
pt.log(theta)
+ pm.logp(pm.Poisson.dist(mu=mu), y)
- pt.log(1 - pt.exp(-mu)) # truncation adjustment
)
return pt.where(pt.eq(y, 0), zero_logp, pos_logp)
y = pm.CustomDist("y", theta, mu, logp=hurdle_logp, observed=y_obs)Hurdle Negative Binomial
with pm.Model() as hurdle_negbin:
theta = pm.Beta("theta", alpha=2, beta=2)
mu = pm.Exponential("mu", lam=0.5)
alpha = pm.Exponential("alpha", lam=1)
def hurdle_nb_logp(y, theta, mu, alpha):
nb_dist = pm.NegativeBinomial.dist(mu=mu, alpha=alpha)
p_zero_nb = pt.exp(pm.logp(nb_dist, 0))
zero_logp = pt.log(1 - theta)
pos_logp = (
pt.log(theta)
+ pm.logp(nb_dist, y)
- pt.log(1 - p_zero_nb)
)
return pt.where(pt.eq(y, 0), zero_logp, pos_logp)
y = pm.CustomDist("y", theta, mu, alpha, logp=hurdle_nb_logp, observed=y_obs)Censored Data
Types of Censoring
- Right censoring: Value known to be above some threshold (e.g., survival time > study end)
- Left censoring: Value known to be below some threshold (e.g., concentration < detection limit)
- Interval censoring: Value known to be within an interval
pm.Censored
PyMC’s pm.Censored wraps any distribution to handle censoring:
with pm.Model() as censored_model:
mu = pm.Normal("mu", mu=0, sigma=10)
sigma = pm.HalfNormal("sigma", sigma=5)
# Right-censored at upper_bound
y = pm.Censored(
"y",
dist=pm.Normal.dist(mu=mu, sigma=sigma),
lower=None, # no left censoring
upper=upper_bound, # right censoring threshold
observed=y_obs,
)Right-Censored Data
Common in survival analysis where follow-up ends before events occur:
# y_obs: observed times (events) or censoring times
# censored: boolean indicator (True if censored)
with pm.Model() as survival_model:
mu = pm.Normal("mu", mu=3, sigma=2)
sigma = pm.HalfNormal("sigma", sigma=1)
# Upper bound is observed time for censored observations, None otherwise
upper = np.where(censored, y_obs, np.inf)
y = pm.Censored(
"y",
dist=pm.LogNormal.dist(mu=mu, sigma=sigma),
lower=None,
upper=upper,
observed=y_obs,
)Left-Censored Data (Detection Limits)
When measurements below a threshold are recorded as the threshold:
# y_obs: observed values (detection_limit for censored observations)
# below_limit: boolean indicator
with pm.Model() as detection_limit_model:
mu = pm.Normal("mu", mu=0, sigma=5)
sigma = pm.HalfNormal("sigma", sigma=2)
lower = np.where(below_limit, -np.inf, y_obs) # actual lower for censored
upper = np.where(below_limit, detection_limit, y_obs)
# For left-censored: value is somewhere in (-inf, detection_limit]
y = pm.Censored(
"y",
dist=pm.Normal.dist(mu=mu, sigma=sigma),
lower=lower,
upper=upper,
observed=y_obs,
)Tobit Regression (Censored Regression)
with pm.Model() as tobit:
alpha = pm.Normal("alpha", 0, 10)
beta = pm.Normal("beta", 0, 2, dims="features")
sigma = pm.HalfNormal("sigma", sigma=2)
mu = alpha + pm.math.dot(X, beta)
# Left-censored at 0 (common for expenditure, hours worked)
y = pm.Censored(
"y",
dist=pm.Normal.dist(mu=mu, sigma=sigma),
lower=0,
upper=None,
observed=y_obs,
)Truncated Distributions
pm.Truncated
Unlike censoring (where we know value exceeds a bound), truncation means data outside bounds is never observed. Use pm.Truncated:
with pm.Model() as truncated_model:
mu = pm.Normal("mu", mu=0, sigma=10)
sigma = pm.HalfNormal("sigma", sigma=5)
# Only observe values in [lower, upper]
y = pm.Truncated(
"y",
dist=pm.Normal.dist(mu=mu, sigma=sigma),
lower=0, # no negative values observed
upper=100, # no values above 100 observed
observed=y_obs,
)Censored vs Truncated
| Aspect | Censored | Truncated |
|---|---|---|
| Data | Censored values recorded at bound | Values outside bounds not in dataset |
| Sample size | Fixed, includes censored obs | Varies, excludes out-of-bound obs |
| Example | Survival time > study end recorded as “censored at T” | Only customers who bought are in purchase dataset |
Example: Truncated at Zero
# Data: Only positive observations (e.g., income for employed)
with pm.Model() as positive_only:
mu = pm.Normal("mu", mu=10, sigma=5)
sigma = pm.HalfNormal("sigma", sigma=3)
y = pm.Truncated(
"y",
dist=pm.Normal.dist(mu=mu, sigma=sigma),
lower=0,
upper=None,
observed=y_obs,
)Ordinal Regression
When to Use
For ordered categorical outcomes: survey responses (1-5 stars), education levels, disease severity grades.
pm.OrderedLogistic
with pm.Model() as ordinal:
# Regression coefficients
beta = pm.Normal("beta", mu=0, sigma=2, dims="features")
# Cutpoints (thresholds between categories)
# K-1 cutpoints for K categories
cutpoints = pm.Normal(
"cutpoints",
mu=np.linspace(-2, 2, n_categories - 1),
sigma=1,
transform=pm.distributions.transforms.ordered,
shape=n_categories - 1,
)
# Linear predictor
eta = pm.math.dot(X, beta)
# Ordered logistic likelihood
y = pm.OrderedLogistic("y", eta=eta, cutpoints=cutpoints, observed=y_obs)Interpreting Cutpoints
Cutpoints define boundaries between adjacent categories on the latent scale: - P(Y <= k) = logistic(cutpoints[k] - eta) - More positive eta → higher probability of higher categories
# Posterior predictive probabilities for each category
with model:
pm.sample_posterior_predictive(idata, extend_inferencedata=True)
# Examine predicted category probabilities
pred_probs = idata.posterior_predictive["y"]Priors for Cutpoints
The ordering constraint is critical:
# Option 1: Ordered transform (recommended)
cutpoints = pm.Normal(
"cutpoints", mu=0, sigma=2,
transform=pm.distributions.transforms.ordered,
shape=n_categories - 1,
)
# Option 2: Induced from differences
diffs = pm.HalfNormal("diffs", sigma=1, shape=n_categories - 1)
cutpoints = pm.Deterministic("cutpoints", pt.cumsum(diffs) - diffs.sum() / 2)Ordered Probit Alternative
with pm.Model() as ordered_probit:
beta = pm.Normal("beta", mu=0, sigma=2, dims="features")
cutpoints = pm.Normal(
"cutpoints", mu=0, sigma=2,
transform=pm.distributions.transforms.ordered,
shape=n_categories - 1,
)
eta = pm.math.dot(X, beta)
# Probit link via cumulative normal
y = pm.OrderedProbit("y", eta=eta, cutpoints=cutpoints, observed=y_obs)Robust Regression
Student-t Likelihood
Replace Normal with Student-t for outlier-robust regression:
with pm.Model() as robust_regression:
alpha = pm.Normal("alpha", mu=0, sigma=10)
beta = pm.Normal("beta", mu=0, sigma=2, dims="features")
sigma = pm.HalfNormal("sigma", sigma=2)
# Degrees of freedom: lower = heavier tails = more robust
# nu=1 is Cauchy (very heavy), nu>30 ≈ Normal
nu = pm.Exponential("nu", lam=1/30) + 1 # ensures nu > 1
mu = alpha + pm.math.dot(X, beta)
# Student-t downweights outliers
y = pm.StudentT("y", nu=nu, mu=mu, sigma=sigma, observed=y_obs)Prior on Degrees of Freedom
The nu parameter controls robustness:
# Weakly informative, allows both robust and normal-like behavior
nu = pm.Gamma("nu", alpha=2, beta=0.1) # mode around 10
# Prior expecting heavy tails (more robust)
nu = pm.Exponential("nu", lam=1/10) + 1 # concentrated at small values
# Fixed (if you have strong prior knowledge)
nu = 4 # moderately heavy tailsComparing to Normal Likelihood
# Fit both models
with normal_model:
idata_normal = pm.sample()
with robust_model:
idata_robust = pm.sample()
# Compare via LOO
comparison = az.compare({
"normal": idata_normal,
"robust": idata_robust,
}, ic="loo")Quantile Regression
For median or other quantile regression using asymmetric Laplace:
import pytensor.tensor as pt
with pm.Model() as quantile_regression:
alpha = pm.Normal("alpha", mu=0, sigma=10)
beta = pm.Normal("beta", mu=0, sigma=2, dims="features")
sigma = pm.HalfNormal("sigma", sigma=2)
mu = alpha + pm.math.dot(X, beta)
# Quantile (0.5 = median, 0.9 = 90th percentile)
tau = 0.5
# Asymmetric Laplace log-probability
def asymmetric_laplace_logp(y, mu, sigma, tau):
z = (y - mu) / sigma
return pt.log(tau * (1 - tau) / sigma) - z * (tau - (z < 0))
y = pm.CustomDist(
"y", mu, sigma, tau,
logp=asymmetric_laplace_logp,
observed=y_obs,
)See Also
- priors.md - Prior selection guidance
- diagnostics.md - Convergence diagnostics
- mixtures.md - Mixture models (related to zero-inflated)
- troubleshooting.md - Common pitfalls