A Cleaner Way to Categorize Probabilistic Models
The classic split between generative and discriminative models was introduced for supervised learning: compare Naïve Bayes, which models the joint $p(x, y)$, with logistic regression, which models the conditional $p(y\mid x)$.
In contemporary ML, the term “generative” is used loosely to refer to any model that is not discriminative, regardless of whether it actually specifies a probability distribution, a latent generative story, or even a sampling mechanism. The result is terminology that is historically overloaded, pedagogically misleading, and conceptually inaccurate. The field needs a cleaner taxonomy, one that matches what probabilistic models actually do rather than relying on outdated terminology. Here we propose such a framework.
1. Primary axis: Distribution modeling vs. Conditional modeling
The first, clean question is:
What probability does the model aim to capture?
We propose to replace “generative vs. discriminative” with Distribution modeling and Conditional modeling.
Distribution Modeling
The model’s primary target is the data distribution $p(x)$. Examples include:
- latent-variable generative models (GMM, HMM, ICA, VAE, RBM, diffusion-as-generative-chain)
- normalizing flows
- autoregressive models
- energy-based models (EBMs)
- score models
- efficient-coding / equal-mass partition models
- Bayesian networks modeling only $p(x)$
- Markov chains defining a stationary distribution
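As a concrete sketch of this side of the axis (a minimal example using scikit-learn on synthetic data; the two-blob dataset and all variable names below are illustrative, not from the list above), the learning target is $p(x)$ itself, and the fitted model can score the log-density of any point:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(-2, 1, (200, 2)),   # synthetic 2-D data:
                    rng.normal(+2, 1, (200, 2))])  # two Gaussian blobs

# Distribution modeling: the learning target is p(x) itself.
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
print(gmm.score_samples(X[:3]))   # log p(x) evaluated at individual points
```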
Conditional Modeling
The model’s target is a conditional distribution $p(y\mid x)$. Examples:
- logistic regression
- CRFs / HCRFs
- Bayesian logistic regression
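For contrast, a companion sketch on the conditional side (again scikit-learn, with synthetic data and labels): the fitted model answers only queries of the form $p(y\mid x)$ and says nothing about $p(x)$.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(-2, 1, (200, 2)),
                    rng.normal(+2, 1, (200, 2))])
y = np.concatenate([np.zeros(200), np.ones(200)])  # synthetic labels

# Conditional modeling: the target is p(y | x); no density over x is defined.
clf = LogisticRegression().fit(X, y)
print(clf.predict_proba(X[:3]))   # p(y | x) per point, nothing about p(x)
```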
This axis is minimal, clean, and matches probability theory:
- “Distribution” = model $p(x)$
- “Conditional” = model $p(y\mid x)$
No extra philosophical baggage.
2. Second axis: Latent vs. non-latent
Next, we ask:
Does the model explicitly introduce latent variables $z$ as part of its probabilistic story?
Latent = Yes: the model introduces hidden variables (sources, codes, states, classes, etc.) with a distribution $p(z)$ and some relationship to $x$.
Latent = No: the model operates directly on $x$; it may factorize $p(x)$, define a flow, an energy, a score, a Markov chain over $x$, or an equal-mass partition, all without any explicit latent $z$.
For distribution models:
Latent + distribution ⇒ latent-variable generative models in the classical statistical sense: they specify a joint $p(x,z) = p(z)p(x\mid z)$ and a sampling story.
Non-latent + distribution ⇒ non-latent distributional models: they define or approximate aspects of $p(x)$ without introducing explicit latents.
For conditional models:
- Latent vs. non-latent simply tells you whether hidden variables enter the conditional, e.g., via $p(y\mid x) = \int p(y\mid x, z)\,p(z\mid x)\,dz$, as in HCRFs.
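A minimal numpy sketch of the contrast (all toy parameters chosen by hand): the latent route samples $z$ first and then $x \mid z$, while a non-latent autoregressive factorization touches only $x$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Latent + distribution: p(x, z) = p(z) p(x | z), sampled ancestrally.
pi = np.array([0.3, 0.7])        # p(z): a two-component prior
mu = np.array([-2.0, +2.0])      # component means for p(x | z)
z = rng.choice(2, size=5, p=pi)  # sample the latent first ...
x = rng.normal(mu[z], 1.0)       # ... then the observation given z

# Non-latent + distribution: p(x) = prod_t p(x_t | x_<t), no z anywhere.
# Toy AR(1): x_t = 0.9 * x_{t-1} + noise_t.
seq, x_t = [], 0.0
for _ in range(5):
    x_t = 0.9 * x_t + rng.normal()
    seq.append(x_t)

print(x)    # a draw from the latent-variable story
print(seq)  # a sample path from the autoregressive factorization
```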
3. Third axis: Density form
For distribution models, how is $p(x)$ represented?
Exact density: $\log p_\theta(x)$ is tractable, e.g., flows, autoregressive models, many GMMs, some Bayesian networks.
Approximate density: optimized via a bound (e.g., the ELBO), e.g., VAEs, DDPM-style diffusion models.
Unnormalized density: only an energy $E(x)$ is given; $p(x) \propto e^{-E(x)}$ with an unknown partition function, e.g., EBMs, RBMs.
Implicit density: defined only via a sampler or transformation, with no closed-form $p(x)$, e.g., GANs, some Markov chains, pure score models (where only $\nabla_x \log p(x)$ is known).
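The four forms differ in what the model can actually return for a query point. A toy 1-D sketch (numpy; the affine flow and the double-well energy below are illustrative constructions, not canonical models):

```python
import numpy as np

x = np.array([0.5])

# Exact: an affine flow x = 2u + 1 with u ~ N(0, 1) gives closed-form
# log p(x) via change of variables (subtract log |dx/du| = log 2).
u = (x - 1.0) / 2.0
log_px_exact = -0.5 * (u**2 + np.log(2 * np.pi)) - np.log(2.0)

# Approximate: a VAE would instead return an ELBO, a lower bound on log p(x).

# Unnormalized: an energy E(x) gives p(x) ∝ exp(-E(x)); log Z is unknown,
# so only *differences* of log-densities are available.
E = lambda t: (t**2 - 1.0) ** 2
log_px_plus_logZ = -E(x)

# Implicit (score only): we know ∇_x log p(x) = -∇_x E(x), but never p(x).
score = lambda t: -4.0 * t * (t**2 - 1.0)

print(log_px_exact, log_px_plus_logZ, score(x))
```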
4. Fourth axis: Sampling route
For distribution models, how do we sample from the learned distribution?
Direct / ancestral: one pass through a generative story or invertible map, e.g., GMM, naive Bayes, ICA, sparse coding, Bayesian networks, flows.
Iterative / Markovian: sampling requires a chain (MCMC, Gibbs, Langevin, reverse diffusion, autoregressive token-by-token), e.g., RBM, EBMs, score-based SDEs, DDPM, Markov chains, autoregressive models.
No sampling defined: the model’s goal is to capture structure in $p(x)$ without providing a generative sampler.
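A minimal sketch of the first two routes (numpy; the step size and iteration count are arbitrary), reusing the toy double-well energy from the previous sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

# Direct / ancestral: one pass through a generative story (2-component GMM).
z = rng.choice(2, size=1000)
x_direct = rng.normal(np.where(z == 0, -1.0, 1.0), 0.3)

# Iterative / Markovian: unadjusted Langevin dynamics on E(x) = (x^2 - 1)^2,
#   x_{t+1} = x_t - eta * E'(x_t) + sqrt(2 * eta) * noise_t
grad_E = lambda t: 4.0 * t * (t**2 - 1.0)
eta = 1e-2
x_chain = rng.normal(size=1000)  # arbitrary initialization
for _ in range(2000):
    x_chain += -eta * grad_E(x_chain) + np.sqrt(2 * eta) * rng.normal(size=1000)

# Both x_direct and x_chain now hold (approximate) draws from a bimodal p(x).
```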
5. A unified table of probabilistic models
- Axis 1: Target (Distribution vs. Conditional)
- Axis 2: Latent? (Yes / No)
- Axis 3: Density type
- Axis 4: Sampling
- Axis 5: Inference / training
- Axis 6: Primary Use / Function
The last two columns correspond to Shu and Yong in another post.
Table: Categorizing Probabilistic Models
| Model | Target (Dist./Cond.) | Latent? | Density type | Sampling | Inference / Training | Primary Use / Function |
|---|---|---|---|---|---|---|
| GMM | Dist. | Yes | Exact | Direct (ancestral) | EM / MLE | Clustering, density modeling, mixture discovery |
| Naive Bayes | Dist. (joint $p(x,y)$, used conditionally) | Yes (class as latent or observed) | Exact | Direct | MLE | Fast classification, interpretable generative classifier |
| Bayesian network | Dist. or Cond. | Yes (if hidden nodes) | Exact (if structured) | Direct | MLE / EM / VI | Structured reasoning, causal inference, joint modeling |
| Markov chain (stationary dist.) | Dist. | No | Implicit | Iterative (chain) | Transition estimation / MLE | Modeling long-run distributions, stochastic processes |
| HMM / SSM | Dist. | Yes | Exact (marginalizable) | Direct (ancestral) | EM / VI | Sequence modeling, state discovery, temporal inference |
| RBM | Dist. | Yes | Unnormalized joint | Iterative (Gibbs / CD) | Contrastive divergence / MLE | Feature learning, generative structure discovery |
| VAE | Dist. | Yes | Approx. (ELBO) | Direct (ancestral from prior) | Variational inference | Representation learning, generative modeling |
| Diffusion models (DDPM-style) | Dist. | Yes (multi-step noise latents) | Approx. (ELBO-like) | Iterative (reverse diffusion) | Variational / score-style | High-quality sample generation, image/audio synthesis |
| Score-based SDE models | Dist. | No | Implicit (score only) | Iterative (reverse SDE/ODE) | Score matching | Generative modeling via learned score function |
| Normalizing flows | Dist. | No | Exact | Direct (invertible map) | MLE | Exact likelihood modeling, invertible generative mapping |
| Autoregressive models | Dist. | No | Exact | Iterative (sequential) | MLE | High-quality sequence generation, density modeling |
| EBM (visible only) | Dist. | No (or latents if extended) | Unnormalized | Iterative (MCMC) or none | Contrastive / MLE | Energy-based representation learning, structure discovery |
| GAN | Dist. | Yes (noise input) | Implicit | Direct (generator forward) | Adversarial (min–max) | Fast generation, realistic sample synthesis |
| ICA | Dist. | Yes | Exact (often) | Direct (sample sources, mix) | MLE / InfoMax | Source separation, latent-factor discovery |
| Sparse coding | Dist. | Yes | Approx. (often MAP-based) | Direct (sample sparse codes) | Reconstruction + sparsity, sometimes VI | Sparse representation learning, dictionary learning |
| PPCA / Factor analysis | Dist. | Yes | Exact | Direct | EM | Dimensionality reduction, latent-factor discovery |
| Even code IPU model | Dist. | No | Implicit (equal-mass partition) | None | Information-theoretic | Representation learning |
| Logistic regression | Cond. | No | Exact | N/A | CE / MLE | Predictive classification, discriminative modeling |
| Bayesian logistic regression | Cond. | Yes (global parameter posterior) | Posterior over parameters | N/A | VI / MCMC | Parameter inference, uncertainty quantification |
| CRF / HCRF | Cond. | Yes (structured latents) | Exact / structured cond. likelihood | N/A | Conditional likelihood / max-margin | Structured prediction (e.g., sequences, segmentation) |
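Because the taxonomy is just a handful of discrete axes, it can be encoded directly as data. A sketch (the field names and the choice of rows are mine) that records a few entries of the table and supports simple queries:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelEntry:
    name: str
    target: str    # "dist" or "cond"
    latent: bool
    density: str   # "exact", "approx", "unnormalized", "implicit"
    sampling: str  # "direct", "iterative", "none", "n/a"

TABLE = [
    ModelEntry("GMM",                 "dist", True,  "exact",        "direct"),
    ModelEntry("Normalizing flow",    "dist", False, "exact",        "direct"),
    ModelEntry("VAE",                 "dist", True,  "approx",       "direct"),
    ModelEntry("EBM",                 "dist", False, "unnormalized", "iterative"),
    ModelEntry("GAN",                 "dist", True,  "implicit",     "direct"),
    ModelEntry("Logistic regression", "cond", False, "exact",        "n/a"),
]

# Query: latent-variable distribution models with exact, tractable densities.
print([m.name for m in TABLE
       if m.target == "dist" and m.latent and m.density == "exact"])  # ['GMM']
```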