Mehrdad Mohammadi

A Glimpse at Information Theory Applications in Statistics

The era of massive data sets brings fascinating problems at the interface between information theory and (statistical) learning theory.


There are several well-studied connections between information theory and statistics:


In their classic paper, Kolmogorov and Tikhomirov (1959) connect statistical estimation, metric entropy (the logarithm of the smallest number of \(\epsilon\)-balls needed to cover \(\Theta\)), and the notion of channel capacity. In information-theoretic jargon, the setup reads as follows (a minimal simulation sketch follows the list):

Codebook: an indexed parametric family of probability distributions \(\{Q_\theta \mid \theta \in \Theta\}\)
Codeword: nature chooses some parameter \(\theta^\ast \in \Theta\) (the true parameter value)
Channel: the user observes \(n\) i.i.d. draws from the true distribution, \(X_i \sim Q_{\theta^\ast}\)
Decoding: an estimator \(X_1^n \mapsto \hat{\theta}\) such that \(\hat{\theta} \overset{\mathbb{P}}{\rightarrow} \theta^\ast\)
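
As a concrete toy illustration of this codebook/channel/decoding pipeline, here is a minimal Python sketch. The Gaussian location family \(\{N(\theta, 1)\}\) and the sample-mean decoder are assumptions chosen purely for illustration (the sample mean is the MLE for this family, hence a consistent decoder):

```python
import numpy as np

rng = np.random.default_rng(0)

# Codebook: the Gaussian location family {Q_theta = N(theta, 1) : theta in R}
# (an illustrative assumption; any parametric family would do).
theta_star = 2.5  # codeword: nature's true parameter theta*

for n in [10, 100, 1000, 10000]:
    # Channel: n i.i.d. draws X_i ~ Q_{theta*}.
    x = rng.normal(loc=theta_star, scale=1.0, size=n)
    # Decoding: the MLE in this family is the sample mean, which is
    # consistent, i.e. theta_hat -> theta* in probability as n grows.
    theta_hat = x.mean()
    print(f"n={n:6d}  theta_hat={theta_hat:.4f}  error={abs(theta_hat - theta_star):.4f}")
```

Running it, the decoding error shrinks as \(n\) grows, which is exactly the \(\hat{\theta} \overset{\mathbb{P}}{\rightarrow} \theta^\ast\) requirement above.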

This setting admits many variations:

codebooks/codewords: graphs, vectors, matrices, functions, densities, etc.
channels: random graphs, regression models, elementwise probes of vectors/matrices, random projections, etc.
closeness notions: exact/partial graph recovery in Hamming distance, \(\ell^p\) distances, \(L^p(Q)\)-distances, sup-norm, etc.
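
To make one of these closeness notions concrete: for graph recovery, the Hamming distance simply counts edge disagreements between the true graph and the estimate. A minimal sketch, assuming graphs are encoded as symmetric 0/1 adjacency matrices:

```python
import numpy as np

def hamming_graph_distance(A, B):
    """Number of edge disagreements between two simple graphs,
    given as symmetric 0/1 adjacency matrices."""
    # Compare upper triangles only, so each unordered edge counts once.
    iu = np.triu_indices_from(A, k=1)
    return int(np.sum(A[iu] != B[iu]))

# True graph vs. an estimate that misses one edge.
A_true = np.array([[0, 1, 1],
                   [1, 0, 0],
                   [1, 0, 0]])
A_hat  = np.array([[0, 1, 0],
                   [1, 0, 0],
                   [0, 0, 0]])
print(hamming_graph_distance(A_true, A_hat))  # 1
```

Exact recovery corresponds to Hamming distance 0, while partial recovery only asks that the distance be small relative to the number of possible edges.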