29 Jan, 2016
Compressing Predictors
If prediction, rather than parameter estimation, is the main goal, we don't want to use generalized Pareto priors.
Goal:
Build a predictive model of $y$ on $\underset{p \times 1}{\pmb x}$, with $p \approx 50000$.
Compress with a random matrix $\underset{m\times p}{\Phi}$, $m \ll p$: replace $\underset{p\times 1}{\pmb x}$ with the compressed predictor $\underset{m\times 1}{\Phi \pmb x}$.
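A minimal sketch of the compression step. The sizes and the Gaussian distribution for the entries of $\Phi$ are illustrative assumptions, not specified in the notes:

```python
import numpy as np

rng = np.random.default_rng(0)

n, p, m = 100, 5000, 20          # illustrative sizes; the notes use p ~ 50000
X = rng.normal(size=(n, p))      # n observations of the p predictors

# Random projection matrix Phi (m x p); Gaussian entries are one common
# choice (the specific distribution for Phi is an assumption here).
Phi = rng.normal(size=(m, p)) / np.sqrt(m)

# Compressed predictors: each x (p x 1) becomes Phi x (m x 1), i.e. the
# n x p design matrix X becomes the n x m compressed design X Phi'.
X_c = X @ Phi.T
print(X_c.shape)
```

After this step the regression is fit on the $n \times m$ matrix $X\Phi'$ instead of the $n \times p$ matrix $X$.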
Papers
- k-nearest neighbor clustering
- Image compression
Posterior
Note that the posterior is obtained in closed form, unlike in other methods.
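A sketch of why the posterior is closed form: with a conjugate Gaussian prior on $\beta$ given the compressed design $Z = X\Phi'$, the posterior for $\beta$ is again Gaussian. Taking $\Sigma_\beta = I$ is an assumption for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 100, 20
Z = rng.normal(size=(n, m))               # compressed design X Phi'
beta_true = rng.normal(size=m)
y = Z @ beta_true + rng.normal(scale=0.5, size=n)

# Conjugate prior beta | sigma^2 ~ N(0, sigma^2 * Sigma_beta),
# with Sigma_beta = I here for illustration.
Sigma_beta_inv = np.eye(m)

# Closed-form Gaussian posterior for beta:
#   beta | y, sigma^2 ~ N(mu_n, sigma^2 * V_n),
#   V_n = (Z'Z + Sigma_beta^{-1})^{-1},  mu_n = V_n Z'y
V_n = np.linalg.inv(Z.T @ Z + Sigma_beta_inv)
mu_n = V_n @ Z.T @ y
```

No sampling or optimization loop is needed; the whole fit is the two linear-algebra lines at the end.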
How to choose $m$?
- model averaging
- create model for each (specified) dimension of $\Phi$
- $P(\mathcal M_l | D) = \frac{P(D|\mathcal M_l)P(\mathcal M_l)}{P(D)}$
- $P(\mathcal M_l) = 1/S$, $S$ is the number of models (i.e. number of $m$’s).
- $P(D|\mathcal M_l) = \int N(y \mid X\Phi'\beta, \sigma^2 I)\,\pi(\beta \mid \sigma^2)\,\pi(\sigma^2)\, d\beta\, d\sigma^2$
- $= \frac{1}{|X\Phi'\Sigma_\beta\Phi X' + I|^{1/2}} \, \frac{2^{n/2}\,\Gamma(n/2)}{\left(y'(X\Phi'\Sigma_\beta\Phi X'+I)^{-1}y\right)^{n/2}} \, \frac{1}{(2\pi)^{n/2}}$
- Simulate a new $\Phi$ for each model
- we don’t pick a single model; instead we weight the models by their posterior probabilities, predict with each model, and average the predictions
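The model-averaging recipe above can be sketched end to end. The helper below assumes $\Sigma_\beta = I$ and an improper prior $\pi(\sigma^2) \propto 1/\sigma^2$, which together yield the marginal-likelihood form in the notes; the grid of $m$ values and data sizes are made up for illustration:

```python
import math
import numpy as np

rng = np.random.default_rng(2)
n, p = 80, 1000
X = rng.normal(size=(n, p))
y = X[:, :5] @ np.ones(5) + rng.normal(size=n)   # toy signal in 5 predictors

def log_marginal(y, Z):
    """log P(D | M_l) for y ~ N(Z beta, sigma^2 I), beta|sigma^2 ~ N(0, sigma^2 I),
    pi(sigma^2) proportional to 1/sigma^2 (assumed priors)."""
    n = len(y)
    V = Z @ Z.T + np.eye(n)                      # X Phi' Sigma_beta Phi X' + I
    _, logdet = np.linalg.slogdet(V)
    quad = y @ np.linalg.solve(V, y)             # y' V^{-1} y
    return (-0.5 * logdet + (n / 2) * math.log(2) + math.lgamma(n / 2)
            - (n / 2) * math.log(quad) - (n / 2) * math.log(2 * math.pi))

# One model per compression dimension m, each with its own simulated Phi;
# uniform model prior P(M_l) = 1/S over the S models.
ms = [5, 10, 20, 40]
logps = []
for m in ms:
    Phi = rng.normal(size=(m, p)) / math.sqrt(m)
    logps.append(log_marginal(y, X @ Phi.T))

# Posterior model weights (normalized with the log-sum-exp trick);
# predictions would then average the S model-specific predictions.
logps = np.array(logps)
w = np.exp(logps - logps.max())
w /= w.sum()
print(dict(zip(ms, np.round(w, 3))))
```

With a uniform model prior, the weights are just the normalized marginal likelihoods, so no $P(D)$ needs to be computed explicitly.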
The model has narrower credible intervals and lower MSE than lasso and ridge.