13 Jan, 2016

Shrinkage & Regularization


Elastic Net

  • The naive elastic net does not perform well in practice because of double shrinkage: applying both the ridge and lasso penalties shrinks the coefficients twice, introducing extra bias.
  • The fix is a rescaling: $\beta(\text{enet}) = (1+\lambda_2)\,\beta(\text{naive enet})$, where $\lambda_2$ is the same ridge penalty parameter as before.
  • elasticnet package in R (the enet() function); see the sketch below.
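
A minimal sketch of fitting an elastic net in R, using glmnet on simulated data (glmnet parameterizes the penalty with a mixing weight alpha rather than separate $\lambda_1, \lambda_2$; the data and settings here are illustrative assumptions, not from the lecture):

```r
## Elastic net with glmnet: alpha = 1 is the lasso, alpha = 0 is ridge,
## values in between mix the two penalties; lambda is chosen by CV.
library(glmnet)

set.seed(1)
n <- 100; p <- 50                      # illustrative sizes
x <- matrix(rnorm(n * p), n, p)
y <- drop(x[, 1:2] %*% c(1, -2)) + rnorm(n)

cvfit <- cv.glmnet(x, y, alpha = 0.5)  # 50/50 ridge-lasso mix
coef(cvfit, s = "lambda.min")          # coefficients at the CV-chosen lambda
```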

Adaptive Lasso (AL)

  • For $p > n$ the full OLS fit does not exist, so take $\hat\beta_j$ to be the marginal OLS estimate from the univariate model $y = \beta_0 + \beta_j x_j + \epsilon$, $\epsilon \sim N(0,\sigma^2)$, for $j = 1,\dots,p$; no Bonferroni correction is needed. The adaptive lasso converges to $\beta$ at the optimal rate $n^{-1/2}$, typically with $\nu = 2$.
  • polywog package in R
  • Generally better than the lasso (it has the oracle property), but it still has the lasso's problems:
    • it can select at most $n$ covariates when $p > n$;
    • it cannot group correlated variables.
  • Idea (a code sketch of this recipe follows the list):
    • First prescreen the covariates: if a covariate's univariate (e.g., logistic) regression coefficient is insignificant, do not include it in the adaptive lasso.
    • Let $x = x_1, \dots, x_p$ be the covariates that survive screening.
    • Let $x^{*} = x_1|\hat\beta_1^{ols}|^{\nu}, \dots, x_p|\hat\beta_p^{ols}|^{\nu}$; this rescaling absorbs the adaptive weights $1/|\hat\beta_j^{ols}|^{\nu}$, so a plain lasso can be run.
    • Run the lasso: $\hat\beta^{lasso} = \text{argmin}_\beta \|y - x^{*}\beta\|_2^2 + \lambda\|\beta\|_1$.
    • Transform back: $\hat\beta_j^{AL} = \hat\beta_j^{lasso}\,|\hat\beta_j^{ols}|^{\nu}$, typically with $\nu = 2$.
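
A minimal R sketch of this recipe on simulated data, using glmnet for the lasso step; the simulated data, the 0.05 screening cutoff, and the use of marginal OLS fits are illustrative assumptions:

```r
## Adaptive lasso via rescaling, following the steps above (a sketch).
library(glmnet)

set.seed(1)
n <- 100; p <- 30
x <- matrix(rnorm(n * p), n, p)
y <- drop(x[, 1:3] %*% c(2, -1.5, 1)) + rnorm(n)
nu <- 2                                  # exponent on the initial estimates

# Marginal (univariate) OLS estimates and p-values; works even when p > n
fits  <- apply(x, 2, function(xj) summary(lm(y ~ xj))$coefficients[2, ])
b_ols <- fits["Estimate", ]
keep  <- fits["Pr(>|t|)", ] < 0.05       # illustrative screening cutoff

# Rescale surviving covariates, then run a plain lasso (glmnet's default)
x_star  <- sweep(x[, keep, drop = FALSE], 2, abs(b_ols[keep])^nu, `*`)
cvfit   <- cv.glmnet(x_star, y)
b_lasso <- coef(cvfit, s = "lambda.min")[-1]   # drop the intercept

# Transform back to the original scale
b_al <- b_lasso * abs(b_ols[keep])^nu
```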

Group Lasso
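
  • Penalizes predefined groups of coefficients jointly, via $\lambda \sum_g \|\beta_g\|_2$, so an entire group of variables enters or leaves the model together; this addresses the lasso's inability to group variables.

A minimal sketch using the grpreg package on simulated data (the package choice, data, and grouping are illustrative assumptions, not from the lecture; gglasso is an alternative):

```r
## Group lasso: covariates 1-3, 4-6, ... are penalized as groups.
library(grpreg)

set.seed(1)
n <- 100; p <- 12
x <- matrix(rnorm(n * p), n, p)
group <- rep(1:4, each = 3)             # 4 illustrative groups of 3 covariates
y <- drop(x[, 1:3] %*% c(1, -1, 0.5)) + rnorm(n)  # only group 1 is active

cvfit <- cv.grpreg(x, y, group, penalty = "grLasso")
coef(cvfit)                             # whole groups are zero or nonzero
```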

Conclusion:

  • There is NO uniformly better method; you have to use the right penalty for the right situation.

Nonnegative Matrix Factorization

  • SNP genotypes are coded 0, 1, 2 = aa, Aa, AA (the number of copies of the A allele), so the data matrix is nonnegative.
  • The approach: cluster the observations first, then run the regression (elastic net). When $p$ is small, k-means works for the clustering step, but it is lousy when $p$ is large, which motivates NMF (see the sketch below).
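
A minimal base-R sketch of NMF via the Lee–Seung multiplicative updates, showing how the dominant latent factor gives a clustering of the observations; the specific algorithm, simulated data, and rank are assumptions (the NMF package in R provides full implementations):

```r
## NMF sketch: X (nonnegative, e.g., SNP codes 0/1/2) ~ W %*% H,
## fit by Lee-Seung multiplicative updates for the Frobenius norm.
set.seed(1)
n <- 60; p <- 15; k <- 3                 # illustrative sizes and rank
X <- matrix(sample(0:2, n * p, replace = TRUE), n, p)

W <- matrix(runif(n * k), n, k)
H <- matrix(runif(k * p), k, p)
eps <- 1e-9                              # guards against division by zero
for (i in 1:500) {
  H <- H * (t(W) %*% X) / (t(W) %*% W %*% H + eps)
  W <- W * (X %*% t(H)) / (W %*% H %*% t(H) + eps)
}

# Cluster each observation by its dominant latent factor; an elastic
# net regression can then be run within each cluster.
cluster <- apply(W, 1, which.max)
table(cluster)
```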