The naive elastic net does not perform well in practice because of double shrinkage: the $\ell_1$ and $\ell_2$ penalties both shrink the coefficients.
$\hat\beta(\text{enet}) = (1+\lambda_2)\,\hat\beta(\text{naive enet})$, with the same $\lambda_2$ as above.
enet() in the elasticnet package in R
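A minimal sketch of the correction, in Python/scikit-learn rather than R (the toy data and the alpha, l1_ratio settings are made up; scikit-learn's ElasticNet fits the naive objective, so the rescaling is applied by hand):

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
n, p = 100, 10
X = rng.standard_normal((n, p))
y = X[:, 0] - 2 * X[:, 1] + rng.standard_normal(n)

# scikit-learn's ElasticNet minimizes
#   (1/(2n))||y - Xb||^2 + alpha*l1_ratio*||b||_1 + 0.5*alpha*(1 - l1_ratio)*||b||^2,
# i.e. the *naive* elastic net: no rescaling is applied to the solution.
alpha, l1_ratio = 0.1, 0.5                       # illustrative penalty settings
beta_naive = ElasticNet(alpha=alpha, l1_ratio=l1_ratio).fit(X, y).coef_

# lambda_2 of the unscaled objective ||y - Xb||^2 + lambda_1||b||_1 + lambda_2||b||^2
lam2 = n * alpha * (1 - l1_ratio)

# Undo the double shrinkage with the (1 + lambda_2) correction
beta_enet = (1 + lam2) * beta_naive
print(beta_naive[:3], beta_enet[:3])
```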
Adaptive Lasso (AL)
For $p \gt n$, take $\hat\beta_j$ to be the OLS estimate from the univariate regression $y = \beta_0 + \beta_j x_j + \epsilon$, $\epsilon \sim N(0,\sigma^2)$, fit separately for each $j = 1,\dots,p$. No Bonferroni correction is needed. The adaptive lasso converges to $\beta$ at the optimal rate $n^{-1/2}$; take $\nu = 2$.
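A small sketch of the marginal OLS slopes (Python/NumPy; the toy data are made up):

```python
import numpy as np

def marginal_ols_slopes(X, y):
    """OLS slope of y ~ 1 + x_j, fit separately for each column j."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    return (Xc * yc[:, None]).sum(axis=0) / (Xc ** 2).sum(axis=0)

# toy data with p > n (made up)
rng = np.random.default_rng(1)
n, p = 40, 200
X = rng.standard_normal((n, p))
y = 2 * X[:, 0] - X[:, 1] + rng.standard_normal(n)
beta_marg = marginal_ols_slopes(X, y)            # one slope per covariate
```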
polywog package in R
Performs at least as well as the lasso, but it still has the lasso's problems:
can't select more than $n$ covariates
can't group correlated variables
Idea:
First, prescreen the covariates: if a covariate's univariate logistic regression coefficient is not significant, leave it out of the adaptive lasso.
let $x = (x_1, \dots, x_p)$ be the post-screened covariates
let $x^{*} = \big(x_1\,|\hat\beta_1^{ols}|^{\nu},\dots,x_p\,|\hat\beta_p^{ols}|^{\nu}\big)$, i.e. rescale each covariate by its marginal OLS estimate
Run the lasso on the rescaled covariates: $\hat\beta^{lasso} = \text{argmin}_\beta \|y-x^{*}\beta\|_2^2 + \lambda\|\beta\|_1$
Transform back to the original scale: $\hat\beta_{AL,j} = \hat\beta_j^{lasso}\,|\hat\beta_j^{ols}|^{\nu}$, typically $\nu = 2$ (see the sketch below).
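A minimal sketch of the whole recipe, assuming a continuous outcome (Python/scikit-learn; the correlation-based screen, the lasso alpha, and the toy data are illustrative stand-ins for the choices above):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
n, p, nu = 60, 150, 2
X = rng.standard_normal((n, p))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.standard_normal(n)

# Marginal (univariate) OLS slopes, one simple regression per covariate
Xc, yc = X - X.mean(axis=0), y - y.mean()
beta_marg = (Xc * yc[:, None]).sum(axis=0) / (Xc ** 2).sum(axis=0)

# Step 1 -- prescreen: drop covariates with a weak univariate signal.
# (A plain correlation cutoff stands in for the significance test in the notes.)
r = (Xc * yc[:, None]).sum(axis=0) / np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
keep = np.abs(r) > 0.2
Xs, w = X[:, keep], np.abs(beta_marg[keep]) ** nu

# Step 2 -- rescale the surviving covariates by |beta_j^ols|^nu and run an ordinary lasso
X_star = Xs * w
lasso = Lasso(alpha=0.1).fit(X_star, y)          # alpha chosen for illustration only

# Step 3 -- transform back to the original covariate scale
beta_al = np.zeros(p)
beta_al[keep] = lasso.coef_ * w
```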
Group Lasso
Conclusion:
There is NO uniformly best method; you have to use the right penalty for the right situation.
Nonnegative Matrix Factorization
SNP: 0,1,2 = aa, Aa, AA
The idea: cluster the observations first, then run a regression (elastic net) within each cluster. When $p$ is small they use k-means, but k-means does poorly when $p$ is large, which motivates NMF.
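A rough sketch of the cluster-then-regress idea, assuming NMF on 0/1/2 SNP codes and an elastic net per cluster (Python/scikit-learn; the number of factors, the argmax cluster assignment, and all tuning values are made up):

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(3)
n, p, k = 200, 500, 3
X = rng.integers(0, 3, size=(n, p)).astype(float)    # SNPs coded 0/1/2, hence nonnegative
y = X[:, 0] - X[:, 1] + rng.standard_normal(n)

# Factor X ~ W H with W, H >= 0; row i of W gives observation i's nonnegative loadings
W = NMF(n_components=k, init="nndsvda", max_iter=500).fit_transform(X)

# Assign each observation to its dominant factor (k-means on W is an alternative),
# then fit an elastic net separately within each cluster
labels = W.argmax(axis=1)
models = {}
for c in range(k):
    idx = labels == c
    if idx.sum() > 10:                                # skip clusters that are too small
        models[c] = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X[idx], y[idx])
```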