JdS2012



Communication abstract



Abstract 290:

Optimality of Graphlet Screening in High Dimensional Variable Selection
Jiashun Jin
Carnegie Mellon University

Consider a linear model $Y = X \beta + z$, where $X$ has $n$ rows and $p$ columns and $z \sim N(0, I_n)$. We assume that both $p$ and $n$ are large, including the case $p \gg n$. The unknown signal vector $\beta$ is assumed to be sparse, in the sense that only a small fraction of its components are nonzero. The goal is to identify these nonzero coordinates (i.e., variable selection). We are primarily interested in the regime where signals are both {\it rare and weak}, so that successful variable selection is challenging but still possible. Research on rare and weak signals has so far focused on the unstructured case, where the Gram matrix $G = X'X$ is close to the identity (the columns of $X$ are nearly orthogonal). In this paper, $G$ is only assumed to be sparse, in the sense that each row of $G$ has relatively few large entries (the diagonal entries of $G$ are normalized to $1$). The sparsity of $G$ naturally induces sparsity of the so-called {\it graph of strong dependence} (GOSD). Its nodes are the $p$ variables, and nodes $i$ and $j$ are connected when $|G(i,j)|$ exceeds a small threshold. The key insight is that there is an interesting interplay between signal sparsity and graph sparsity: in a broad range of settings, the signals decompose into many small components of the GOSD that are disconnected from each other. We propose {\it Graphlet Screening} (GS) for variable selection. This is a two-step Screen and Clean procedure. In the first step, we screen subgraphs of the GOSD with sequential $\chi^2$-tests. In the second step, we clean with penalized maximum likelihood estimation (MLE). The main methodological innovation is to use the GOSD to guide both the screening and the cleaning steps. For any variable selection procedure $\hat{\beta}$, we measure its performance by the Hamming distance between the sign vectors of $\hat{\beta}$ and $\beta$. We assess optimality by the convergence rate of this Hamming distance.
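The Screen and Clean procedure described above can be sketched in a few dozen lines. This is a minimal sketch under simplifying assumptions, not the implementation from the paper.

Its assumptions are as follows:

- The GOSD connects $i$ and $j$ when $|G(i,j)| > \delta$.
- Screening visits connected subgraphs $I$ of size at most $m_0$ in order of increasing size. It keeps $I$ when the $\chi^2$ increment $W_I' G_{II}^{-1} W_I - W_F' G_{FF}^{-1} W_F$ exceeds a threshold $t$. Here $W = X'Y$, and $F$ is the part of $I$ already kept.
- Cleaning fits an $L^0$-penalized Gaussian likelihood on each component of the GOSD restricted to the kept nodes.

The function names, the parameters `delta`, `m0`, `t` and `lam`, and their defaults are hypothetical placeholders. They are not the tuning rules of the paper.

```python
import itertools

import numpy as np


def gosd(G, delta):
    """Graph of strong dependence: edge i ~ j when |G[i, j]| > delta (delta is hypothetical)."""
    return [{int(k) for k in np.flatnonzero(np.abs(G[i]) > delta) if k != i}
            for i in range(G.shape[0])]


def connected_subsets(adj, m0):
    """Connected node subsets of size 1..m0, grouped by size (only feasible for small m0)."""
    level = {frozenset([j]) for j in range(len(adj))}
    levels = [level]
    for _ in range(m0 - 1):
        # Grow each connected subset by one neighbouring node.
        level = {S | {k} for S in level for i in S for k in adj[i] if k not in S}
        levels.append(level)
    return [sorted(lv, key=sorted) for lv in levels]


def quad(W, G, idx):
    """Q(A) = W_A' G_AA^{-1} W_A; chi-square with |A| d.f. when beta = 0, since W ~ N(G beta, G)."""
    idx = sorted(idx)
    if not idx:
        return 0.0
    w = W[idx]
    return float(w @ np.linalg.solve(G[np.ix_(idx, idx)], w))


def screen(W, G, adj, m0, t):
    """Screen: visit connected subgraphs I by increasing size and keep I when the
    chi-square increment over its already-kept part F = I & S exceeds t."""
    S = set()
    for level in connected_subsets(adj, m0):
        for I in level:
            if quad(W, G, I) - quad(W, G, I & S) > t:
                S |= I
    return S


def components(adj, nodes):
    """Connected components of the GOSD restricted to `nodes` (depth-first search)."""
    nodes, comps = set(nodes), []
    while nodes:
        stack, comp = [nodes.pop()], set()
        while stack:
            i = stack.pop()
            comp.add(i)
            for k in adj[i] & nodes:
                nodes.discard(k)
                stack.append(k)
        comps.append(sorted(comp))
    return comps


def clean(W, G, adj, S, lam):
    """Clean (simplified L0-penalized fit): treating W_I ~ N(G_II theta, G_II), the best fit
    supported on A has theta_A = G_AA^{-1} W_A and residual quadratic form Q(I) - Q(A),
    so on each component I we maximize Q(A) - lam * |A| by exhaustive search."""
    beta_hat = np.zeros(len(W))
    for I in components(adj, S):
        supports = (A for r in range(len(I) + 1) for A in itertools.combinations(I, r))
        best = max(supports, key=lambda A: quad(W, G, A) - lam * len(A))
        if best:
            A = list(best)
            beta_hat[A] = np.linalg.solve(G[np.ix_(A, A)], W[A])
    return beta_hat


def graphlet_screening(X, Y, delta=0.1, m0=2, t=None, lam=None):
    """Screen and Clean on the GOSD; returns estimates for the column-normalized design."""
    X = X / np.linalg.norm(X, axis=0)      # unit-norm columns, so diag(G) = 1
    G, W = X.T @ X, X.T @ Y                # W = X'Y ~ N(G beta, G) when z ~ N(0, I_n)
    p = G.shape[0]
    t = 2 * np.log(p) if t is None else t  # placeholder thresholds, not the paper's
    lam = 2 * np.log(p) if lam is None else lam
    adj = gosd(G, delta)
    return clean(W, G, adj, screen(W, G, adj, m0, t), lam)
```

The exhaustive search in `clean` is affordable only because each component of the GOSD restricted to the screened nodes is expected to be small. On a dense GOSD this enumeration would blow up, which is where the graph-sparsity assumption does its work. Testing pairs as well as singletons lets the screen keep two strongly correlated signals of opposite sign whose individual statistics partly cancel.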
Compared with more stringent criteria such as exact support recovery or the oracle property, which demand strong signals, the Hamming distance criterion is more appropriate for weak signals because it naturally allows a small fraction of errors. We show that, in a broad class of situations, Graphlet Screening achieves the optimal rate of convergence in terms of the Hamming distance. Well-known procedures such as the $L^0$-penalization method and $L^1$-penalization methods do not use the graph structure for variable selection. As a result, they generally do not achieve the optimal rate of convergence, even in very simple settings and even when their tuning parameters are ideally set.
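To make the Hamming criterion concrete, the sketch below counts the coordinates where the signs of $\hat{\beta}$ and $\beta$ disagree, with $\mathrm{sgn}(0) = 0$. The function name `hamming_sign_error` is a hypothetical illustration. Exact support recovery corresponds to this count being zero, whereas the Hamming criterion measures how many errors remain. In a simulation one would average the count over repetitions.

```python
import numpy as np


def hamming_sign_error(beta_hat, beta):
    """Number of coordinates j with sgn(beta_hat_j) != sgn(beta_j), where sgn(0) = 0."""
    return int(np.sum(np.sign(beta_hat) != np.sign(beta)))


beta = np.array([0.0, 1.5, -2.0, 0.0])
beta_hat = np.array([0.0, 0.7, 0.0, 0.3])
# One missed signal (j = 2) and one false positive (j = 3): two sign errors.
print(hamming_sign_error(beta_hat, beta))  # 2
```

A procedure with this $\hat{\beta}$ fails exact support recovery, yet it makes only two errors out of four coordinates. The Hamming criterion registers that difference.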