Package 'regnet'

Title: Network-Based Regularization for Generalized Linear Models
Description: Network-based regularization has achieved success in variable selection for high-dimensional biological data due to its ability to incorporate correlations among genomic features. This package provides procedures of network-based variable selection for generalized linear models (Ren et al. (2017) <doi:10.1186/s12863-017-0495-5> and Ren et al.(2019) <doi:10.1002/gepi.22194>). Continuous, binary, and survival response are supported. Robust network-based methods are available for continuous and survival responses.
Authors: Jie Ren, Luann C. Jung, Yinhao Du, Cen Wu, Yu Jiang, Junhao Liu
Maintainer: Jie Ren <[email protected]>
License: GPL-2
Version: 1.0.1
Built: 2024-11-18 04:54:25 UTC
Source: https://github.com/jrhub/regnet

Help Index


regnet: Network-Based Regularization for Generalized Linear Models

Description

Network-based regularization has achieved success in variable selection for high-dimensional biological data due to its ability to incorporate correlations among genomic features. This package provides procedures of network-based variable selection for generalized linear models (Ren et al. (2017) doi:10.1186/s12863-017-0495-5 and Ren et al.(2019) doi:10.1002/gepi.22194). Continuous, binary, and survival response are supported. Robust network-based methods are available for continuous and survival responses.

This package provides the implementation of the network-based variable selection method in Ren et al (2017) and the robust network-based method in Ren et al (2019). In addition to the network penalty, regnet allows users to use classical LASSO and MCP penalties.

Details

Two easy-to-use, integrated interfaces, cv.regnet() and regnet() allow users to flexibly choose the method that they want to use. There are three arguments to control the fitting method

response: three types of response are supported: "binary", "continuous"
and "survival".
penalty: three choices of the penalty functions are available: "network",
"mcp" and "lasso".
robust: whether to use robust methods for modeling. Robust methods
are available for survival and continuous responses.

In penalized regression, the tuning parameter λ1\lambda_{1} controls the sparsity of the coefficient profile. For network-based methods, an additional tuning parameter λ2\lambda_{2} is needed for controlling the smoothness among coefficients. Typical usage of the package is to have the cv.regnet() compute the optimal values of lambdas, then provide them to the regnet() function for estimating the coefficients.

If the users want to include clinical variables that are not subject to the penalty in the model, the argument 'clv' can be used to indicate the positions of clinical variables in the X matrix. e.g. 'clv=(1:5)' meaning that the first five variables in X will not be penalized. It is recommended to put the clinical variables at the beginning of the X matrix in a contiguous way (see the 'Value' section of the regnet() function). However, non-contiguous indices, e.g. 'clv=(2,4,6)', are also allowed.

References

Ren, J., Du, Y., Li, S., Ma, S., Jiang,Y. and Wu, C. (2019). Robust network-based regularization and variable selection for high dimensional genomics data in cancer prognosis. Genet. Epidemiol., 43:276-291 doi:10.1002/gepi.22194

Wu, C., Zhang, Q., Jiang,Y. and Ma, S. (2018). Robust network-based analysis of the associations between (epi)genetic measurements. J Multivar Anal., 168:119-130 doi:10.1016/j.jmva.2018.06.009

Wu, C., Jiang, Y., Ren, J., Cui, Y. and Ma, S. (2018). Dissecting gene-environment interactions: A penalized robust approach accounting for hierarchical structures. Statistics in Medicine, 37:437–456 doi:10.1002/sim.7518

Ren, J., He, T., Li, Y., Liu, S., Du, Y., Jiang, Y., and Wu, C. (2017). Network-based regularization for high dimensional SNP data in the case-control study of Type 2 diabetes. BMC Genetics, 18(1):44 doi:10.1186/s12863-017-0495-5

Wu, C., and Ma, S. (2015). A selective review of robust variable selection with applications in bioinformatics. Briefings in Bioinformatics, 16(5), 873–883 doi:10.1093/bib/bbu046

Wu, C., Shi, X., Cui, Y. and Ma, S. (2015). A penalized robust semiparametric approach for gene-environment interactions. Statistics in Medicine, 34 (30): 4016–4030 doi:10.1002/sim.6609

See Also

Useful links:

cv.regnet regnet

Examples

## Survival response using robust network method
data(SurvExample)
X = rgn.surv$X
Y = rgn.surv$Y
clv = c(1:5) # variables 1 to 5 are treated as clinical variables, we choose not to penalize them.
out = cv.regnet(X, Y, response="survival", penalty="network", clv=clv, robust=TRUE, verbo = TRUE)
out$lambda

fit = regnet(X, Y, "survival", "network", out$lambda[1,1], out$lambda[1,2], clv=clv, robust=TRUE)
index = which(rgn.surv$beta[-(1:6)] != 0)  # [-(1:6)] removes the intercept and clinical variables
pos = which(fit$coeff[-(1:6)] != 0)
tp = length(intersect(index, pos))
fp = length(pos) - tp
list(tp=tp, fp=fp)

k-folds cross-validation for regnet

Description

This function does k-fold cross-validation for regnet and returns the optimal value(s) of lambda.

Usage

cv.regnet(
  X,
  Y,
  response = c("binary", "continuous", "survival"),
  penalty = c("network", "mcp", "lasso"),
  lamb.1 = NULL,
  lamb.2 = NULL,
  folds = 5,
  r = NULL,
  clv = NULL,
  initiation = NULL,
  alpha.i = 1,
  robust = FALSE,
  verbo = FALSE,
  debugging = FALSE
)

Arguments

X

X matrix without intercept (see regnet).

Y

the response variable Y (see regnet).

response

the response type. regnet supports three types of response: "binary", "continuous" and "survival".

penalty

the penalty type. regnet provides three choices for the penalty function: "network", "mcp" and "lasso".

lamb.1

a user-supplied sequence of λ1\lambda_{1} values, which serves as a tuning parameter to impose sparsity. If it is left as NULL, regnet will compute its own sequence.

lamb.2

a user-supplied sequence of λ2\lambda_{2} values for network method. λ2\lambda_{2} controls the smoothness among coefficient profiles. If it is left as NULL, a default sequence will be used.

folds

the number of folds for cross-validation; the default is 5.

r

the regularization parameter in MCP; default is 5. For binary response, r should be larger than 4.

clv

a value or a vector, indexing variables that are not subject to penalty. clv only works for continuous and survival responses in the current version of regnet, and will be ignored for other types of responses.

initiation

the method for initiating the coefficient vector. The default method is elastic-net.

alpha.i

the elastic-net mixing parameter. The program can use the elastic-net for choosing initial values of the coefficient vector. alpha.i is the elastic-net mixing parameter, with 0 \le alpha.i \le 1. alpha.i=1 is the lasso penalty, and alpha.i=0 is the ridge penalty. If the user chooses a method other than elastic-net for initializing coefficients, alpha.i will be ignored.

robust

a logical flag. Whether or not to use robust methods. Robust methods are available for survival and continuous response.

verbo

output progress to the console.

debugging

a logical flag. If TRUE, extra information will be returned.

Details

When lamb.1 is left as NULL, regnet computes its own sequence. You can find the lamb.1 sequence used by the program in the returned CVM matrix (see the 'Value' section). If you find the default sequence does not work well, you can try (1) standardizing the response vector Y; or (2) providing a customized lamb.1 sequence for your data.

Sometimes multiple optimal values(pairs) of lambda(s) can be found (see 'Value'). This is usually normal when the response is binary. However, if the response is survival or continuous, you may want to check (1) if the sequence of lambda is too large (i.e. all coefficients are shrunken to zero under all values of lambda) ; or (2) if the sequence is too small (i.e. all coefficients are non-zero under all values of lambda). If neither, simply choose the value(pair) of lambda based on your preference.

Value

an object of class "cv.regnet" is returned, which is a list with components:

lambda

the optimal value(s) of λ\lambda. More than one value will be returned, if multiple lambdas have the cross-validated error = min(cross-validated errors). If the network penalty is used, lambda contains optimal pair(s) of λ1\lambda_{1} and λ2\lambda_{2}.

mcvm

the cross-validated error of the optimal λ\lambda. For binary response, the error is the misclassification rate. For continuous response, mean squared error (MSE) is used. For survival response, the MSE is used for non-robust methods, and the criterion for robust methods is the least absolute deviation (LAD).

CVM

a matrix of the mean cross-validated errors of all lambdas used in the fits. The row names of CVM are the values of λ1\lambda_{1}. If the network penalty was used, the column names are the values of λ2\lambda_{2}.

References

Ren, J., Du, Y., Li, S., Ma, S., Jiang,Y. and Wu, C. (2019). Robust network-based regularization and variable selection for high dimensional genomics data in cancer prognosis. Genet. Epidemiol., 43:276-291 doi:10.1002/gepi.22194

Ren, J., He, T., Li, Y., Liu, S., Du, Y., Jiang, Y., and Wu, C. (2017). Network-based regularization for high dimensional SNP data in the case-control study of Type 2 diabetes. BMC Genetics, 18(1):44 doi:10.1186/s12863-017-0495-5

See Also

regnet

Examples

## Binary response using network method
data(LogisticExample)
X = rgn.logi$X
Y = rgn.logi$Y
out = cv.regnet(X, Y, response="binary", penalty="network", folds=5, r = 4.5)
out$lambda
fit = regnet(X, Y, "binary", "network", out$lambda[1,1], out$lambda[1,2], r = 4.5)
index = which(rgn.logi$beta != 0)
pos = which(fit$coeff != 0)
tp = length(intersect(index, pos))
fp = length(pos) - tp
list(tp=tp, fp=fp)

## Binary response using MCP method
out = cv.regnet(X, Y, response="binary", penalty="mcp", folds=5, r = 4.5)
out$lambda
fit = regnet(X, Y, "binary", "mcp", out$lambda[1], r = 4.5)
index = which(rgn.logi$beta != 0)
pos = which(fit$coeff != 0)
tp = length(intersect(index, pos))
fp = length(pos) - tp
list(tp=tp, fp=fp)

plot a regnet object

Description

plot the network structures of the identified genetic variants.

Usage

## S3 method for class 'regnet'
plot(x, subnetworks=FALSE, vsize=10, labelDist=2, minVertices=2, theta=5, ...)

Arguments

x

a regnet object.

subnetworks

whether to plot sub-networks

vsize

the size of the vertex

labelDist

the distance of the label from the center of the vertex.

minVertices

the minimum number of vertices a sub-network should contain.

theta

the multiplier for the width of the edge. Specifically, edge.width=θ×adjacencyedge.width=\theta\times adjacency. The default is 5.

...

other plot arguments

Details

This function depends on the "igraph" package in generating the network graphs. It returns a (list of) igraph object(s), on which users can do further modification on the network graphs.

Value

an object of class "igraph" is returned in default. When subnetworks=TRUE, a list of "igraph" objects (sub-networks) is returned.

See Also

regnet

Examples

data(ContExample)
X = rgn.tcga$X
Y = rgn.tcga$Y
clv = (1:2)
fit = regnet(X, Y, "continuous", "network", rgn.tcga$lamb1, rgn.tcga$lamb2, clv =clv, alpha.i=0.5)

plot(fit)
plot(fit, subnetworks = TRUE, vsize=20, labelDist = 3, theta = 5)

print a cv.regnet object

Description

Print a summary of a cv.regnet object

Usage

## S3 method for class 'cv.regnet'
print(x, digits = max(3, getOption("digits") - 3), ...)

Arguments

x

a cv.regnet object.

digits

significant digits in the printout.

...

other print arguments

See Also

cv.regnet


print a regnet object

Description

Print a summary of a regnet object

Usage

## S3 method for class 'regnet'
print(x, digits = max(3, getOption("digits") - 3), ...)

Arguments

x

a regnet object.

digits

significant digits in the printout.

...

other print arguments

See Also

regnet


fit a regression for given lambda with network-based regularization

Description

Network-based penalization regression for given values of λ1\lambda_{1} and λ2\lambda_{2}. Typical usage is to have the cv.regnet function compute the optimal lambdas, then provide them to the regnet function. Users could also use MCP or Lasso.

Usage

regnet(
  X,
  Y,
  response = c("binary", "continuous", "survival"),
  penalty = c("network", "mcp", "lasso"),
  lamb.1 = NULL,
  lamb.2 = NULL,
  r = NULL,
  clv = NULL,
  initiation = NULL,
  alpha.i = 1,
  robust = FALSE,
  debugging = FALSE
)

Arguments

X

a matrix of predictors without intercept. Each row should be an observation vector. A column of 1 will be added to the X matrix by the program as the intercept.

Y

the response variable. For response="binary", Y should be a numeric vector with zeros and ones. For response="survival", Y should be a two-column matrix with columns named 'time' and 'status'. The latter is a binary variable, with '1' indicating an event, and '0' indicating censoring.

response

the response type. regnet supports three types of response: "binary", "continuous" and "survival".

penalty

the penalty type. regnet provides three choices for the penalty function: "network", "mcp" and "lasso".

lamb.1

the tuning parameter λ1\lambda_{1} that imposes sparsity.

lamb.2

the tuning parameter λ2\lambda_{2} that controls the smoothness among coefficient profiles. λ2\lambda_{2} is needed for network penalty.

r

the regularization parameter in MCP. For binary response, r should be larger than 4.

clv

a value or a vector, indexing variables that are not subject to penalty. clv only works for continuous and survival responses for now, and will be ignored for other types of responses.

initiation

the method for initiating the coefficient vector. The default method is elastic-net.

alpha.i

the elastic-net mixing parameter. The program can use the elastic-net for choosing initial values of the coefficient vector. alpha.i is the elastic-net mixing parameter, with 0 \le alpha.i \le 1. alpha.i=1 is the lasso penalty, and alpha.i=0 is the ridge penalty. If the user chooses a method other than elastic-net for initializing coefficients, alpha.i will be ignored.

robust

a logical flag. Whether or not to use robust methods. Robust methods are available for survival and continuous response.

debugging

a logical flag. If TRUE, extra information will be returned.

Details

The current version of regnet supports three types of responses: “binary”, "continuous" and “survival”.

  • regnet(…, response="binary", penalty="network") fits a network-based penalized logistic regression.

  • regnet(…, response="continuous", penalty="network") fits a network-based least square regression.

  • regnet(…, response="survival", penalty="network", robust=TRUE) fits a robust regularized AFT model using network penalty.

By default, regnet uses non-robust methods for all types of responses. To use robust methods, simply set robust=TRUE. It is recommended to use robust methods for survival response. Please see the references for more details about the models. Users could also use MCP or Lasso penalty.

The coefficients are always estimated on a standardized X matrix. regnet standardizes each column of X to have unit variance (using 1/n rather than 1/(n-1) formula). If the coefficients on the original scale are needed, the user can refit a standard model using the subset of variables that have non-zero coefficients.

Value

an object of class "regnet" is returned, which is a list with components:

  • coeff: a vector of estimated coefficients. Please note that, if there are variables not subject to penalty (indicated by clv), the order of returned vector is c(Intercept, unpenalized coefficients of clv variables, penalized coefficients of other variables).

  • Adj: a matrix of adjacency measures of the identified genetic variants. Identified genetic variants are those that have non-zero estimated coefficients.

References

Ren, J., He, T., Li, Y., Liu, S., Du, Y., Jiang, Y., and Wu, C. (2017). Network-based regularization for high dimensional SNP data in the case-control study of Type 2 diabetes. BMC Genetics, 18(1):44 doi:10.1186/s12863-017-0495-5

Ren, J., Du, Y., Li, S., Ma, S., Jiang,Y. and Wu, C. (2019). Robust network-based regularization and variable selection for high dimensional genomics data in cancer prognosis. Genet. Epidemiol., 43:276-291 doi:10.1002/gepi.22194

See Also

cv.regnet

Examples

## Survival response
data(SurvExample)
X = rgn.surv$X
Y = rgn.surv$Y
clv = c(1:5) # variables 1 to 5 are clinical variables which we choose not to penalize.
penalty = "network"
fit = regnet(X, Y, "survival", penalty, rgn.surv$lamb1, rgn.surv$lamb2, clv=clv, robust=TRUE)
index = which(rgn.surv$beta != 0)
pos = which(fit$coeff != 0)
tp = length(intersect(index, pos))
fp = length(pos) - tp
list(tp=tp, fp=fp)

Example datasets for demonstrating the features of regnet

Description

Example datasets for demonstrating the features of regnet.

Usage

data("LogisticExample")
data("SurvExample")
data("ContExample")
data("HeteroExample")

Format

"LogisticExample", "SurvExample" and "HeteroExample" are simulated data. Each data includes three main components: X, Y, and beta; beta is a vector of the true coefficients used to generate Y.

"ContExample" is a subset of the skin cutaneous melanoma data from the Cancer Genome Atlas (TCGA). The response variable Y is the log-transformed Breslow’s depth. X is a matrix of gene expression data.

Examples

data("LogisticExample")
lapply(rgn.logi, class)