SpiderLearner Quick Start
Installing and Loading the ensembleGGM Package
Begin by installing and loading the devtools package, then use the install_github function to install ensembleGGM from GitHub as follows:
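A minimal sketch of the installation, assuming the repository location given at the end of this guide (katehoffshutta/ensembleGGM):

```r
# Install devtools from CRAN if needed, then pull ensembleGGM from GitHub
if (!requireNamespace("devtools", quietly = TRUE))
  install.packages("devtools")
library(devtools)
install_github("katehoffshutta/ensembleGGM")
library(ensembleGGM)
```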
Loading Example Data
Next, load the example data. Note that you will need to install the affy and curatedOvarianData packages from Bioconductor.
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")

# Uncomment these lines to install the affy and curatedOvarianData packages
# (installation takes time, so re-comment them afterward)
# BiocManager::install("affy")
# BiocManager::install("curatedOvarianData")

library(affy)
library(curatedOvarianData)
Extracting and Standardizing Example Data
For illustration, we extract an example dataset (GSE32062.GPL6480_eset) from the curatedOvarianData package and select a subset of genes related to ovarian carcinoma based on the Human Phenotype Ontology (Köhler et al. 2021):
standardize = function(x){return((x-mean(x))/sd(x))}
data(GSE32062.GPL6480_eset)
lateStage = exprs(GSE32062.GPL6480_eset)
# Extract a subset of genes related to ovarian carcinoma
# based on the Human Phenotype Ontology
# See https://hpo.jax.org/app/browse/term/HP:0025318
lateStageSmall = lateStage[c(1680,1681,3027,4564,8930,12243,12245,13694,13695,13701,13979,16082,16875,17980),]
lateStageSmall = t(lateStageSmall)
names(lateStageSmall) = colnames(lateStageSmall)
lateStageSmall = apply(lateStageSmall,2,standardize)
head(lateStageSmall)
## BRCA1 BRCA2 CDKN2A DMPK KRAS PALB2
## GSM794865 -0.6028802 -1.1376194 0.2887462 -0.03206431 -0.8508601 -1.3167310
## GSM794866 -0.1237956 0.7974723 -1.0629278 -0.89750567 -0.3283184 0.8708805
## GSM794867 0.9028595 -0.7873664 1.8230450 -0.50683914 -0.2154519 -0.3221758
## GSM794868 1.2158611 -0.5421297 0.6851006 0.04632488 -1.2685259 -0.9095274
## GSM794869 -0.3436371 -0.2762188 0.7401463 -0.22931356 0.6232824 0.5098349
## GSM794870 0.4111565 0.5665514 -1.2593233 1.30307811 0.6981820 -0.3248203
## PALLD PTCH1 PTCH2 PTEN RAD51C
## GSM794865 1.91246451 -0.09995631 0.28531547 0.2503601 -1.19871005
## GSM794866 -1.50509530 1.34329261 0.82345264 -0.8031427 -1.06716226
## GSM794867 -1.83488960 0.31336992 0.26316065 -0.6280457 0.10481385
## GSM794868 -0.00508719 -0.98669751 0.07568094 0.1233805 -0.09129142
## GSM794869 0.20471774 0.35693082 0.37732193 0.3551492 0.02784237
## GSM794870 1.53759831 -0.97097949 -1.22914835 1.1162712 0.17535862
## SMAD4 SUFU TP53
## GSM794865 -0.04516039 0.4075326 0.3881490
## GSM794866 -0.63087621 0.8761307 0.1602356
## GSM794867 -1.04189988 -1.2253780 0.7926124
## GSM794868 0.18237901 0.6599834 1.0135767
## GSM794869 -0.05990108 -0.2091264 0.9350137
## GSM794870 -0.11255515 0.8103267 -0.6199303
Instantiating the SpiderLearner and Adding Candidates
Instantiate a SpiderLearner object with the SpiderLearner$new() function, and add candidates as desired:
s = SpiderLearner$new()
apple = HugeEBICCandidate$new(gamma = 0)
banana = HugeEBICCandidate$new(gamma = 0.5)
clementine = HugeRICCandidate$new()
date = HGlassoCandidate$new()
elderberry = MLECandidate$new()
fraise = HugeStARSCandidate$new(thres = 0.05)
grape = HugeStARSCandidate$new(thres = 0.1)
honeydew = QGraphEBICCandidate$new(gamma = 0)
icewine = QGraphEBICCandidate$new(gamma = 0.5)
candidates = list(apple,
                  banana,
                  clementine,
                  date,
                  elderberry,
                  fraise,
                  grape,
                  honeydew,
                  icewine)

for(candidate in candidates)
{
  s$addCandidate(candidate)
}
Running the SpiderLearner
Here is the syntax for running the model. Output is suppressed here for space.
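A sketch of the call; the fold-count argument name K is an assumption, so check the package documentation for the exact signature:

```r
# Fit the SpiderLearner ensemble on the standardized data
# with 10-fold cross-validation (K is an assumed argument name)
slResults = s$runSpiderLearner(lateStageSmall, K = 10)
```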
There are two ways to access the results. The first is the object we've saved as slResults, which is returned by the runSpiderLearner function. The results are also stored in the SpiderLearner object itself and can be accessed with the getResults function. Note that getResults returns only the most recent set of results; therefore, if you wish to change your library with addCandidate or deleteCandidate and then run the model again, you should save the results as a separate object each time.
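Both access paths can be inspected side by side by listing the element names of each (a sketch; getResults is assumed to be called on the SpiderLearner object itself):

```r
names(slResults)       # results returned by runSpiderLearner
names(s$getResults())  # the same results, stored in the object
```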
## [1] "foldsNets" "fullModels" "optTheta"
## [4] "simpleMeanNetwork" "weights"
## [1] "foldsNets" "fullModels" "optTheta"
## [4] "simpleMeanNetwork" "weights"
Investigating SpiderLearner Results
A good starting point for investigating results is to look at the weights of each candidate method.
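The weights can be retrieved with the getWeights() function used later in this guide, presumably like so:

```r
# Inspect how much each candidate contributes to the ensemble
s$getWeights()
```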
## method weight
## 1 ebic_0 7.609961e-01
## 2 ebic_0.5 3.705783e-09
## 3 ric 1.554953e-08
## 4 hglasso 5.487284e-08
## 5 mle 2.261278e-01
## 6 stars_0.05 3.705798e-09
## 7 stars_0.1 3.118281e-09
## 8 qgraph_ebic_0 1.287595e-02
## 9 qgraph_ebic_0.5 4.009458e-08
We can plot the GGM for the SpiderLearner ensemble model using the plotSpiderLearner function:
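For example (a sketch; plotSpiderLearner is assumed to be a method on the SpiderLearner object, and its arguments may differ):

```r
# Plot the ensemble GGM
s$plotSpiderLearner()
```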
We can also plot the GGM corresponding to any of the candidate methods using the plotCandidate function with the method identifier as an argument; for example, here is the MLE:
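A sketch, assuming plotCandidate takes the identifier used in the weights table:

```r
# Plot a single candidate's GGM; "mle" is the identifier
# shown in the weights table above
s$plotCandidate("mle")
```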
The adjacency matrix of the estimated GGM can also be accessed with the getGGM function:
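For example (a sketch; getGGM is assumed to take no arguments):

```r
# Extract the matrix of estimated partial correlations
adjMatrix = s$getGGM()
adjMatrix[1:5, 1:5]
```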
## BRCA1 BRCA2 CDKN2A DMPK KRAS
## BRCA1 0.000000000 0.0089647641 0.01646969 0.1137168716 0.07360712
## BRCA2 0.008964764 0.0000000000 0.01408362 0.0009387959 0.17306543
## CDKN2A 0.016469693 0.0140836240 0.00000000 0.0360062990 -0.01496000
## DMPK 0.113716872 0.0009387959 0.03600630 0.0000000000 0.03968742
## KRAS 0.073607123 0.1730654308 -0.01496000 0.0396874153 0.00000000
The \(i,j^{th}\) entry in this matrix represents the estimated partial correlation between the \(i^{th}\) and \(j^{th}\) variables in this dataset.1
Running More Ensembles
It is straightforward to change the library, the number of folds, or the dataset and run SpiderLearner again using the same object. For example, we can remove the hub graphical lasso and the MLE as candidate methods using the following syntax:
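A sketch, assuming deleteCandidate takes the method identifier shown in the weights table:

```r
# Remove candidates by identifier; "hglasso" and "mle" are the
# identifiers shown in the weights table above
s$deleteCandidate("hglasso")
s$deleteCandidate("mle")
```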
We can check what’s in our library now with the printLibrary function:
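For example (assuming printLibrary takes no arguments):

```r
# List the identifiers of the candidates currently in the library
s$printLibrary()
```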
## [1] "ebic_0" "ebic_0.5" "ric" "stars_0.05"
## [5] "stars_0.1" "qgraph_ebic_0" "qgraph_ebic_0.5"
Finally, we can run our model again. Say that this time, we want to use 5 folds; we can modify that parameter here as well.
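A sketch of the re-run; as before, the fold-count argument name K is an assumption:

```r
# Re-fit with 5 folds and save the results as a separate object
slResults5 = s$runSpiderLearner(lateStageSmall, K = 5)
```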
Now, when we use the getWeights() function, we will get the results for our latest analysis:
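For example:

```r
s$getWeights()
```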
## method weight
## 1 ebic_0 9.788951e-01
## 2 ebic_0.5 4.260849e-08
## 3 ric 8.836936e-08
## 4 stars_0.05 4.260849e-08
## 5 stars_0.1 5.089110e-08
## 6 qgraph_ebic_0 2.110453e-02
## 7 qgraph_ebic_0.5 1.270060e-07
Contact Us / Contribute
This package is new, and any and all suggestions are welcome. You can use GitHub to raise issues, contribute, or communicate with us about the package:
https://github.com/katehoffshutta/ensembleGGM
In particular, we would love to add more GGM estimation methods as Candidate objects, and we welcome contributions in that area.
References
Köhler, Sebastian, Michael Gargano, Nicolas Matentzoglu, Leigh C Carmody, David Lewis-Smith, Nicole A Vasilevsky, Daniel Danis, et al. 2021. “The Human Phenotype Ontology in 2021.” Nucleic Acids Research 49 (D1): D1207–D1217.
Rolfs, Benjamin T, and Bala Rajaratnam. 2013. “A Note on the Lack of Symmetry in the Graphical Lasso.” Computational Statistics & Data Analysis 57 (1): 429–34.
Note that there is a known lack of symmetry in the graphical lasso-estimated precision matrix (Rolfs and Rajaratnam 2013) and, consequently, in the matrix of partial correlations estimated by SpiderLearner. In the ensembleGGM package, we address this by averaging the \(i,j^{th}\) and \(j,i^{th}\) entries of the adjacency matrix to obtain a symmetric matrix, consistent with the fact that partial correlation should be symmetric.