vignettes/EWCE.Rmd
EWCE.Rmd
The EWCE
R package is designed to facilitate expression
weighted cell type enrichment analysis as described in our Frontiers
in Neuroscience paper1.
EWCE
can be applied to any gene list.
Using EWCE
essentially involves two steps:
ewceData
(which comes with
EWCE
).bootstrap_enrichment_test
function.NOTE: This documentation is for the development
version of EWCE
. See Bioconductor
for documentation on the current release version.
Load a CTD previously generated from mouse cortex and hypothalamus single-cell RNA-seq data (from the Karolinska Institute).
Each level of a CTD corresponds to increasingly refined
cell-type/-subtype annotations. For example, in the CTD
ewceData::ctd()
level 1 includes the cell-type
“interneurons”, while level 2 breaks these this group into 16 different
interneuron subtypes (“Int…”).
ctd <- ewceData::ctd()
## see ?ewceData and browseVignettes('ewceData') for documentation
## loading from cache
Gene lists input into EWCE can comes from any source (e.g. GWAS, candidate genes, pathways).
Here, we provide an example gene list of Alzheimer’s disease-related nominated from a GWAS.
hits <- ewceData::example_genelist()
## see ?ewceData and browseVignettes('ewceData') for documentation
## loading from cache
print(hits)
## [1] "APOE" "BIN1" "CLU" "ABCA7" "CR1" "PICALM"
## [7] "MS4A6A" "CD33" "MS4A4E" "CD2AP" "EOGA1" "INPP5D"
## [13] "MEF2C" "HLA-DRB5" "ZCWPW1" "NME8" "PTK2B" "CELF1"
## [19] "SORL1" "FERMT2" "SLC24A4" "CASS4"
We now run the cell type enrichment tests on the gene list. Since the
CTD is from mouse data (and is annotated using mouse genes) we specify
the argument sctSpecies="mouse"
.
bootstrap_enrichment_test
will automaticlaly convert the
mouse genes to human genes.
Since the gene list came from GWAS in humans, we set
genelistSpecies="human"
.
Note: We set the seed at the top of this vignette to ensure reproducibility in the bootstrap sampling function.
Note: We use 100 repetitions here for the purposes of a
quick example, but in practice you would want to use
reps=10000
for publishable results.
You can now speed up the bootstrapping process by parallelising
across multiple cores with the parameter no_cores
(=1
by default).
reps <- 100
annotLevel <- 1
full_results <- EWCE::bootstrap_enrichment_test(sct_data = ctd,
sctSpecies = "mouse",
genelistSpecies = "human",
hits = hits,
reps = reps,
annotLevel = annotLevel)
## 1 core(s) assigned as workers (3 reserved).
## Generating gene background for mouse x human ==> human
## Gathering ortholog reports.
## Retrieving all genes using: homologene.
## Retrieving all organisms available in homologene.
## Mapping species name: human
## Common name mapping found for human
## 1 organism identified from search: 9606
## Gene table with 19,129 rows retrieved.
## Returning all 19,129 genes from human.
## Retrieving all genes using: homologene.
## Retrieving all organisms available in homologene.
## Mapping species name: mouse
## Common name mapping found for mouse
## 1 organism identified from search: 10090
## Gene table with 21,207 rows retrieved.
## Returning all 21,207 genes from mouse.
## --
## --
## Preparing gene_df.
## data.frame format detected.
## Extracting genes from Gene.Symbol.
## 21,207 genes extracted.
## Converting mouse ==> human orthologs using: homologene
## Retrieving all organisms available in homologene.
## Mapping species name: mouse
## Common name mapping found for mouse
## 1 organism identified from search: 10090
## Retrieving all organisms available in homologene.
## Mapping species name: human
## Common name mapping found for human
## 1 organism identified from search: 9606
## Checking for genes without orthologs in human.
## Extracting genes from input_gene.
## 17,355 genes extracted.
## Extracting genes from ortholog_gene.
## 17,355 genes extracted.
## Checking for genes without 1:1 orthologs.
## Dropping 131 genes that have multiple input_gene per ortholog_gene (many:1).
## Dropping 498 genes that have multiple ortholog_gene per input_gene (1:many).
## Filtering gene_df with gene_map
## Adding input_gene col to gene_df.
## Adding ortholog_gene col to gene_df.
##
## =========== REPORT SUMMARY ===========
## Total genes dropped after convert_orthologs :
## 4,725 / 21,207 (22%)
## Total genes remaining after convert_orthologs :
## 16,482 / 21,207 (78%)
## --
##
## =========== REPORT SUMMARY ===========
## 16,482 / 21,207 (77.72%) target_species genes remain after ortholog conversion.
## 16,482 / 19,129 (86.16%) reference_species genes remain after ortholog conversion.
## Gathering ortholog reports.
## Retrieving all genes using: homologene.
## Retrieving all organisms available in homologene.
## Mapping species name: human
## Common name mapping found for human
## 1 organism identified from search: 9606
## Gene table with 19,129 rows retrieved.
## Returning all 19,129 genes from human.
## Retrieving all genes using: homologene.
## Retrieving all organisms available in homologene.
## Mapping species name: human
## Common name mapping found for human
## 1 organism identified from search: 9606
## Gene table with 19,129 rows retrieved.
## Returning all 19,129 genes from human.
## --
##
## =========== REPORT SUMMARY ===========
## 19,129 / 19,129 (100%) target_species genes remain after ortholog conversion.
## 19,129 / 19,129 (100%) reference_species genes remain after ortholog conversion.
## 16,482 intersect background genes used.
## Standardising CellTypeDataset
## Checking gene list inputs.
## Running without gene size control.
## 17 hit gene(s) remain after filtering.
## Computing gene scores.
## Using previously sampled genes.
## Computing gene counts.
## Testing for enrichment in 7 cell types...
## Sorting results by p-value.
## Computing BH-corrected q-values.
## 1 significant cell type enrichment results @ q<0.05 :
## CellType annotLevel p fold_change sd_from_mean q
## 1 microglia 1 0 1.965915 3.938119 0
The main table of results is stored in
full_results$results
.
In this case, microglia were the only cell type that was significantly enriched in the Alzheimer’s disease gene list.
knitr::kable(full_results$results)
CellType | annotLevel | p | fold_change | sd_from_mean | q | |
---|---|---|---|---|---|---|
microglia | microglia | 1 | 0.00 | 1.9659148 | 3.9381188 | 0.000 |
astrocytes_ependymal | astrocytes_ependymal | 1 | 0.13 | 1.2624889 | 1.1553910 | 0.455 |
pyramidal_SS | pyramidal_SS | 1 | 0.80 | 0.8699242 | -0.8226268 | 1.000 |
oligodendrocytes | oligodendrocytes | 1 | 0.87 | 0.7631149 | -1.0861761 | 1.000 |
pyramidal_CA1 | pyramidal_CA1 | 1 | 0.89 | 0.8202496 | -1.1738063 | 1.000 |
endothelial_mural | endothelial_mural | 1 | 0.90 | 0.7674534 | -1.1811797 | 1.000 |
interneurons | interneurons | 1 | 1.00 | 0.4012954 | -3.4703413 | 1.000 |
The results can be visualised using another function, which shows for each cell type, the number of standard deviations from the mean the level of expression was found to be in the target gene list, relative to the bootstrapped mean.
The dendrogram at the top shows how the cell types are hierarchically clustered by transcriptional similarity.
plot_list <- EWCE::ewce_plot(total_res = full_results$results,
mtc_method = "BH",
ctd = ctd)
print(plot_list$withDendro)
## NULL
ewce is now available via DockerHub as a containerised environment with Rstudio and all necessary dependencies pre-installed.
First, install Docker if you have not already.
Create an image of the Docker container in command line:
docker pull neurogenomicslab/ewce
Once the image has been created, you can launch it with:
docker run \
-d \
-e ROOT=true \
-e PASSWORD=bioc \
-v ~/Desktop:/Desktop \
-v /Volumes:/Volumes \
-p 8787:8787 \
neurogenomicslab/ewce
-d
ensures the container will run in “detached”
mode, which means it will persist even after you’ve closed your command
line session.-e PASSWORD=...
flag.If you are using a system that does not allow Docker (as is the case for many institutional computing clusters), you can instead install Docker images via Singularity.
singularity pull docker://neurogenomicslab/ewce
Finally, launch the containerised Rstudio by entering the following URL in any web browser: http://localhost:8787/
Login using the credentials set during the Installation steps.
utils::sessionInfo()
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.5 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] ewceData_1.13.0 ExperimentHub_2.13.1 AnnotationHub_3.13.3
## [4] BiocFileCache_2.13.2 dbplyr_2.5.0 BiocGenerics_0.51.3
## [7] EWCE_1.15.0 RNOmni_1.0.1.2 BiocStyle_2.33.1
##
## loaded via a namespace (and not attached):
## [1] jsonlite_1.8.9 magrittr_2.0.3
## [3] farver_2.1.2 rmarkdown_2.28
## [5] fs_1.6.4 zlibbioc_1.51.2
## [7] ragg_1.3.3 vctrs_0.6.5
## [9] memoise_2.0.1 ggtree_3.13.2
## [11] rstatix_0.7.2 htmltools_0.5.8.1
## [13] S4Arrays_1.5.11 curl_5.2.3
## [15] broom_1.0.7 SparseArray_1.5.45
## [17] Formula_1.2-5 gridGraphics_0.5-1
## [19] sass_0.4.9 bslib_0.8.0
## [21] htmlwidgets_1.6.4 desc_1.4.3
## [23] plyr_1.8.9 HGNChelper_0.8.14
## [25] plotly_4.10.4 cachem_1.1.0
## [27] mime_0.12 lifecycle_1.0.4
## [29] pkgconfig_2.0.3 Matrix_1.7-1
## [31] R6_2.5.1 fastmap_1.2.0
## [33] GenomeInfoDbData_1.2.13 MatrixGenerics_1.17.1
## [35] digest_0.6.37 aplot_0.2.3
## [37] colorspace_2.1-1 patchwork_1.3.0
## [39] AnnotationDbi_1.67.0 S4Vectors_0.43.2
## [41] grr_0.9.5 textshaping_0.4.0
## [43] GenomicRanges_1.57.2 RSQLite_2.3.7
## [45] ggpubr_0.6.0 labeling_0.4.3
## [47] filelock_1.0.3 fansi_1.0.6
## [49] httr_1.4.7 abind_1.4-8
## [51] compiler_4.4.1 withr_3.0.2
## [53] bit64_4.5.2 backports_1.5.0
## [55] BiocParallel_1.39.0 orthogene_1.11.0
## [57] carData_3.0-5 DBI_1.2.3
## [59] homologene_1.4.68.19.3.27 highr_0.11
## [61] ggsignif_0.6.4 rappdirs_0.3.3
## [63] DelayedArray_0.31.14 tools_4.4.1
## [65] splitstackshape_1.4.8 ape_5.8
## [67] glue_1.8.0 nlme_3.1-166
## [69] grid_4.4.1 reshape2_1.4.4
## [71] generics_0.1.3 gtable_0.3.6
## [73] tidyr_1.3.1 data.table_1.16.2
## [75] car_3.1-3 utf8_1.2.4
## [77] XVector_0.45.0 stringr_1.5.1
## [79] BiocVersion_3.20.0 pillar_1.9.0
## [81] yulab.utils_0.1.7 babelgene_22.9
## [83] limma_3.61.12 dplyr_1.1.4
## [85] treeio_1.29.2 lattice_0.22-6
## [87] bit_4.5.0 tidyselect_1.2.1
## [89] SingleCellExperiment_1.27.2 Biostrings_2.73.2
## [91] knitr_1.48 bookdown_0.41
## [93] IRanges_2.39.2 SummarizedExperiment_1.35.5
## [95] stats4_4.4.1 xfun_0.48
## [97] Biobase_2.65.1 statmod_1.5.0
## [99] matrixStats_1.4.1 stringi_1.8.4
## [101] UCSC.utils_1.1.0 lazyeval_0.2.2
## [103] ggfun_0.1.7 yaml_2.3.10
## [105] codetools_0.2-20 evaluate_1.0.1
## [107] tibble_3.2.1 BiocManager_1.30.25
## [109] ggplotify_0.1.2 cli_3.6.3
## [111] systemfonts_1.1.0 munsell_0.5.1
## [113] jquerylib_0.1.4 Rcpp_1.0.13
## [115] GenomeInfoDb_1.41.2 gprofiler2_0.2.3
## [117] png_0.1-8 parallel_4.4.1
## [119] pkgdown_2.1.1 ggplot2_3.5.1
## [121] blob_1.2.4 viridisLite_0.4.2
## [123] tidytree_0.4.6 scales_1.3.0
## [125] purrr_1.0.2 crayon_1.5.3
## [127] rlang_1.1.4 KEGGREST_1.45.1