controlled_geneset_enrichment tests whether a functional gene set is still enriched in a disease gene set after controlling for the disease gene set's enrichment in a particular cell type (the 'controlledCT').

controlled_geneset_enrichment(
  disease_genes,
  functional_genes,
  bg_genes,
  sct_data,
  annotLevel,
  reps,
  controlledCT
)

Arguments

disease_genes

Array of gene symbols containing the disease gene list. The list does not have to consist of disease genes. Must be from the same species as the single-cell transcriptome dataset.

functional_genes

Array of gene symbols containing the functional gene list. The enrichment of this gene set within disease_genes is tested. Must be from the same species as the single-cell transcriptome dataset.

bg_genes

Array of gene symbols containing the background gene list.

sct_data

List generated using generate_celltype_data.

annotLevel

An integer indicating which level of the annotation to analyse. Default = 1.

reps

Number of random gene lists to generate (default = 100, but should be over 10000 for publication-quality results).

controlledCT

(optional) If set to the name of a cell type rather than NULL, the bootstrapping controls for expression within that cell type.
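One way to picture what "controlling for expression within a cell type" can mean is to draw random gene lists that match the hit list's distribution of specificity in that cell type. The sketch below is a toy illustration in base R under that assumption, not the package's implementation: the gene names, the specificity values, the use of deciles, and the helper draw_matched are all hypothetical.

```r
# Toy data only: gene names and specificity values are simulated.
set.seed(1)
bg_genes <- paste0("gene", 1:1000)
# Hypothetical specificity of every background gene in the controlled cell type
specificity <- setNames(runif(length(bg_genes)), bg_genes)
# Assign each background gene to a specificity decile (1 = lowest, 10 = highest)
decile <- setNames(cut(rank(specificity), breaks = 10, labels = FALSE), bg_genes)

# A hit list skewed towards genes that are specific to the controlled cell type
hits <- sample(bg_genes, 100, prob = specificity^3)

# Hypothetical helper: draw one random list with the same number of genes
# per decile as `hits`
draw_matched <- function(hits) {
  per_decile <- table(factor(decile[hits], levels = 1:10))
  unlist(lapply(1:10, function(d) {
    sample(bg_genes[decile == d], per_decile[[d]])
  }))
}

random_list <- draw_matched(hits)
# Compare the decile composition of the hit list and the random list
rbind(hits = table(factor(decile[hits], levels = 1:10)),
      random = table(factor(decile[random_list], levels = 1:10)))
```

Because the random lists share the hit list's specificity profile, any overlap with the functional gene set that remains unusual cannot be explained by that cell type's expression alone.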

Value

A list containing the following elements:

  • p_controlled The probability that functional_genes are enriched in disease_genes while controlling for the level of specificity in controlledCT

  • z_controlled The z-score that functional_genes are enriched in disease_genes while controlling for the level of specificity in controlledCT

  • p_uncontrolled The probability that functional_genes are enriched in disease_genes WITHOUT controlling for the level of specificity in controlledCT

  • z_uncontrolled The z-score that functional_genes are enriched in disease_genes WITHOUT controlling for the level of specificity in controlledCT

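  • reps The number of random gene lists used, as passed in reps

  • controlledCT The cell type that was controlled for, as passed in controlledCT

  • actualOverlap The number of genes that overlap between the functional and disease gene sets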
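To make the relationship between the overlap count, the p-value and the z-score concrete, here is a minimal base R sketch of an uncontrolled bootstrap on simulated gene lists. It is an illustration under stated assumptions rather than the function's own code: the gene names are invented, and both the `>=` convention for the p-value and the mean/sd form of the z-score are assumed choices.

```r
# Toy data only: all gene names are simulated.
set.seed(1)
bg_genes <- paste0("gene", 1:1000)
disease_genes <- sample(bg_genes, 100)
# Build a functional list that deliberately shares 30 genes with the disease list
functional_genes <- c(sample(disease_genes, 30),
                      sample(setdiff(bg_genes, disease_genes), 120))

# Observed overlap between the two gene sets (30 by construction)
actual_overlap <- sum(functional_genes %in% disease_genes)

# Null distribution: overlap of the functional list with random background lists
# of the same size as the disease list
reps <- 1000
null_overlap <- replicate(reps, {
  random_list <- sample(bg_genes, length(disease_genes))
  sum(functional_genes %in% random_list)
})

# Fraction of random lists that overlap at least as much as the real list
p_value <- mean(null_overlap >= actual_overlap)
# Distance of the observed overlap from the null mean, in null standard deviations
z_score <- (actual_overlap - mean(null_overlap)) / sd(null_overlap)
c(actualOverlap = actual_overlap, p = p_value, z = z_score)
```

With only a few repetitions the smallest non-zero p-value is 1/reps, which is why a large value of reps is needed for publication-quality results.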

Examples

library(ewceData)
# See the vignette for more detailed explanations
# Gene set enrichment analysis controlling for cell type expression
# set seed for bootstrap reproducibility
set.seed(12345678)
ctd <- ctd()
#> see ?ewceData and browseVignettes('ewceData') for documentation
#> loading from cache
mouse_to_human_homologs <- mouse_to_human_homologs()
#> see ?ewceData and browseVignettes('ewceData') for documentation
#> loading from cache
m2h = unique(mouse_to_human_homologs[,c("HGNC.symbol","MGI.symbol")])
schiz_genes <- schiz_genes()
#> see ?ewceData and browseVignettes('ewceData') for documentation
#> loading from cache
id_genes <- id_genes()
#> see ?ewceData and browseVignettes('ewceData') for documentation
#> loading from cache
mouse.hits.schiz = unique(m2h[m2h$HGNC.symbol %in% schiz_genes,"MGI.symbol"])
mouse.bg = unique(m2h$MGI.symbol)
hpsd_genes <- hpsd_genes()
#> see ?ewceData and browseVignettes('ewceData') for documentation
#> loading from cache
mouse.hpsd = unique(m2h[m2h$HGNC.symbol %in% hpsd_genes,"MGI.symbol"])
# Use 3 bootstrap lists for speed, for publishable analysis use >10000
reps=3
res_hpsd_schiz = controlled_geneset_enrichment(disease_genes=mouse.hits.schiz,
    functional_genes = mouse.hpsd, bg_genes = mouse.bg, sct_data = ctd,
    annotLevel = 1, reps=reps, controlledCT="pyramidal CA1")