controlled_geneset_enrichment tests whether a functional gene set is still enriched in a disease gene set after controlling for the disease gene set's enrichment in a particular cell type (the 'controlledCT').

controlled_geneset_enrichment(
disease_genes,
functional_genes,
bg_genes,
sct_data,
annotLevel,
reps,
controlledCT
)

## Arguments

• disease_genes Array of gene symbols containing the disease gene list. Does not have to be disease genes. Must be from same species as the single cell transcriptome dataset.

• functional_genes Array of gene symbols containing the functional gene list. The enrichment of this geneset within the disease_genes is tested. Must be from same species as the single cell transcriptome dataset.

• bg_genes Array of gene symbols containing the background gene list.

• sct_data List generated using generate_celltype_data

• annotLevel An integer indicating which level of the annotation to analyse. Default = 1.

• reps Number of random gene lists to generate (default=100 but should be over 10000 for publication quality results)

• controlledCT (optional) If not NULL, and instead is the name of a cell type, then the bootstrapping controls for expression within that cell type

## Value

A list containing the following elements:

• p_controlled The probability that functional_genes are enriched in disease_genes while controlling for the level of specificity in controlledCT

• z_controlled The z-score that functional_genes are enriched in disease_genes while controlling for the level of specificity in controlledCT

• p_uncontrolled The probability that functional_genes are enriched in disease_genes WITHOUT controlling for the level of specificity in controlledCT

• z_uncontrolled The z-score that functional_genes are enriched in disease_genes WITHOUT controlling for the level of specificity in controlledCT

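• reps The number of bootstrap repetitions, as passed in the reps argument

• controlledCT The cell type that was controlled for, as passed in the controlledCT argument

• actualOverlap The number of genes that overlap between functional and disease gene sets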
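To make the p and z values above more concrete, here is a minimal base-R sketch of an uncontrolled bootstrap overlap test. It is an illustration under stated assumptions, not the package's implementation: the gene names and list sizes are made up, and the way the p-value and z-score are computed here (share of random lists with at least the observed overlap, and distance from the bootstrap mean in standard deviations) is an assumption about the general technique rather than a description of what controlled_geneset_enrichment does internally. The controlled variant, which additionally accounts for expression in controlledCT when drawing random lists, is not shown.

```r
# Toy bootstrap overlap test in base R (illustrative only; not EWCE's code).
set.seed(1)

# Hypothetical gene universe and gene lists (names are placeholders)
bg_genes         <- paste0("gene", 1:1000)
disease_genes    <- bg_genes[1:100]
functional_genes <- c(bg_genes[1:30], bg_genes[501:520])  # 30 of 50 overlap

reps <- 1000

# Observed overlap between the functional and disease gene sets
actual_overlap <- length(intersect(functional_genes, disease_genes))

# Null distribution: overlap of random background lists of the same size
boot_overlap <- replicate(reps, {
  random_genes <- sample(bg_genes, length(functional_genes))
  sum(random_genes %in% disease_genes)
})

# Empirical p-value: share of random lists overlapping at least as much
p_uncontrolled <- sum(boot_overlap >= actual_overlap) / reps

# z-score: distance of the observed overlap from the bootstrap mean, in SDs
z_uncontrolled <- (actual_overlap - mean(boot_overlap)) / sd(boot_overlap)
```

With a small number of repetitions the empirical p-value is coarse (its resolution is 1/reps), which is one reason a large reps value is recommended for publication quality results.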

## Examples

library(ewceData)
# See the vignette for more detailed explanations
# Gene set enrichment analysis controlling for cell type expression
# set seed for bootstrap reproducibility
set.seed(12345678)
ctd <- ctd()
#> see ?ewceData and browseVignettes('ewceData') for documentation
#> loading from cache
mouse_to_human_homologs <- mouse_to_human_homologs()
#> see ?ewceData and browseVignettes('ewceData') for documentation
#> loading from cache
m2h = unique(mouse_to_human_homologs[,c("HGNC.symbol","MGI.symbol")])
schiz_genes <- schiz_genes()
#> see ?ewceData and browseVignettes('ewceData') for documentation
#> loading from cache
id_genes <- id_genes()
#> see ?ewceData and browseVignettes('ewceData') for documentation
#> loading from cache
mouse.hits.schiz = unique(m2h[m2h$HGNC.symbol %in% schiz_genes,"MGI.symbol"])
mouse.bg = unique(m2h$MGI.symbol)
hpsd_genes <- hpsd_genes()
#> see ?ewceData and browseVignettes('ewceData') for documentation
#> loading from cache
mouse.hpsd = unique(m2h[m2h$HGNC.symbol %in% hpsd_genes,"MGI.symbol"])
# Use 3 bootstrap lists for speed, for publishable analysis use >10000
reps=3
res_hpsd_schiz =
controlled_geneset_enrichment(disease_genes=mouse.hits.schiz,
functional_genes = mouse.hpsd,
bg_genes = mouse.bg,
sct_data = ctd, annotLevel = 1,
reps=reps,
controlledCT="pyramidal CA1")