Celltype controlled geneset enrichment — controlled_geneset_enrichment

controlled_geneset_enrichment tests whether a functional gene set is still enriched in a disease gene set after controlling for the disease gene set's enrichment in a particular cell type (the controlledCT).

controlled_geneset_enrichment(
  disease_genes,
  functional_genes,
  bg = NULL,
  sct_data,
  sctSpecies = NULL,
  output_species = "human",
  disease_genes_species = NULL,
  functional_genes_species = NULL,
  method = "homologene",
  annotLevel,
  reps = 100,
  controlledCT,
  use_intersect = FALSE,
  verbose = TRUE
)

Arguments

disease_genes

Character vector of gene symbols containing the disease gene list. Despite the name, these do not have to be disease genes. Must be from the same species as the single-cell transcriptome dataset.

functional_genes

Character vector of gene symbols containing the functional gene list. The function tests whether this gene set is enriched within disease_genes. Must be from the same species as the single-cell transcriptome dataset.

bg

List of gene symbols containing the background gene list (including hit genes). If bg=NULL, an appropriate gene background will be created automatically.
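Because the background is described as including the hit genes, a user-supplied bg should contain every gene in disease_genes. A minimal base-R sketch, with placeholder gene symbols, that appends any missing hits using union(); the rationale (random genes are drawn from bg, so a hit outside it could never be resampled) is an assumption about how bg is used, not something stated on this page.

```r
# Hypothetical custom background: make sure every hit gene is inside it.
my_bg <- c("APOE", "SNAP25", "GRIN2A", "DLG4", "GAPDH")  # placeholder symbols
disease_genes <- c("GRIN2A", "CACNA1C")                  # placeholder hit list

# union() keeps the existing background and appends hits that are missing
my_bg <- union(my_bg, disease_genes)
all(disease_genes %in% my_bg)  # TRUE
```

The resulting vector could then be passed as bg = my_bg.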

sct_data

List generated using generate_celltype_data.

sctSpecies

Species that sct_data is currently formatted as (no longer limited to just "mouse" and "human"). See list_species for all available species.

output_species

Species to convert sct_data and the input gene lists (hits) to (Default: "human"). See list_species for all available species.

disease_genes_species

Species of the disease_genes gene set.

functional_genes_species

Species of the functional_genes gene set.

method

R package to use for gene mapping:

"gprofiler" : Slower but more species and genes.
"homologene" : Faster but fewer species and genes.
"babelgene" : Faster but fewer species and genes. Also gives consensus scores for each gene mapping based on several different data sources.

annotLevel

An integer indicating which level of sct_data to analyse. This argument has no default in this function's signature, so it must be supplied.

reps

Number of random gene lists to generate (Default: 100, but should be >=10,000 for publication-quality results).
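The recommendation follows from simple arithmetic: a bootstrap p-value is a fraction of the reps random lists, so its resolution is capped by reps. A quick sketch, assuming p is computed as (number of random lists with overlap >= observed) / reps; possible_p() is a hypothetical helper, not part of EWCE.

```r
# Attainable bootstrap p-values for a given number of reps, assuming
# p = (number of random lists with overlap >= observed) / reps.
possible_p <- function(reps) (0:reps) / reps

possible_p(3)                # with reps = 3 only 0, 1/3, 2/3 and 1 can occur
min(possible_p(100)[-1])     # smallest nonzero p-value with the default reps
min(possible_p(10000)[-1])   # smallest nonzero p-value with 10,000 reps
```

With the default of 100 lists the smallest nonzero p-value is 0.01, which is too coarse for most multiple-testing corrections; 10,000 lists lower it to 1e-4.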

controlledCT

Name of a cell type in sct_data at the chosen annotLevel (e.g. "pyramidal CA1"). The bootstrapping controls for expression specificity within that cell type. This argument has no default in this function's signature.
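One way to picture "controlling for expression within that cell type" is quantile-matched sampling: each random list keeps the same mix of specificity levels in the controlled cell type as the disease list. The base-R sketch below assumes every gene has already been assigned a specificity quantile (bin) in controlledCT and draws genes without replacement within each bin. sample_matched() and the toy data are hypothetical illustrations, not EWCE's internal code.

```r
# Hypothetical sketch of quantile-matched sampling (not EWCE's internal code).
# `bins` maps each gene symbol to its specificity quantile in the controlled
# cell type; each random list reproduces the disease list's bin composition.
sample_matched <- function(disease_genes, bins) {
  counts <- table(bins[disease_genes])          # disease genes per bin
  unlist(lapply(names(counts), function(b) {
    pool <- names(bins)[bins == as.integer(b)]  # all genes in this bin
    sample(pool, counts[[b]])                   # draw without replacement
  }))
}

# Toy data: 200 genes split evenly into 4 specificity bins.
set.seed(1)
genes <- paste0("g", 1:200)
bins <- setNames(rep(1:4, each = 50), genes)
disease <- c("g1", "g2", "g60", "g180")
random_lists <- replicate(3, sample_matched(disease, bins), simplify = FALSE)
table(bins[random_lists[[1]]])  # same bin counts as table(bins[disease])
```

Because every random list matches the disease list bin for bin, any overlap with functional_genes that is explained purely by specificity in controlledCT is reproduced in the null distribution.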

use_intersect

When species1 and species2 are both different from output_species, this argument will determine whether to use the intersect (TRUE) or union (FALSE) of all genes from species1 and species2.
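In set terms, use_intersect chooses between base R's intersect() and union() of the two mapped gene universes. The vectors below are made-up toy universes standing in for the genes of species1 and species2 after mapping to output_species; background_genes() is a hypothetical helper.

```r
# Toy gene universes after mapping two species to output_species (hypothetical).
genes_species1 <- c("APOE", "SNAP25", "GRIN2A")
genes_species2 <- c("APOE", "GRIN2A", "DLG4")

background_genes <- function(use_intersect) {
  if (use_intersect) {
    intersect(genes_species1, genes_species2)  # genes present in both
  } else {
    union(genes_species1, genes_species2)      # genes present in either
  }
}

background_genes(TRUE)   # "APOE" "GRIN2A"
background_genes(FALSE)  # "APOE" "SNAP25" "GRIN2A" "DLG4"
```

The intersect gives a smaller, stricter background; the union (the default, FALSE) keeps genes that map in only one of the species.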

verbose

Print messages.

Value

A list containing the following elements:

p_controlled The bootstrap p-value for the enrichment of functional_genes in disease_genes while controlling for the level of specificity in controlledCT
z_controlled The z-score for the enrichment of functional_genes in disease_genes while controlling for the level of specificity in controlledCT
p_uncontrolled The bootstrap p-value for the enrichment of functional_genes in disease_genes WITHOUT controlling for the level of specificity in controlledCT
z_uncontrolled The z-score for the enrichment of functional_genes in disease_genes WITHOUT controlling for the level of specificity in controlledCT
reps The number of bootstrap replicates used (the reps argument)
controlledCT The cell type that was controlled for (the controlledCT argument)
actualOverlap The number of genes that overlap between the functional and disease gene sets
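A plausible way to turn a set of random lists into the p-value, z-score and overlap described above: count the observed overlap, count the overlap of each random list with functional_genes, then take the fraction of random lists whose overlap is at least as large (p) and the standardised distance of the observed overlap from the bootstrap mean (z). bootstrap_stats() and the toy gene sets are hypothetical, and EWCE's exact formulas (for example any +1 correction) may differ.

```r
# Hypothetical helper: summarise an observed overlap against a bootstrap null.
bootstrap_stats <- function(disease_genes, functional_genes, random_lists) {
  actual <- sum(disease_genes %in% functional_genes)  # observed overlap
  null_overlaps <- vapply(random_lists,
                          function(g) sum(g %in% functional_genes),
                          integer(1))                 # overlap per random list
  list(
    p = sum(null_overlaps >= actual) / length(null_overlaps),
    z = (actual - mean(null_overlaps)) / sd(null_overlaps),
    actualOverlap = actual
  )
}

# Toy example with four random lists of the same size as the disease list.
disease <- c("A", "B", "C", "D")
functional <- c("A", "B", "X")
random_lists <- list(c("A", "E", "F", "G"), c("E", "F", "G", "H"),
                     c("A", "B", "E", "F"), c("X", "E", "F", "G"))
res <- bootstrap_stats(disease, functional, random_lists)
# res$actualOverlap is 2, res$p is 0.25, res$z is about 1.22
```

Passing quantile-matched random lists gives the controlled statistics; passing lists drawn uniformly from the background gives the uncontrolled ones.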

Examples

# See the vignette for more detailed explanations
# Gene set enrichment analysis controlling for cell type expression
# set seed for bootstrap reproducibility
set.seed(12345678)
## load merged dataset from vignette
ctd <- ewceData::ctd()
#> see ?ewceData and browseVignettes('ewceData') for documentation
#> loading from cache
schiz_genes <- ewceData::schiz_genes()
#> see ?ewceData and browseVignettes('ewceData') for documentation
#> loading from cache
hpsd_genes <- ewceData::hpsd_genes()
#> see ?ewceData and browseVignettes('ewceData') for documentation
#> loading from cache
# Use 3 bootstrap lists for speed; for a publishable analysis use >=10,000
reps <- 3

res_hpsd_schiz <- EWCE::controlled_geneset_enrichment(
    disease_genes = schiz_genes,
    functional_genes = hpsd_genes,
    sct_data = ctd,
    annotLevel = 1,
    reps = reps,
    controlledCT = "pyramidal CA1"
)
#> Warning: genelistSpecies not provided. Setting to 'human' by default.
#> Warning: sctSpecies not provided. Setting to 'mouse' by default.
#> Warning: sctSpecies_origin not provided. Setting to 'mouse' by default.
#> Warning: genelistSpecies not provided. Setting to 'human' by default.
#> Warning: sctSpecies_origin not provided. Setting to 'mouse' by default.
#> Retrieving all genes using: homologene.
#> Retrieving all organisms available in homologene.
#> Mapping species name: human
#> Common name mapping found for human
#> 1 organism identified from search: 9606
#> Gene table with 19,129 rows retrieved.
#> Returning all 19,129 genes from human.
#> Standardising CellTypeDataset
#> Found 5 matrix types across 2 CTD levels.
#> Processing level: 1
#> Processing level: 2
#> Generating controlled bootstrap gene sets.
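To compare the controlled and uncontrolled results side by side, a small helper can gather the list elements documented under Value into one data frame. summarise_controlled() is hypothetical and relies only on those documented element names.

```r
# Hypothetical helper: tabulate controlled vs uncontrolled results.
summarise_controlled <- function(res) {
  data.frame(
    analysis = c("controlled", "uncontrolled"),
    p = c(res$p_controlled, res$p_uncontrolled),
    z = c(res$z_controlled, res$z_uncontrolled),
    controlledCT = res$controlledCT,   # recycled across both rows
    reps = res$reps,
    actualOverlap = res$actualOverlap,
    stringsAsFactors = FALSE
  )
}

# Usage after the call above:
# summarise_controlled(res_hpsd_schiz)
```

A drop in enrichment from the uncontrolled row to the controlled row suggests that part of the overlap is explained by specificity in controlledCT.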