R/controlled_geneset_enrichment.r
controlled_geneset_enrichment.Rd
controlled_geneset_enrichment tests whether a functional gene set is still enriched in a disease gene set after controlling for the disease gene set's enrichment in a particular cell type (the 'controlledCT').
controlled_geneset_enrichment(
  disease_genes,
  functional_genes,
  bg = NULL,
  sct_data,
  sctSpecies = NULL,
  output_species = "human",
  disease_genes_species = NULL,
  functional_genes_species = NULL,
  method = "homologene",
  annotLevel,
  reps = 100,
  controlledCT,
  use_intersect = FALSE,
  verbose = TRUE
)
disease_genes
Array of gene symbols containing the disease gene list. Does not have to be disease genes. Must be from the same species as the single-cell transcriptome dataset.
functional_genes
Array of gene symbols containing the functional gene list. The enrichment of this gene set within the disease_genes is tested. Must be from the same species as the single-cell transcriptome dataset.
bg
List of gene symbols containing the background gene list (including hit genes). If bg=NULL, an appropriate gene background will be created automatically.
sct_data
List generated using generate_celltype_data.
sctSpecies
Species that sct_data is currently formatted as (no longer limited to just "mouse" and "human"). See list_species for all available species.
output_species
Species to convert sct_data and hits to (Default: "human"). See list_species for all available species.
disease_genes_species
Species of the disease_genes gene set.
functional_genes_species
Species of the functional_genes gene set.
method
R package to use for gene mapping:
"gprofiler": Slower but more species and genes.
"homologene": Faster but fewer species and genes.
"babelgene": Faster but fewer species and genes. Also gives consensus scores for each gene mapping based on several different data sources.
annotLevel
An integer indicating which level of sct_data to analyse (Default: 1).
reps
Number of random gene lists to generate (Default: 100, but should be >=10,000 for publication-quality results).
controlledCT
[Optional] If not NULL, the name of a cell type; the bootstrapping then controls for expression within that cell type.
use_intersect
When species1 and species2 are both different from output_species, this argument will determine whether to use the intersect (TRUE) or union (FALSE) of all genes from species1 and species2.
verbose
Print messages.
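As a rough illustration of what controlling for a cell type during bootstrapping can mean, the sketch below draws one random gene list whose specificity deciles in the controlled cell type match those of the hit genes. It is a simplified stand-in that assumes decile matching, not the package's internal code. The gene names, the cell types ctA, ctB and ctC, the toy specificity matrix, and the helper sample_matched are all hypothetical.

```r
set.seed(42)

# Toy specificity matrix: 200 hypothetical genes x 3 hypothetical cell types
genes <- paste0("gene", seq_len(200))
spec <- matrix(runif(200 * 3), nrow = 200,
               dimnames = list(genes, c("ctA", "ctB", "ctC")))
controlledCT <- "ctA"
hits <- sample(genes, 20)  # stand-in for the disease gene list

# Bin every gene into deciles of specificity in the controlled cell type
breaks <- quantile(spec[, controlledCT], probs = seq(0, 1, 0.1))
decile <- cut(spec[, controlledCT], breaks = breaks,
              include.lowest = TRUE, labels = FALSE)
names(decile) <- genes

# Hypothetical helper: draw one random gene list with the same
# decile distribution as the hit genes (sampling without replacement)
sample_matched <- function(hits, decile) {
  unlist(lapply(split(hits, decile[hits]), function(h) {
    pool <- names(decile)[decile == decile[h[1]]]
    sample(pool, length(h))
  }), use.names = FALSE)
}

boot_list <- sample_matched(hits, decile)
```

Repeating the draw reps times and counting how many genes in each random list overlap with the functional gene set would give a null distribution that already accounts for specificity in the controlled cell type.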
A list containing the following elements:
p_controlled
The probability that functional_genes are enriched in disease_genes while controlling for the level of specificity in controlledCT.
z_controlled
The z-score that functional_genes are enriched in disease_genes while controlling for the level of specificity in controlledCT.
p_uncontrolled
The probability that functional_genes are enriched in disease_genes WITHOUT controlling for the level of specificity in controlledCT.
z_uncontrolled
The z-score that functional_genes are enriched in disease_genes WITHOUT controlling for the level of specificity in controlledCT.
reps=reps
controlledCT
actualOverlap=actual
The number of genes that overlap between functional and disease gene sets.
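One common way to turn bootstrap results into p and z values like those listed above is an empirical p-value (the fraction of random lists that overlap at least as much as the observed list) together with a z-score against the bootstrap distribution. The sketch below assumes that convention. The helper bootstrap_enrichment and the overlap counts are hypothetical, and the exact formulas used by the package may differ.

```r
# Hypothetical helper, not part of EWCE
bootstrap_enrichment <- function(actual_overlap, boot_overlaps) {
  reps <- length(boot_overlaps)
  # Fraction of random gene lists with overlap >= the observed overlap
  p <- sum(boot_overlaps >= actual_overlap) / reps
  # Distance of the observed overlap from the bootstrap mean, in SD units
  z <- (actual_overlap - mean(boot_overlaps)) / stats::sd(boot_overlaps)
  list(p = p, z = z, reps = reps)
}

# Toy overlap counts from 10 hypothetical random gene lists
boot <- c(3, 4, 5, 6, 7, 5, 4, 6, 5, 5)
res <- bootstrap_enrichment(actual_overlap = 7, boot_overlaps = boot)
```

Because an empirical p-value of this kind is a multiple of 1/reps, reps = 100 cannot resolve p-values below 0.01, which is one reason to raise reps for publication-quality results.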
# See the vignette for more detailed explanations
# Gene set enrichment analysis controlling for cell type expression
# set seed for bootstrap reproducibility
set.seed(12345678)
## load merged dataset from vignette
ctd <- ewceData::ctd()
#> see ?ewceData and browseVignettes('ewceData') for documentation
#> loading from cache
schiz_genes <- ewceData::schiz_genes()
#> see ?ewceData and browseVignettes('ewceData') for documentation
#> loading from cache
hpsd_genes <- ewceData::hpsd_genes()
#> see ?ewceData and browseVignettes('ewceData') for documentation
#> loading from cache
# Use 3 bootstrap lists for speed; for publishable analysis use >=10,000
reps <- 3
res_hpsd_schiz <- EWCE::controlled_geneset_enrichment(
  disease_genes = schiz_genes,
  functional_genes = hpsd_genes,
  sct_data = ctd,
  annotLevel = 1,
  reps = reps,
  controlledCT = "pyramidal CA1"
)
#> Warning: genelistSpecies not provided. Setting to 'human' by default.
#> Warning: sctSpecies not provided. Setting to 'mouse' by default.
#> Warning: sctSpecies_origin not provided. Setting to 'mouse' by default.
#> Warning: genelistSpecies not provided. Setting to 'human' by default.
#> Warning: sctSpecies_origin not provided. Setting to 'mouse' by default.
#> Retrieving all genes using: homologene.
#> Retrieving all organisms available in homologene.
#> Mapping species name: human
#> Common name mapping found for human
#> 1 organism identified from search: 9606
#> Gene table with 19,129 rows retrieved.
#> Returning all 19,129 genes from human.
#> Standardising CellTypeDataset
#> Found 5 matrix types across 2 CTD levels.
#> Processing level: 1
#> Processing level: 2
#> Generating controlled bootstrap gene sets.
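After the call returns, the controlled and uncontrolled results can be compared side by side. The sketch below assumes the returned list carries the element names given in the Value section. res_stub is a hypothetical stand-in filled with NA placeholders so that the snippet runs without EWCE; in practice res_hpsd_schiz would take its place.

```r
# Hypothetical stand-in for the returned list; NA values are placeholders,
# not real results
res_stub <- list(
  p_controlled = NA_real_, z_controlled = NA_real_,
  p_uncontrolled = NA_real_, z_uncontrolled = NA_real_,
  reps = 3, controlledCT = "pyramidal CA1", actualOverlap = NA_integer_
)

# Collect the controlled and uncontrolled statistics in one table
comparison <- data.frame(
  analysis = c("controlled", "uncontrolled"),
  p = c(res_stub$p_controlled, res_stub$p_uncontrolled),
  z = c(res_stub$z_controlled, res_stub$z_uncontrolled)
)
print(comparison)
```

A functional gene set that stays enriched in the controlled row suggests that the enrichment is not explained solely by specificity in the controlled cell type.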