R/ewce_expression_data.r
ewce_expression_data.Rd
ewce_expression_data
takes a differential gene expression (DGE)
results table and determines the probability of cell type enrichment
in the up- and down- regulated genes.
ewce_expression_data(
sct_data,
annotLevel = 1,
tt,
sortBy = "t",
thresh = 250,
reps = 100,
ttSpecies = NULL,
sctSpecies = NULL,
output_species = NULL,
bg = NULL,
method = "homologene",
verbose = TRUE,
localHub = FALSE
)
List generated using generate_celltype_data.
An integer indicating which level of sct_data
to
analyse (Default: 1).
Differential expression table. Can be output of topTable function. Minimum requirement is that one column stores a metric of increased/decreased expression (i.e. log fold change, t-statistic for differential expression etc) and another contains gene symbols.
Column name of metric in tt
which should be used to sort up- from down- regulated genes (Default: "t").
The number of up- and down- regulated genes to be included in each analysis (Default: 250).
Number of random gene lists to generate (Default: 100, but should be >=10,000 for publication-quality results).
The species the differential expression table was generated from.
Species that sct_data
is currently formatted as
(no longer limited to just "mouse" and "human").
See list_species for all available species.
Species to convert sct_data
and hits
to
(Default: "human").
See list_species for all available species.
List of gene symbols containing the background gene list
(including hit genes). If bg=NULL
,
an appropriate gene background will be created automatically.
R package to use for gene mapping:
"gprofiler"
: Slower but more species and genes.
"homologene"
: Faster but fewer species and genes.
"babelgene"
: Faster but fewer species and genes.
Also gives consensus scores for each gene mapping based on a
several different data sources.
Print messages.
If working offline, add argument localHub=TRUE to work with a local, non-updated hub; It will only have resources available that have previously been downloaded. If offline, Please also see BiocManager vignette section on offline use to ensure proper functionality.
A list containing five data frames:
results
: dataframe in which each row gives the statistics
(p-value, fold change and number of standard deviations from the mean)
associated with the enrichment of the stated cell type in the gene list.
An additional column *Direction* stores whether it the result is from the
up or downregulated set.
hit.cells.up
: vector containing the summed proportion of
expression in each cell type for the target list.
hit.cells.down
: vector containing the summed proportion of
expression in each cell type for the target list.
bootstrap_data.up
: matrix in which each row represents the
summed proportion of expression in each cell type for one of the random
lists.
bootstrap_data.down
: matrix in which each row represents the
summed proportion of expression in each cell type for one of the random
lists.
# Load the single cell data
ctd <- ewceData::ctd()
#> see ?ewceData and browseVignettes('ewceData') for documentation
#> loading from cache
# Set the parameters for the analysis
# Use 3 bootstrap lists for speed, for publishable analysis use >10000
reps <- 3
# Use 5 up/down regulated genes (thresh) for speed, default is 250
thresh <- 5
annotLevel <- 1 # <- Use cell level annotations (i.e. Interneurons)
# Load the top table
tt_alzh <- ewceData::tt_alzh()
#> see ?ewceData and browseVignettes('ewceData') for documentation
#> loading from cache
tt_results <- EWCE::ewce_expression_data(
sct_data = ctd,
tt = tt_alzh,
annotLevel = 1,
thresh = thresh,
reps = reps,
ttSpecies = "human",
sctSpecies = "mouse"
)
#> Warning: genelistSpecies not provided. Setting to 'human' by default.
#> Warning: sctSpecies_origin not provided. Setting to 'mouse' by default.
#> Warning: sctSpecies_origin not provided. Setting to 'mouse' by default.
#> Preparing gene_df.
#> character format detected.
#> Converting to data.frame
#> Extracting genes from input_gene.
#> 15,259 genes extracted.
#> Converting mouse ==> human orthologs using: homologene
#> Retrieving all organisms available in homologene.
#> Mapping species name: mouse
#> Common name mapping found for mouse
#> 1 organism identified from search: 10090
#> Retrieving all organisms available in homologene.
#> Mapping species name: human
#> Common name mapping found for human
#> 1 organism identified from search: 9606
#> Checking for genes without orthologs in human.
#> Extracting genes from input_gene.
#> 13,416 genes extracted.
#> Extracting genes from ortholog_gene.
#> 13,416 genes extracted.
#> Checking for genes without 1:1 orthologs.
#> Dropping 46 genes that have multiple input_gene per ortholog_gene (many:1).
#> Dropping 56 genes that have multiple ortholog_gene per input_gene (1:many).
#> Filtering gene_df with gene_map
#> Returning gene_map as dictionary
#>
#> =========== REPORT SUMMARY ===========
#> Total genes dropped after convert_orthologs :
#> 2,016 / 15,259 (13%)
#> Total genes remaining after convert_orthologs :
#> 13,243 / 15,259 (87%)
#> Generating gene background for mouse x human ==> human
#> Gathering ortholog reports.
#> Retrieving all genes using: homologene.
#> Retrieving all organisms available in homologene.
#> Mapping species name: human
#> Common name mapping found for human
#> 1 organism identified from search: 9606
#> Gene table with 19,129 rows retrieved.
#> Returning all 19,129 genes from human.
#> Retrieving all genes using: homologene.
#> Retrieving all organisms available in homologene.
#> Mapping species name: mouse
#> Common name mapping found for mouse
#> 1 organism identified from search: 10090
#> Gene table with 21,207 rows retrieved.
#> Returning all 21,207 genes from mouse.
#> --
#> --
#> Preparing gene_df.
#> data.frame format detected.
#> Extracting genes from Gene.Symbol.
#> 21,207 genes extracted.
#> Converting mouse ==> human orthologs using: homologene
#> Retrieving all organisms available in homologene.
#> Mapping species name: mouse
#> Common name mapping found for mouse
#> 1 organism identified from search: 10090
#> Retrieving all organisms available in homologene.
#> Mapping species name: human
#> Common name mapping found for human
#> 1 organism identified from search: 9606
#> Checking for genes without orthologs in human.
#> Extracting genes from input_gene.
#> 17,355 genes extracted.
#> Extracting genes from ortholog_gene.
#> 17,355 genes extracted.
#> Checking for genes without 1:1 orthologs.
#> Dropping 131 genes that have multiple input_gene per ortholog_gene (many:1).
#> Dropping 498 genes that have multiple ortholog_gene per input_gene (1:many).
#> Filtering gene_df with gene_map
#> Adding input_gene col to gene_df.
#> Adding ortholog_gene col to gene_df.
#>
#> =========== REPORT SUMMARY ===========
#> Total genes dropped after convert_orthologs :
#> 4,725 / 21,207 (22%)
#> Total genes remaining after convert_orthologs :
#> 16,482 / 21,207 (78%)
#> --
#>
#> =========== REPORT SUMMARY ===========
#> 16,482 / 21,207 (77.72%) target_species genes remain after ortholog conversion.
#> 16,482 / 19,129 (86.16%) reference_species genes remain after ortholog conversion.
#> Gathering ortholog reports.
#> Retrieving all genes using: homologene.
#> Retrieving all organisms available in homologene.
#> Mapping species name: human
#> Common name mapping found for human
#> 1 organism identified from search: 9606
#> Gene table with 19,129 rows retrieved.
#> Returning all 19,129 genes from human.
#> Retrieving all genes using: homologene.
#> Retrieving all organisms available in homologene.
#> Mapping species name: human
#> Common name mapping found for human
#> 1 organism identified from search: 9606
#> Gene table with 19,129 rows retrieved.
#> Returning all 19,129 genes from human.
#> --
#>
#> =========== REPORT SUMMARY ===========
#> 19,129 / 19,129 (100%) target_species genes remain after ortholog conversion.
#> 19,129 / 19,129 (100%) reference_species genes remain after ortholog conversion.
#> 16,482 intersect background genes used.
#> Retrieving all genes using: homologene.
#> Retrieving all organisms available in homologene.
#> Mapping species name: human
#> Common name mapping found for human
#> 1 organism identified from search: 9606
#> Gene table with 19,129 rows retrieved.
#> Returning all 19,129 genes from human.
#> Returning 19,129 unique genes from entire human genome.
#> Using intersect between background gene lists: 16,482 genes.
#> Standardising sct_data.
#> Using 1st column of tt as gene column: HGNC.symbol
#> 1 core(s) assigned as workers (3 reserved).
#> Standardising CellTypeDataset
#> Checking gene list inputs.
#> Running without gene size control.
#> 6 hit gene(s) remain after filtering.
#> Computing gene scores.
#> Using previously sampled genes.
#> Computing gene counts.
#> Testing for enrichment in 7 cell types...
#> Sorting results by p-value.
#> Computing BH-corrected q-values.
#> 2 significant cell type enrichment results @ q<0.05 :
#> CellType annotLevel p fold_change sd_from_mean q
#> 1 microglia 1 0 2.138646 2.895915 0
#> 2 endothelial_mural 1 0 1.596961 1.635880 0
#> 1 core(s) assigned as workers (3 reserved).
#> Standardising CellTypeDataset
#> Checking gene list inputs.
#> Running without gene size control.
#> 5 hit gene(s) remain after filtering.
#> Computing gene scores.
#> Using previously sampled genes.
#> Computing gene counts.
#> Testing for enrichment in 7 cell types...
#> Sorting results by p-value.
#> Computing BH-corrected q-values.
#> 2 significant cell type enrichment results @ q<0.05 :
#> CellType annotLevel p fold_change sd_from_mean q
#> 1 pyramidal_CA1 1 0 1.618663 3.49606 0
#> 2 interneurons 1 0 1.827534 1.64491 0