merged_ewce combines cell type enrichment results from multiple studies targeting the same scientific problem.

merged_ewce(results, reps = 100)

Arguments

results

A list of EWCE results generated using add_res_to_merging_list.

reps

Number of random gene lists to generate (Default=100 but should be >=10,000 for publication-quality results).
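Why reps matters can be shown with a short calculation. This is a minimal sketch that assumes the empirical p-value is computed as (number of bootstrap scores >= observed score) / reps, as in a standard bootstrap test; the helper name p_resolution is hypothetical and not part of EWCE. Under that assumption, p can only take the values 0, 1/reps, 2/reps, ..., 1.

```r
# Hypothetical helper (not an EWCE function): smallest non-zero empirical
# p-value a bootstrap with `reps` random gene lists can report, assuming
# p = (number of bootstrap scores >= observed) / reps.
p_resolution <- function(reps) {
  stopifnot(is.numeric(reps), all(reps >= 1))
  1 / reps
}

# Resolution for the example value, the default, and the recommended minimum
p_resolution(c(3, 100, 10000))
```

With the default reps = 100 the finest step is 0.01, which is too coarse to separate strong enrichments after multiple-testing correction; reps = 10,000 gives a step of 1e-04.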

Value

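A data.frame in which each row gives the statistics (p-value, fold change and number of standard deviations from the mean, in columns p, fc and sd_from_mean) associated with the enrichment of the stated cell type in the gene list. The Direction column indicates whether a row refers to the up- or down-regulated genes, so each cell type can appear once per direction.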
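The idea of merging bootstrap enrichment across studies, and how the three statistics relate to the random gene lists, can be sketched in a few lines. This is a hypothetical illustration, not EWCE's internal code: it assumes each study contributes one observed enrichment score per cell type plus reps scores from random gene lists, and that scores are summed across studies before the statistics are computed. The function name merged_bootstrap_stats and the toy inputs are assumptions made for the example.

```r
# Hypothetical sketch of a merged bootstrap enrichment test (assumption:
# scores are summed across studies; this is not EWCE's implementation).
merged_bootstrap_stats <- function(observed, boot_list) {
  # observed:  numeric vector, one observed score per study
  # boot_list: list of numeric vectors, one vector of bootstrap scores
  #            per study; all vectors must have the same length (reps)
  stopifnot(length(observed) == length(boot_list))
  reps <- unique(lengths(boot_list))
  stopifnot(length(reps) == 1, reps > 1)

  total_obs  <- sum(observed)
  total_boot <- Reduce(`+`, boot_list)  # element-wise sum, one value per rep

  data.frame(
    p            = sum(total_boot >= total_obs) / reps,             # empirical p-value
    fold_change  = total_obs / mean(total_boot),                    # ratio to bootstrap mean
    sd_from_mean = (total_obs - mean(total_boot)) / sd(total_boot)  # distance in SDs
  )
}

# Toy example: two studies with simulated bootstrap scores
set.seed(42)
boot_list <- list(rnorm(1000, mean = 5, sd = 1), rnorm(1000, mean = 4, sd = 1))
merged_bootstrap_stats(observed = c(7, 6), boot_list = boot_list)
```

Summing before testing means a cell type that is only modestly enriched in each study can still reach significance when the signal is consistent across studies.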

Examples

# Load the single cell data
ctd <- ewceData::ctd()
#> see ?ewceData and browseVignettes('ewceData') for documentation
#> loading from cache

# Use 3 bootstrap lists for speed; for publishable analysis use >=10,000
reps <- 3
# Use the top 5 up- and down-regulated genes (thresh) for speed; the default is 250
thresh <- 5

# Load the Alzheimer's disease differential expression tables (BA36 and BA44)
tt_alzh_BA36 <- ewceData::tt_alzh_BA36()
#> see ?ewceData and browseVignettes('ewceData') for documentation
#> loading from cache
tt_alzh_BA44 <- ewceData::tt_alzh_BA44()
#> see ?ewceData and browseVignettes('ewceData') for documentation
#> loading from cache

# Run a separate EWCE analysis for each study
tt_results_36 <- EWCE::ewce_expression_data(
    sct_data = ctd,
    tt = tt_alzh_BA36,
    thresh = thresh,
    annotLevel = 1,
    reps = reps,
    ttSpecies = "human",
    sctSpecies = "mouse"
)
#> Warning: genelistSpecies not provided. Setting to 'human' by default.
#> Warning: sctSpecies_origin not provided. Setting to 'mouse' by default.
#> Warning: sctSpecies_origin not provided. Setting to 'mouse' by default.
#> Preparing gene_df.
#> character format detected.
#> Converting to data.frame
#> Extracting genes from input_gene.
#> 15,259 genes extracted.
#> Converting mouse ==> human orthologs using: homologene
#> Retrieving all organisms available in homologene.
#> Mapping species name: mouse
#> Common name mapping found for mouse
#> 1 organism identified from search: 10090
#> Retrieving all organisms available in homologene.
#> Mapping species name: human
#> Common name mapping found for human
#> 1 organism identified from search: 9606
#> Checking for genes without orthologs in human.
#> Extracting genes from input_gene.
#> 13,416 genes extracted.
#> Extracting genes from ortholog_gene.
#> 13,416 genes extracted.
#> Checking for genes without 1:1 orthologs.
#> Dropping 46 genes that have multiple input_gene per ortholog_gene (many:1).
#> Dropping 56 genes that have multiple ortholog_gene per input_gene (1:many).
#> Filtering gene_df with gene_map
#> Returning gene_map as dictionary
#> 
#> =========== REPORT SUMMARY ===========
#> Total genes dropped after convert_orthologs :
#>    2,016 / 15,259 (13%)
#> Total genes remaining after convert_orthologs :
#>    13,243 / 15,259 (87%)
#> Generating gene background for mouse x human ==> human
#> Gathering ortholog reports.
#> Retrieving all genes using: homologene.
#> Retrieving all organisms available in homologene.
#> Mapping species name: human
#> Common name mapping found for human
#> 1 organism identified from search: 9606
#> Gene table with 19,129 rows retrieved.
#> Returning all 19,129 genes from human.
#> Retrieving all genes using: homologene.
#> Retrieving all organisms available in homologene.
#> Mapping species name: mouse
#> Common name mapping found for mouse
#> 1 organism identified from search: 10090
#> Gene table with 21,207 rows retrieved.
#> Returning all 21,207 genes from mouse.
#> --
#> --
#> Preparing gene_df.
#> data.frame format detected.
#> Extracting genes from Gene.Symbol.
#> 21,207 genes extracted.
#> Converting mouse ==> human orthologs using: homologene
#> Retrieving all organisms available in homologene.
#> Mapping species name: mouse
#> Common name mapping found for mouse
#> 1 organism identified from search: 10090
#> Retrieving all organisms available in homologene.
#> Mapping species name: human
#> Common name mapping found for human
#> 1 organism identified from search: 9606
#> Checking for genes without orthologs in human.
#> Extracting genes from input_gene.
#> 17,355 genes extracted.
#> Extracting genes from ortholog_gene.
#> 17,355 genes extracted.
#> Checking for genes without 1:1 orthologs.
#> Dropping 131 genes that have multiple input_gene per ortholog_gene (many:1).
#> Dropping 498 genes that have multiple ortholog_gene per input_gene (1:many).
#> Filtering gene_df with gene_map
#> Adding input_gene col to gene_df.
#> Adding ortholog_gene col to gene_df.
#> 
#> =========== REPORT SUMMARY ===========
#> Total genes dropped after convert_orthologs :
#>    4,725 / 21,207 (22%)
#> Total genes remaining after convert_orthologs :
#>    16,482 / 21,207 (78%)
#> --
#> 
#> =========== REPORT SUMMARY ===========
#> 16,482 / 21,207 (77.72%) target_species genes remain after ortholog conversion.
#> 16,482 / 19,129 (86.16%) reference_species genes remain after ortholog conversion.
#> Gathering ortholog reports.
#> Retrieving all genes using: homologene.
#> Retrieving all organisms available in homologene.
#> Mapping species name: human
#> Common name mapping found for human
#> 1 organism identified from search: 9606
#> Gene table with 19,129 rows retrieved.
#> Returning all 19,129 genes from human.
#> Retrieving all genes using: homologene.
#> Retrieving all organisms available in homologene.
#> Mapping species name: human
#> Common name mapping found for human
#> 1 organism identified from search: 9606
#> Gene table with 19,129 rows retrieved.
#> Returning all 19,129 genes from human.
#> --
#> 
#> =========== REPORT SUMMARY ===========
#> 19,129 / 19,129 (100%) target_species genes remain after ortholog conversion.
#> 19,129 / 19,129 (100%) reference_species genes remain after ortholog conversion.
#> 16,482 intersect background genes used.
#> Retrieving all genes using: homologene.
#> Retrieving all organisms available in homologene.
#> Mapping species name: human
#> Common name mapping found for human
#> 1 organism identified from search: 9606
#> Gene table with 19,129 rows retrieved.
#> Returning all 19,129 genes from human.
#> Returning 19,129 unique genes from entire human genome.
#> Using intersect between background gene lists: 16,482 genes.
#> Standardising sct_data.
#> Using 1st column of tt as gene column: HGNC.symbol
#> 1 core(s) assigned as workers (3 reserved).
#> Standardising CellTypeDataset
#> Checking gene list inputs.
#> Running without gene size control.
#> 6 hit gene(s) remain after filtering.
#> Computing gene scores.
#> Using previously sampled genes.
#> Computing gene counts.
#> Testing for enrichment in 7 cell types...
#> Sorting results by p-value.
#> Computing BH-corrected q-values.
#> 2 significant cell type enrichment results @ q<0.05 : 
#>        CellType annotLevel p fold_change sd_from_mean q
#> 1 pyramidal_CA1          1 0    1.411635     2.789677 0
#> 2  pyramidal_SS          1 0    1.192536     1.785915 0
#> 1 core(s) assigned as workers (3 reserved).
#> Standardising CellTypeDataset
#> Checking gene list inputs.
#> Running without gene size control.
#> 5 hit gene(s) remain after filtering.
#> Computing gene scores.
#> Using previously sampled genes.
#> Computing gene counts.
#> Testing for enrichment in 7 cell types...
#> Sorting results by p-value.
#> Computing BH-corrected q-values.
#> 3 significant cell type enrichment results @ q<0.05 : 
#>               CellType annotLevel p fold_change sd_from_mean q
#> 1            microglia          1 0    2.841819     6.454695 0
#> 2         pyramidal_SS          1 0    1.455683     4.158702 0
#> 3 astrocytes_ependymal          1 0    1.530297     1.227513 0
tt_results_44 <- EWCE::ewce_expression_data(
    sct_data = ctd,
    tt = tt_alzh_BA44,
    thresh = thresh,
    annotLevel = 1,
    reps = reps,
    ttSpecies = "human",
    sctSpecies = "mouse"
)
#> Warning: genelistSpecies not provided. Setting to 'human' by default.
#> Warning: sctSpecies_origin not provided. Setting to 'mouse' by default.
#> Warning: sctSpecies_origin not provided. Setting to 'mouse' by default.
#> Preparing gene_df.
#> character format detected.
#> Converting to data.frame
#> Extracting genes from input_gene.
#> 15,259 genes extracted.
#> Converting mouse ==> human orthologs using: homologene
#> Retrieving all organisms available in homologene.
#> Mapping species name: mouse
#> Common name mapping found for mouse
#> 1 organism identified from search: 10090
#> Retrieving all organisms available in homologene.
#> Mapping species name: human
#> Common name mapping found for human
#> 1 organism identified from search: 9606
#> Checking for genes without orthologs in human.
#> Extracting genes from input_gene.
#> 13,416 genes extracted.
#> Extracting genes from ortholog_gene.
#> 13,416 genes extracted.
#> Checking for genes without 1:1 orthologs.
#> Dropping 46 genes that have multiple input_gene per ortholog_gene (many:1).
#> Dropping 56 genes that have multiple ortholog_gene per input_gene (1:many).
#> Filtering gene_df with gene_map
#> Returning gene_map as dictionary
#> 
#> =========== REPORT SUMMARY ===========
#> Total genes dropped after convert_orthologs :
#>    2,016 / 15,259 (13%)
#> Total genes remaining after convert_orthologs :
#>    13,243 / 15,259 (87%)
#> Generating gene background for mouse x human ==> human
#> Gathering ortholog reports.
#> Retrieving all genes using: homologene.
#> Retrieving all organisms available in homologene.
#> Mapping species name: human
#> Common name mapping found for human
#> 1 organism identified from search: 9606
#> Gene table with 19,129 rows retrieved.
#> Returning all 19,129 genes from human.
#> Retrieving all genes using: homologene.
#> Retrieving all organisms available in homologene.
#> Mapping species name: mouse
#> Common name mapping found for mouse
#> 1 organism identified from search: 10090
#> Gene table with 21,207 rows retrieved.
#> Returning all 21,207 genes from mouse.
#> --
#> --
#> Preparing gene_df.
#> data.frame format detected.
#> Extracting genes from Gene.Symbol.
#> 21,207 genes extracted.
#> Converting mouse ==> human orthologs using: homologene
#> Retrieving all organisms available in homologene.
#> Mapping species name: mouse
#> Common name mapping found for mouse
#> 1 organism identified from search: 10090
#> Retrieving all organisms available in homologene.
#> Mapping species name: human
#> Common name mapping found for human
#> 1 organism identified from search: 9606
#> Checking for genes without orthologs in human.
#> Extracting genes from input_gene.
#> 17,355 genes extracted.
#> Extracting genes from ortholog_gene.
#> 17,355 genes extracted.
#> Checking for genes without 1:1 orthologs.
#> Dropping 131 genes that have multiple input_gene per ortholog_gene (many:1).
#> Dropping 498 genes that have multiple ortholog_gene per input_gene (1:many).
#> Filtering gene_df with gene_map
#> Adding input_gene col to gene_df.
#> Adding ortholog_gene col to gene_df.
#> 
#> =========== REPORT SUMMARY ===========
#> Total genes dropped after convert_orthologs :
#>    4,725 / 21,207 (22%)
#> Total genes remaining after convert_orthologs :
#>    16,482 / 21,207 (78%)
#> --
#> 
#> =========== REPORT SUMMARY ===========
#> 16,482 / 21,207 (77.72%) target_species genes remain after ortholog conversion.
#> 16,482 / 19,129 (86.16%) reference_species genes remain after ortholog conversion.
#> Gathering ortholog reports.
#> Retrieving all genes using: homologene.
#> Retrieving all organisms available in homologene.
#> Mapping species name: human
#> Common name mapping found for human
#> 1 organism identified from search: 9606
#> Gene table with 19,129 rows retrieved.
#> Returning all 19,129 genes from human.
#> Retrieving all genes using: homologene.
#> Retrieving all organisms available in homologene.
#> Mapping species name: human
#> Common name mapping found for human
#> 1 organism identified from search: 9606
#> Gene table with 19,129 rows retrieved.
#> Returning all 19,129 genes from human.
#> --
#> 
#> =========== REPORT SUMMARY ===========
#> 19,129 / 19,129 (100%) target_species genes remain after ortholog conversion.
#> 19,129 / 19,129 (100%) reference_species genes remain after ortholog conversion.
#> 16,482 intersect background genes used.
#> Retrieving all genes using: homologene.
#> Retrieving all organisms available in homologene.
#> Mapping species name: human
#> Common name mapping found for human
#> 1 organism identified from search: 9606
#> Gene table with 19,129 rows retrieved.
#> Returning all 19,129 genes from human.
#> Returning 19,129 unique genes from entire human genome.
#> Using intersect between background gene lists: 16,482 genes.
#> Standardising sct_data.
#> Using 1st column of tt as gene column: HGNC.symbol
#> 1 core(s) assigned as workers (3 reserved).
#> Standardising CellTypeDataset
#> Checking gene list inputs.
#> Running without gene size control.
#> 6 hit gene(s) remain after filtering.
#> Computing gene scores.
#> Using previously sampled genes.
#> Computing gene counts.
#> Testing for enrichment in 7 cell types...
#> Sorting results by p-value.
#> Computing BH-corrected q-values.
#> 3 significant cell type enrichment results @ q<0.05 : 
#>               CellType annotLevel p fold_change sd_from_mean q
#> 1    endothelial_mural          1 0    2.002147     5.407831 0
#> 2     oligodendrocytes          1 0    2.010357     4.419611 0
#> 3 astrocytes_ependymal          1 0    1.156367     3.094029 0
#> 1 core(s) assigned as workers (3 reserved).
#> Standardising CellTypeDataset
#> Checking gene list inputs.
#> Running without gene size control.
#> 5 hit gene(s) remain after filtering.
#> Computing gene scores.
#> Using previously sampled genes.
#> Computing gene counts.
#> Testing for enrichment in 7 cell types...
#> Sorting results by p-value.
#> Computing BH-corrected q-values.
#> 2 significant cell type enrichment results @ q<0.05 : 
#>               CellType annotLevel p fold_change sd_from_mean q
#> 1 astrocytes_ependymal          1 0    1.803066     2.350458 0
#> 2        pyramidal_CA1          1 0    1.493630     1.528110 0

# Collect the results from both studies in a single list for merging
results <- EWCE::add_res_to_merging_list(tt_results_36)
results <- EWCE::add_res_to_merging_list(tt_results_44, results)

# Perform the merged analysis
# reps = 2 is for speed only; for publication use >=10,000
merged_res <- EWCE::merged_ewce(
    results = results,
    reps = 2
)
print(merged_res)
#>                                   CellType       p        fc sd_from_mean
#> astrocytes_ependymal  astrocytes_ependymal 0.66130 0.8866746   -0.5885042
#> endothelial_mural        endothelial_mural 0.33110 1.1171572    0.8496895
#> interneurons                  interneurons 0.78115 0.7658062   -0.9465746
#> microglia                        microglia 0.66555 0.8459999   -0.6892282
#> oligodendrocytes          oligodendrocytes 0.00000 1.7426915    2.4818484
#> pyramidal_CA1                pyramidal_CA1 0.44395 1.0142639    0.2497222
#> pyramidal_SS                  pyramidal_SS 1.00000 0.9174883   -1.4337207
#> astrocytes_ependymal1 astrocytes_ependymal 0.00000 1.6844598    3.1154348
#> endothelial_mural1       endothelial_mural 1.00000 0.5559731   -1.8899465
#> interneurons1                 interneurons 0.77590 0.7063486   -0.8681162
#> microglia1                       microglia 0.00000 1.4252970    1.4699207
#> oligodendrocytes1         oligodendrocytes 0.22070 1.1261607    0.5880097
#> pyramidal_CA11               pyramidal_CA1 0.11095 1.1953337    1.6270440
#> pyramidal_SS1                 pyramidal_SS 0.00000 1.2607602    3.0064648
#>                       Direction
#> astrocytes_ependymal         Up
#> endothelial_mural            Up
#> interneurons                 Up
#> microglia                    Up
#> oligodendrocytes             Up
#> pyramidal_CA1                Up
#> pyramidal_SS                 Up
#> astrocytes_ependymal1      Down
#> endothelial_mural1         Down
#> interneurons1              Down
#> microglia1                 Down
#> oligodendrocytes1          Down
#> pyramidal_CA11             Down
#> pyramidal_SS1              Down
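The printed merged_res above has no q column, whereas the single-study runs in the log report BH-corrected q-values. One option, sketched here with base R only, is to apply the Benjamini-Hochberg correction with stats::p.adjust separately within each Direction. The helper add_bh_q is hypothetical, correcting within each direction is an assumption rather than a documented EWCE step, and the toy rows are copied from the printed output.

```r
# Hypothetical helper (not part of EWCE): add BH q-values to a
# merged_ewce()-style data.frame, correcting within each Direction.
add_bh_q <- function(res) {
  stopifnot(all(c("p", "Direction") %in% names(res)))
  res$q <- ave(res$p, res$Direction,
               FUN = function(p) p.adjust(p, method = "BH"))
  res[order(res$Direction, res$p), ]  # sort by direction, then p-value
}

# A few rows copied from the printed merged_res above
toy <- data.frame(
  CellType  = c("oligodendrocytes", "endothelial_mural", "pyramidal_SS",
                "astrocytes_ependymal", "pyramidal_CA1", "endothelial_mural"),
  p         = c(0.00000, 0.33110, 1.00000, 0.00000, 0.11095, 1.00000),
  Direction = c("Up", "Up", "Up", "Down", "Down", "Down"),
  stringsAsFactors = FALSE
)
add_bh_q(toy)
```

On the real result this would be called as add_bh_q(merged_res); with only 2 bootstrap repetitions the q-values are, like the p-values, for illustration only.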