Takes a gene list and a single cell type transcriptome dataset and generates plots which show how the expression of the genes in the list compares to those in randomly generated gene lists.

generate_bootstrap_plots_for_transcriptome(
  sct_data,
  tt,
  bg = NULL,
  thresh = 250,
  annotLevel = 1,
  reps = 100,
  full_results = NA,
  listFileName = "",
  showGNameThresh = 25,
  ttSpecies = NULL,
  sctSpecies = NULL,
  output_species = NULL,
  sortBy = "t",
  sig_only = TRUE,
  sig_col = "q",
  sig_thresh = 0.05,
  celltype_col = "CellType",
  plot_types = c("bootstrap", "bootstrap_distributions", "log_bootstrap_distributions"),
  save_dir = file.path(tempdir(), "BootstrapPlots"),
  method = "homologene",
  verbose = TRUE
)

Arguments

sct_data

List generated using generate_celltype_data.

tt

Differential expression table. Can be output of topTable function. Minimum requirement is that one column stores a metric of increased/decreased expression (i.e. log fold change, t-statistic for differential expression etc) and another contains gene symbols.

bg

List of gene symbols containing the background gene list (including hit genes). If bg=NULL, an appropriate gene background will be created automatically.

thresh

The number of up- and down- regulated genes to be included in each analysis (Default: 250).

annotLevel

An integer indicating which level of sct_data to analyse (Default: 1).

reps

Number of random gene lists to generate (Default: 100, but should be >=10,000 for publication-quality results).

full_results

The full output of ewce_expression_data for the same gene list.

listFileName

String used as the root for files saved using this function.

showGNameThresh

Integer. If a gene has over X percent of it's expression proportion in a cell type, then list the gene name.

ttSpecies

The species the differential expression table was generated from.

sctSpecies

Species that sct_data is currently formatted as (no longer limited to just "mouse" and "human"). See list_species for all available species.

output_species

Species to convert sct_data and hits to (Default: "human"). See list_species for all available species.

sortBy

Column name of metric in tt which should be used to sort up- from down- regulated genes (Default: "t").

sig_only

Should plots only be generated for cells which have significant changes?

sig_col

Column name in tt that contains the significance values.

sig_thresh

Threshold by which to filter tt by sig_col.

celltype_col

Column within tt that contains celltype names.

plot_types

Plot types to generate.

save_dir

Directory where the BootstrapPlots folder should be saved, default is a temp directory.

method

R package to use for gene mapping:

  • "gprofiler" : Slower but more species and genes.

  • "homologene" : Faster but fewer species and genes.

  • "babelgene" : Faster but fewer species and genes. Also gives consensus scores for each gene mapping based on a several different data sources.

verbose

Print messages.

Value

Saves a set of PDF files containing graphs. Then returns a nested list with each plot and the path where it was saved to. Files start with one of the following:

  • qqplot_noText: sorts the gene list according to how enriched it is in the relevant cell type. Plots the value in the target list against the mean value in the bootstrapped lists.

  • qqplot_wtGSym: as above but labels the gene symbols for the highest expressed genes.

  • bootDists: rather than just showing the mean of the bootstrapped lists, a boxplot shows the distribution of values

  • bootDists_LOG: shows the bootstrapped distributions with the y-axis shown on a log scale

Examples

## Load the single cell data
ctd <- ewceData::ctd()
#> see ?ewceData and browseVignettes('ewceData') for documentation
#> loading from cache

## Set the parameters for the analysis
## Use 3 bootstrap lists for speed, for publishable analysis use >10,000
reps <- 3
annotLevel <- 1 # <- Use cell level annotations (i.e. Interneurons)
## Use 5 up/down regulated genes (thresh) for speed, default is 250
thresh <- 5

## Load the top table
tt_alzh <- ewceData::tt_alzh()
#> see ?ewceData and browseVignettes('ewceData') for documentation
#> loading from cache

## See ?example_transcriptome_results for full code to produce tt_results
tt_results <- EWCE::example_transcriptome_results()
#> Loading precomputed example transcriptome results.

## Bootstrap significance test,
## no control for transcript length or GC content
savePath <- EWCE::generate_bootstrap_plots_for_transcriptome(
    sct_data = ctd,
    tt = tt_alzh,
    thresh = thresh,
    annotLevel = 1,
    full_results = tt_results,
    listFileName = "examples",
    reps = reps,
    ttSpecies = "human",
    sctSpecies = "mouse", 
    # Only do one plot type for demo purposes
    plot_types = "bootstrap" 
) 
#> Warning: genelistSpecies not provided. Setting to 'human' by default.
#> Warning: sctSpecies_origin not provided. Setting to 'mouse' by default.
#> Warning: sctSpecies_origin not provided. Setting to 'mouse' by default.
#> Aligning celltype names with standardise_ctd format.
#> Preparing gene_df.
#> character format detected.
#> Converting to data.frame
#> Extracting genes from input_gene.
#> 15,259 genes extracted.
#> Converting mouse ==> human orthologs using: homologene
#> Retrieving all organisms available in homologene.
#> Mapping species name: mouse
#> Common name mapping found for mouse
#> 1 organism identified from search: 10090
#> Retrieving all organisms available in homologene.
#> Mapping species name: human
#> Common name mapping found for human
#> 1 organism identified from search: 9606
#> Checking for genes without orthologs in human.
#> Extracting genes from input_gene.
#> 13,416 genes extracted.
#> Extracting genes from ortholog_gene.
#> 13,416 genes extracted.
#> Checking for genes without 1:1 orthologs.
#> Dropping 46 genes that have multiple input_gene per ortholog_gene (many:1).
#> Dropping 56 genes that have multiple ortholog_gene per input_gene (1:many).
#> Filtering gene_df with gene_map
#> Returning gene_map as dictionary
#> 
#> =========== REPORT SUMMARY ===========
#> Total genes dropped after convert_orthologs :
#>    2,016 / 15,259 (13%)
#> Total genes remaining after convert_orthologs :
#>    13,243 / 15,259 (87%)
#> Generating gene background for mouse x human ==> human
#> Gathering ortholog reports.
#> Retrieving all genes using: homologene.
#> Retrieving all organisms available in homologene.
#> Mapping species name: human
#> Common name mapping found for human
#> 1 organism identified from search: 9606
#> Gene table with 19,129 rows retrieved.
#> Returning all 19,129 genes from human.
#> Retrieving all genes using: homologene.
#> Retrieving all organisms available in homologene.
#> Mapping species name: mouse
#> Common name mapping found for mouse
#> 1 organism identified from search: 10090
#> Gene table with 21,207 rows retrieved.
#> Returning all 21,207 genes from mouse.
#> --
#> --
#> Preparing gene_df.
#> data.frame format detected.
#> Extracting genes from Gene.Symbol.
#> 21,207 genes extracted.
#> Converting mouse ==> human orthologs using: homologene
#> Retrieving all organisms available in homologene.
#> Mapping species name: mouse
#> Common name mapping found for mouse
#> 1 organism identified from search: 10090
#> Retrieving all organisms available in homologene.
#> Mapping species name: human
#> Common name mapping found for human
#> 1 organism identified from search: 9606
#> Checking for genes without orthologs in human.
#> Extracting genes from input_gene.
#> 17,355 genes extracted.
#> Extracting genes from ortholog_gene.
#> 17,355 genes extracted.
#> Checking for genes without 1:1 orthologs.
#> Dropping 131 genes that have multiple input_gene per ortholog_gene (many:1).
#> Dropping 498 genes that have multiple ortholog_gene per input_gene (1:many).
#> Filtering gene_df with gene_map
#> Adding input_gene col to gene_df.
#> Adding ortholog_gene col to gene_df.
#> 
#> =========== REPORT SUMMARY ===========
#> Total genes dropped after convert_orthologs :
#>    4,725 / 21,207 (22%)
#> Total genes remaining after convert_orthologs :
#>    16,482 / 21,207 (78%)
#> --
#> 
#> =========== REPORT SUMMARY ===========
#> 16,482 / 21,207 (77.72%) target_species genes remain after ortholog conversion.
#> 16,482 / 19,129 (86.16%) reference_species genes remain after ortholog conversion.
#> Gathering ortholog reports.
#> Retrieving all genes using: homologene.
#> Retrieving all organisms available in homologene.
#> Mapping species name: human
#> Common name mapping found for human
#> 1 organism identified from search: 9606
#> Gene table with 19,129 rows retrieved.
#> Returning all 19,129 genes from human.
#> Retrieving all genes using: homologene.
#> Retrieving all organisms available in homologene.
#> Mapping species name: human
#> Common name mapping found for human
#> 1 organism identified from search: 9606
#> Gene table with 19,129 rows retrieved.
#> Returning all 19,129 genes from human.
#> --
#> 
#> =========== REPORT SUMMARY ===========
#> 19,129 / 19,129 (100%) target_species genes remain after ortholog conversion.
#> 19,129 / 19,129 (100%) reference_species genes remain after ortholog conversion.
#> 16,482 intersect background genes used.
#> Retrieving all genes using: homologene.
#> Retrieving all organisms available in homologene.
#> Mapping species name: human
#> Common name mapping found for human
#> 1 organism identified from search: 9606
#> Gene table with 19,129 rows retrieved.
#> Returning all 19,129 genes from human.
#> Returning 19,129 unique genes from entire human genome.
#> Using intersect between background gene lists: 16,482 genes.
#> Standardising sct_data.
#> Using 1st column of tt as gene column: HGNC.symbol
#> Generating exp data for bootstrap genes.
#> Converting data for bootstrap tests to sparse matrices.
#> microglia : Saving bootstrap plot --> /tmp/RtmpLzmMED/BootstrapPlots/qqplot_noText_thresh5__dirUp___examples____microglia.pdf
#> microglia : Saving bootstrap plot --> /tmp/RtmpLzmMED/BootstrapPlots/qqplot_wtGSym_thresh5__dirUp___examples____microglia.pdf
#> microglia : Saving bootstrap plot --> /tmp/RtmpLzmMED/BootstrapPlots/qqplot_wtGSymBIG_thresh5__dirUp___examples____microglia.pdf
#> $plot1

#> 
#> $plot2
#> 
#> endothelial_mural : Saving bootstrap plot --> /tmp/RtmpLzmMED/BootstrapPlots/qqplot_noText_thresh5__dirUp___examples____endothelial_mural.pdf
#> endothelial_mural : Saving bootstrap plot --> /tmp/RtmpLzmMED/BootstrapPlots/qqplot_wtGSym_thresh5__dirUp___examples____endothelial_mural.pdf
#> endothelial_mural : Saving bootstrap plot --> /tmp/RtmpLzmMED/BootstrapPlots/qqplot_wtGSymBIG_thresh5__dirUp___examples____endothelial_mural.pdf

#> $plot1

#> 
#> $plot2
#> 
#> Generating exp data for bootstrap genes.
#> Converting data for bootstrap tests to sparse matrices.
#> pyramidal_SS : Saving bootstrap plot --> /tmp/RtmpLzmMED/BootstrapPlots/qqplot_noText_thresh5__dirDown___examples____pyramidal_SS.pdf
#> pyramidal_SS : Saving bootstrap plot --> /tmp/RtmpLzmMED/BootstrapPlots/qqplot_wtGSym_thresh5__dirDown___examples____pyramidal_SS.pdf
#> pyramidal_SS : Saving bootstrap plot --> /tmp/RtmpLzmMED/BootstrapPlots/qqplot_wtGSymBIG_thresh5__dirDown___examples____pyramidal_SS.pdf

#> $plot1

#> 
#> $plot2

#>