get_summed_proportions

Given the target gene set, randomly sample gene lists of equal length, obtain the specificity of these and then obtain the mean specificity in each sampled list (and the target list).

get_summed_proportions(
  hits,
  sct_data,
  annotLevel,
  reps,
  no_cores = 1,
  geneSizeControl,
  controlledCT = NULL,
  control_network = NULL,
  store_gene_data = TRUE,
  verbose = TRUE
)

Arguments

hits

List of gene names forming the target gene set.

sct_data

List generated using generate_celltype_data.

annotLevel

An integer indicating which level of sct_data to analyse (documented default: 1, although the usage above declares none, so pass it explicitly).

reps

Number of random gene lists to generate (documented default: 100, although the usage above declares none; use >=10,000 for publication-quality results).

no_cores

Number of cores to parallelise bootstrapping reps over.

geneSizeControl

Whether to control for GC content and transcript length. Recommended if the gene list originates from genetic studies (documented default: FALSE, although the usage above declares none). If set to TRUE, hits must be human genes.

controlledCT

[Optional] Name of a cell type. If not NULL, the bootstrapping controls for expression within that cell type.

control_network

The control network. Must be provided if geneSizeControl=TRUE.

store_gene_data

Store sampled gene data for every bootstrap iteration. When the number of bootstrap reps is very high (>=100,000) and/or hits contains a very large number of genes, you may want to set store_gene_data=FALSE to avoid excessive memory use.

verbose

Print messages.

Value

A list containing the following elements:

  • hit.cells: vector containing the summed proportion of expression in each cell type for the target list.

  • gene_data: data.table showing the number of times each gene appeared in the bootstrap sample.

  • bootstrap_data: matrix in which each row represents the summed proportion of expression in each cell type for one of the random lists.

  • controlledCT: the controlled cell type (if applicable).
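
To show how hit.cells and bootstrap_data relate to each other, the sketch below builds toy stand-ins for both (the cell type names and all numbers are invented, not output from this function) and computes, per cell type, the share of random lists whose summed proportion is at least as large as the target's. Treating that share as an empirical p-value is an assumption made here for illustration, not necessarily what the package does downstream.

```r
# Toy stand-ins for the returned elements (all values are invented).
set.seed(1)
cell_types <- c("astrocyte", "microglia", "neuron")
reps <- 1000

# bootstrap_data: one row per random gene list, one column per cell type.
bootstrap_data <- matrix(
  runif(reps * length(cell_types), min = 0, max = 2),
  nrow = reps,
  dimnames = list(NULL, cell_types)
)

# hit.cells: summed proportion per cell type for the target list.
hit.cells <- c(astrocyte = 1.0, microglia = 0.2, neuron = 2.5)

# Share of random lists scoring at least as high as the target list.
p_values <- vapply(
  cell_types,
  function(ct) mean(bootstrap_data[, ct] >= hit.cells[[ct]]),
  numeric(1)
)
print(p_values)
```

In this toy setup the neuron score (2.5) lies above every simulated value, so its share is 0; with a finite number of reps such a result only says the p-value is below roughly 1/reps.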

Details

See bootstrap_enrichment_test for examples.
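
The sampling procedure summarised in the description can be sketched in base R on toy data. Everything below is hypothetical: the specificity matrix, the gene and cell type names, and the helper summed_proportions are made up for illustration. The sketch ignores geneSizeControl, controlledCT and parallelisation, and it is not the package's implementation.

```r
set.seed(42)

# Hypothetical specificity matrix: genes in rows, cell types in columns.
# Each row is scaled to sum to 1, i.e. a gene's proportion of expression
# in each cell type.
genes <- paste0("gene", 1:200)
cell_types <- c("astrocyte", "microglia", "neuron")
expr <- matrix(
  rexp(length(genes) * length(cell_types)),
  nrow = length(genes),
  dimnames = list(genes, cell_types)
)
specificity <- expr / rowSums(expr)

# Hypothetical helper: sum the specificity of a gene list per cell type.
summed_proportions <- function(gene_list, spec) {
  colSums(spec[gene_list, , drop = FALSE])
}

hits <- genes[1:20]  # toy target gene set
reps <- 100          # number of random gene lists

# Summed proportions for the target list.
hit.cells <- summed_proportions(hits, specificity)

# Random lists of the same length as hits; one row per replicate.
bootstrap_data <- t(replicate(
  reps,
  summed_proportions(sample(genes, length(hits)), specificity)
))

dim(bootstrap_data)
```

Because every row of the toy specificity matrix sums to 1, each list's summed proportions add up to the number of genes in the list, which is a quick sanity check on the sampling step.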