generate_celltype_data takes gene expression data and
cell type annotations and creates CellTypeData (CTD) files, which
contain matrices of mean expression and specificity per cell type.
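To make "mean expression and specificity per cell type" concrete, here is a minimal base R sketch on a made-up toy matrix. It assumes specificity means the share of a gene's summed per-cell-type mean expression that falls in each cell type; it is an illustration of the idea, not EWCE's internal code, and it skips any normalisation the package applies. The names exp_toy, celltypes, mean_exp and specificity are hypothetical.

```r
# Toy expression matrix: 3 genes x 6 cells (values are invented for illustration)
exp_toy <- matrix(
  c(10, 12, 0, 1, 0, 0,
     5,  5, 5, 5, 5, 5,
     0,  0, 8, 6, 2, 4),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("GeneA", "GeneB", "GeneC"), paste0("cell", 1:6))
)

# One cell type label per column of exp_toy (the role played by annotLevels)
celltypes <- c("neuron", "neuron", "astro", "astro", "microglia", "microglia")

# Mean expression per cell type: a genes x cell types matrix
mean_exp <- sapply(
  split(seq_along(celltypes), celltypes),
  function(idx) rowMeans(exp_toy[, idx, drop = FALSE])
)

# Assumed definition of specificity: each gene's mean expression in a cell type
# divided by its summed mean expression over all cell types (rows sum to 1)
specificity <- mean_exp / rowSums(mean_exp)

print(round(specificity, 3))
```

In this toy case GeneB is expressed equally everywhere, so its specificity is one third in every cell type, while GeneA is almost entirely specific to the "neuron" column.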
generate_celltype_data(
  exp,
  annotLevels,
  groupName,
  no_cores = 1,
  savePath = tempdir(),
  file_prefix = "ctd",
  as_sparse = TRUE,
  as_DelayedArray = FALSE,
  normSpec = FALSE,
  convert_orths = FALSE,
  input_species = "mouse",
  output_species = "human",
  non121_strategy = "drop_both_species",
  method = "homologene",
  force_new_file = TRUE,
  specificity_quantiles = TRUE,
  numberOfBins = 40,
  dendrograms = TRUE,
  return_ctd = FALSE,
  verbose = TRUE,
  ...
)
exp
Numerical matrix with a row for each gene and a column for each cell. Row names are gene symbols. Column names are cell IDs, which can be cross-referenced against the annot data frame.
annotLevels
List with arrays of strings containing the cell type
names associated with each column in exp.
groupName
A human-readable name for referring to the dataset being used.
no_cores
Number of cores that should be used to speed up the computation.
NOTE: Use no_cores=1 when using this package on a Windows system.
savePath
Directory where the CTD file should be saved.
file_prefix
Prefix to add to the saved CTD file name.
as_sparse
Convert exp to a sparse Matrix.
as_DelayedArray
Convert exp to a DelayedArray.
normSpec
Boolean indicating whether specificity data should be transformed to a normal distribution by cell type, giving equivalent scores across all cell types.
convert_orths
If TRUE and input_species!=output_species, will drop genes without
1:1 output_species orthologs and then convert exp gene names
to those of output_species.
input_species
The species that the exp dataset comes from.
See list_species for all available species.
output_species
Species to convert exp to (Default: "human").
See list_species for all available species.
non121_strategy
How to handle genes that don't have
1:1 mappings between input_species and output_species.
Options include:
"drop_both_species" or "dbs" or 1
Drop genes that have duplicate
mappings in either the input_species or output_species.
"drop_input_species" or "dis" or 2
Only drop genes that have duplicate
mappings in the input_species.
"drop_output_species" or "dos" or 3
Only drop genes that have duplicate
mappings in the output_species.
"keep_both_species" or "kbs" or 4
Keep all genes regardless of whether
they have duplicate mappings in either species.
"keep_popular" or "kp" or 5
Return only the most "popular" interspecies ortholog mappings.
This procedure tends to yield a greater number of returned genes,
but at the cost of many of them not being true biological 1:1 orthologs.
"sum", "mean", "median", "min" or "max"
When gene_df is a matrix and gene_output="rownames",
these options will aggregate many-to-one gene mappings
after dropping any duplicate genes in the output_species.
method
R package to use for gene mapping:
"gprofiler"
Slower but more species and genes.
"homologene"
Faster but fewer species and genes.
"babelgene"
Faster but fewer species and genes.
Also gives consensus scores for each gene mapping based on
several different data sources.
force_new_file
If a file of the same name as the one being created already exists, overwrite it.
specificity_quantiles
Compute specificity quantiles.
Recommended to set to TRUE.
numberOfBins
Number of quantile 'bins' to use (40 is recommended).
dendrograms
Add dendrogram plots.
return_ctd
Return the CTD object in a list along with the file name, instead of just the file name.
verbose
Print messages.
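As an illustration of what specificity_quantiles and numberOfBins refer to, the following base R sketch bins one cell type's specificity values into quantile bins. It is a simplified stand-in, not the package's own routine: the helper name bin_into_quantiles is hypothetical, and the choice to put zero-specificity genes in bin 0 and bin only the positive values is an assumption made for this example.

```r
# Hypothetical helper: assign each specificity value to a quantile bin.
# Assumption: zeros go to bin 0; positive values are split into numberOfBins bins.
bin_into_quantiles <- function(spec_vec, numberOfBins = 40) {
  bins <- rep(0, length(spec_vec))
  positive <- spec_vec > 0
  # Quantile break points of the positive values (numberOfBins + 1 edges);
  # unique() guards against tied break points
  breaks <- unique(stats::quantile(
    spec_vec[positive],
    probs = seq(0, 1, length.out = numberOfBins + 1)
  ))
  bins[positive] <- as.numeric(
    cut(spec_vec[positive], breaks = breaks, include.lowest = TRUE)
  )
  bins
}

# Toy specificity column: 20 unexpressed genes plus 400 random positive values
set.seed(1)
spec_toy <- c(rep(0, 20), runif(400))
bins <- bin_into_quantiles(spec_toy, numberOfBins = 40)

table(bins)[1:5]
```

Working with bin indices instead of raw specificity values puts every cell type on the same 1 to numberOfBins scale, which is why a fixed bin count such as 40 can be compared across cell types.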
...
Arguments passed on to orthogene::convert_orthologs:
gene_df
Data object containing the genes
(see gene_input for options on how
the genes can be stored within the object).
Can be one of the following formats:
matrix
A sparse or dense matrix.
data.frame
A data.frame or tibble.
codelist
A list or character vector.
Genes, transcripts, proteins, SNPs, or genomic ranges
can be provided in any format
(HGNC, Ensembl, RefSeq, UniProt, etc.) and will be
automatically converted to gene symbols unless
specified otherwise with the ... arguments.
Note: If you set method="homologene", you
must either supply genes in gene symbol format (e.g. "Sox2")
OR set standardise_genes=TRUE.
gene_input
Which aspect of gene_df to get gene names from:
"rownames"
From row names of data.frame/matrix.
"colnames"
From column names of data.frame/matrix.
<column name>
From a column in gene_df,
e.g. "gene_names".
gene_output
How to return genes.
Options include:
"rownames"
As row names of gene_df.
"colnames"
As column names of gene_df.
"columns"
As new columns "input_gene", "ortholog_gene"
(and "input_gene_standard" if standardise_genes=TRUE)
in gene_df.
"dict"
As a dictionary (named list) where the names
are input_gene and the values are ortholog_gene.
"dict_rev"
As a reversed dictionary (named list)
where the names are ortholog_gene and the values are input_gene.
standardise_genes
If TRUE, a new column "input_gene_standard"
will be added to gene_df
containing standardised HGNC symbols
identified by gorth.
drop_nonorths
Drop genes that don't have an ortholog
in the output_species.
agg_fun
Aggregation function passed to aggregate_mapped_genes.
Set to NULL to skip the aggregation step (default).
mthreshold
Maximum number of ortholog names per gene to show.
Passed to gorth.
Only used when method="gprofiler".
sort_rows
Sort gene_df rows alphanumerically.
gene_map
A data.frame that maps the current gene names to new gene names. This function's behaviour will adapt to different situations as follows:
gene_map=<data.frame>
When a data.frame containing the
gene key:value columns
(specified by input_col and output_col, respectively)
is provided, this will be used to perform aggregation/expansion.
gene_map=NULL and input_species!=output_species
A gene_map is automatically generated by
map_orthologs to perform inter-species
gene aggregation/expansion.
gene_map=NULL and input_species==output_species
A gene_map is automatically generated by
map_genes to perform within-species
gene symbol standardization and aggregation/expansion.
input_col
Column name within gene_map
with gene names matching
the row names of X.
output_col
Column name within gene_map
with gene names
that you wish to map the row names of X onto.
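The gene_map, input_col and output_col arguments above describe a key:value lookup followed by aggregation of rows that land on the same output gene. A minimal base R sketch of that idea follows, using rowsum() as a stand-in for a "sum" aggregation. This is not orthogene's implementation; the gene names and the objects X, gene_map, new_names and X_agg are invented for illustration.

```r
# Toy matrix X: rows are input-species genes, columns are cells
X <- matrix(
  c(1, 2,
    3, 4,
    5, 6),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA1", "geneA2", "geneB"), c("cell1", "cell2"))
)

# Hypothetical gene_map: two input genes map onto the same output gene
gene_map <- data.frame(
  input_gene    = c("geneA1", "geneA2", "geneB"),
  ortholog_gene = c("GENEA", "GENEA", "GENEB")
)

# Look up the new name for every row of X
# (input_col = "input_gene", output_col = "ortholog_gene")
new_names <- gene_map$ortholog_gene[match(rownames(X), gene_map$input_gene)]

# Aggregate many-to-one mappings by summing rows that share an output gene
X_agg <- rowsum(X, group = new_names)

print(X_agg)
```

Swapping the sum for another summary (mean, median, min, max) would change only the aggregation step; the lookup through the two gene_map columns stays the same.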
Returns the file names for the saved CellTypeData (CTD) files. If return_ctd=TRUE, returns a list containing the CTD object along with the file name.
# Load the single cell data
cortex_mrna <- ewceData::cortex_mrna()
#> see ?ewceData and browseVignettes('ewceData') for documentation
#> loading from cache
# Use only a subset to keep the example quick
expData <- cortex_mrna$exp[1:100, ]
l1 <- cortex_mrna$annot$level1class
l2 <- cortex_mrna$annot$level2class
annotLevels <- list(l1 = l1, l2 = l2)
fNames_ALLCELLS <- EWCE::generate_celltype_data(
    exp = expData,
    annotLevels = annotLevels,
    groupName = "allKImouse"
)
#> 1 core(s) assigned as workers (3 reserved).
#> Converting to sparse matrix.
#> + Calculating normalized mean expression.
#> Converting to sparse matrix.
#> Converting to sparse matrix.
#> + Calculating normalized specificity.
#> Converting to sparse matrix.
#> Converting to sparse matrix.
#> Converting to sparse matrix.
#> Converting to sparse matrix.
#> Loading required namespace: ggdendro
#> + Saving results ==> /tmp/RtmpvgMS6z/ctd_allKImouse.rda
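Since the default return value is a path to an .rda file, a natural next step is reading that file back. The sketch below shows one way to do this with base R's load(), without assuming the name of the object stored inside the file. To stay self-contained it first writes a small stand-in file; toy_ctd, its list layout, and the file name ctd_toy_example.rda are hypothetical and not the real CTD structure.

```r
# Stand-in for a file written by generate_celltype_data(): a toy list saved as .rda
toy_ctd <- list(
  level1 = list(
    mean_exp = matrix(1:4, nrow = 2),
    specificity = matrix(0.5, nrow = 2, ncol = 2)
  )
)
ctd_path <- file.path(tempdir(), "ctd_toy_example.rda")
save(toy_ctd, file = ctd_path)

# load() restores objects under their saved names and returns those names,
# so loading into a fresh environment avoids guessing what the object is called
loaded_env <- new.env()
loaded_names <- load(ctd_path, envir = loaded_env)
ctd_loaded <- get(loaded_names[1], envir = loaded_env)

str(ctd_loaded, max.level = 2)
```

With a real run, ctd_path would be replaced by the file name returned by the call above (fNames_ALLCELLS in the example).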