Also checks whether any gene names contain "Sep", "Mar" or "Feb". Any such names should be inspected for signs that Excel has corrupted them, for example by converting a gene symbol into a date.
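The month-name check described above can be sketched in base R. This is a minimal illustration, not the package's actual implementation: the helper name `flag_excel_suspects` and the example symbols are hypothetical, and the pattern simply mirrors the three substrings named in the description.

```r
# Hypothetical helper (not part of EWCE): flag symbols containing the month
# abbreviations that Excel is known to turn into dates.
flag_excel_suspects <- function(gene_names) {
  # Case-sensitive match of "Sep", "Mar" or "Feb" anywhere in the symbol
  suspect <- grepl("Sep|Mar|Feb", gene_names)
  gene_names[suspect]
}

# Illustrative symbols: some look like date-converted names such as "2-Sep"
genes <- c("Sept2", "March1", "Actb", "2-Sep", "1-Mar", "Gapdh")
suspects <- flag_excel_suspects(genes)
print(suspects)
```

Flagged names are only candidates for manual review; a match does not by itself prove that a symbol was corrupted.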

fix_bad_mgi_symbols(
  exp,
  mrk_file_path = NULL,
  printAllBadSymbols = FALSE,
  as_sparse = TRUE,
  verbose = TRUE,
  localHub = FALSE
)

Arguments

exp

An expression matrix where the rows are MGI symbols, or a SingleCellExperiment (SCE) or other Ranged Summarized Experiment (SE) type object.

mrk_file_path

Path to the MRK_List2 file, which can be downloaded from www.informatics.jax.org/downloads/reports/index.html

printAllBadSymbols

Print all the bad gene symbols to the console.

as_sparse

Convert exp to a sparse matrix.

verbose

Print messages.

localHub

If working offline, set localHub=TRUE to work with a local, non-updated hub. It will only have resources available that have previously been downloaded. If offline, please also see the BiocManager vignette section on offline use to ensure proper functionality.
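As an illustration of what the as_sparse argument refers to, the sketch below converts a small dense matrix to a sparse one with the Matrix package. That the function performs the conversion this way is an assumption; the documentation states only that exp is converted to a sparse matrix. The toy matrix and its names are made up for the example.

```r
library(Matrix)

# Toy dense expression matrix: genes in rows, cells in columns (made-up values)
dense_exp <- matrix(
  c(0, 3, 0, 0, 0, 1),
  nrow = 3,
  dimnames = list(c("Actb", "Gapdh", "Snap25"), c("cell1", "cell2"))
)

# Assumed conversion route: Matrix::Matrix() with sparse = TRUE
sparse_exp <- Matrix::Matrix(dense_exp, sparse = TRUE)
print(class(sparse_exp))
```

Row and column names are preserved, so gene symbols remain available as rownames after conversion.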

Value

Returns the expression matrix with the rownames corrected and rows representing the same gene merged. If no corrections are necessary, the input expression matrix is returned. If a SingleCellExperiment (SCE) or other Ranged Summarized Experiment (SE) type object was supplied, it is returned with the corrected expression matrix under counts.
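To make "rows representing the same gene merged" concrete, here is a minimal base R sketch. It assumes that merging means summing the rows that map to the same corrected symbol, which the documentation does not state explicitly. The symbol "OldSym" and the mapping vector are hypothetical.

```r
# Toy expression matrix with one outdated symbol (made-up values)
expr_mat <- matrix(
  c(1, 2,
    3, 4,
    5, 6),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("OldSym", "Actb", "Gapdh"), c("cell1", "cell2"))
)

# Hypothetical mapping: "OldSym" is treated as an outdated name for "Actb"
corrected <- c(OldSym = "Actb", Actb = "Actb", Gapdh = "Gapdh")

# Sum rows that share a corrected symbol (assumed merge rule)
merged <- rowsum(expr_mat, group = corrected[rownames(expr_mat)])
print(merged)
```

After merging, each corrected symbol appears once as a rowname, and the "Actb" row holds the combined values of the two original rows.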

Examples

# Load the single cell data
cortex_mrna <- ewceData::cortex_mrna()
#> see ?ewceData and browseVignettes('ewceData') for documentation
#> loading from cache
# take a subset for speed
cortex_mrna$exp <- cortex_mrna$exp[1:50, 1:5]
cortex_mrna$exp <- fix_bad_mgi_symbols(cortex_mrna$exp)
#> see ?ewceData and browseVignettes('ewceData') for documentation
#> loading from cache
#> 5 rows do not have proper MGI symbols
#> 2310042E22Rik, BC005764, C130030K03Rik, Stmn1-rs1, Gm9846
#> see ?ewceData and browseVignettes('ewceData') for documentation
#> loading from cache
#> 0 poorly annotated genes are replicates of existing genes. These are: 
#> 
#> Converting to sparse matrix.
#> 3 rows should have been corrected by checking synonyms.
#> 2 rows STILL do not have proper MGI symbols.