Given an expression matrix whose row names are expected to be HGNC symbols, find the symbols that are not official HGNC symbols and correct them where possible. Returns the expression matrix with the corrected symbols.
fix_bad_hgnc_symbols(
  exp,
  dropNonHGNC = FALSE,
  as_sparse = TRUE,
  verbose = TRUE,
  localHub = FALSE
)
exp: An expression matrix whose rows are HGNC symbols, or a SingleCellExperiment (SCE) or other Ranged Summarized Experiment (SE) type object.
dropNonHGNC: Boolean. Should symbols not recognised as HGNC symbols be dropped?
as_sparse: Convert exp to a sparse matrix.
verbose: Print messages.
localHub: If working offline, set localHub=TRUE to work with a local, non-updated hub; it will only have resources available that have previously been downloaded. If offline, please also see the BiocManager vignette section on offline use to ensure proper functionality.
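As a rough illustration of what converting exp to a sparse matrix means, the sketch below uses the Matrix package. This is an assumption made for illustration only: the source does not show how the function performs the conversion internally, and the toy matrix here is invented.

```r
library(Matrix)

# Toy dense matrix with many zeros (illustrative data only).
dense <- matrix(
  c(0, 0, 3, 0,
    1, 0, 0, 0,
    0, 2, 0, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("MT-TF", "MT-RNR1", "MT-TV"), paste0("cell", 1:4))
)

# Assumption: a sparse representation built with Matrix::Matrix().
# Only the non-zero entries are stored explicitly.
sp <- Matrix(dense, sparse = TRUE)

class(sp)
# Row and column names are carried over from the dense matrix.
rownames(sp)
```

Sparse storage mainly pays off for count data in which most entries are zero, as is common in single-cell expression matrices.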
Returns the expression matrix with the row names corrected and rows representing the same gene merged. If a SingleCellExperiment (SCE) or other Ranged Summarized Experiment (SE) type object was supplied, it is returned with the corrected expression matrix under counts.
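To make "rows representing the same gene merged" concrete, here is a minimal base R sketch in which rows sharing a row name are summed. Summing is an assumption for illustration; the source does not state how the function combines duplicate rows, and the toy values are invented.

```r
# Toy matrix in which two rows carry the same symbol (illustrative data only).
exp_dup <- matrix(
  c(1, 2,
    3, 4,
    5, 6),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("MT-ND1", "MT-TF", "MT-ND1"), c("cell1", "cell2"))
)

# rowsum() adds up rows that share a group label; reorder = FALSE keeps the
# groups in order of first appearance rather than sorting them.
merged <- rowsum(exp_dup, group = rownames(exp_dup), reorder = FALSE)

# merged has one row per unique symbol; the two MT-ND1 rows are added together.
merged
```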
# Create an example expression matrix; it could be part of an exp, annot list object
exp <- matrix(data = runif(70), ncol = 10)
# Add HGNC gene names, deliberately including one error:
# MARCH8 is an HGNC symbol that Excel converts to Mar-08 when the file is opened
rownames(exp) <-
  c("MT-TF", "MT-RNR1", "MT-TV", "MT-RNR2", "MT-TL1", "MT-ND1", "Mar-08")
exp <- fix_bad_hgnc_symbols(exp)
#> see ?ewceData and browseVignettes('ewceData') for documentation
#> loading from cache
#> 1 of 7 are not proper HGNC symbols.
#> Possible corruption of gene names by excel: Mar-08
#> Warning: Possible corruption of gene names by excel: Mar-08
#> Maps last updated on: Fri May 17 15:09:37 2024
#> Warning: Human gene symbols should be all upper-case except for the 'orf' in open reading frames. The case of some letters was corrected.
#> Warning: x contains non-approved gene symbols
#> Maps last updated on: Fri May 17 15:09:37 2024
#> Warning: Human gene symbols should be all upper-case except for the 'orf' in open reading frames. The case of some letters was corrected.
#> Warning: x contains non-approved gene symbols
#> 0 of 7 gene symbols corrected.
#> 1 of 7 gene symbols cannot be mapped.
#> Converting to sparse matrix.
# fix_bad_hgnc_symbols warns the user of this possible issue
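
The warning in the output above concerns row names that look like dates produced by Excel. As a hedged sketch of how such names could be flagged before calling the function, the snippet below uses a regular expression in base R. The helper looks_like_excel_date is hypothetical and is not part of the package; the package's own detection logic may differ.

```r
# Hypothetical helper, not part of the package: flag names that look like
# Excel-style dates, e.g. "Mar-08" or "8-Mar".
looks_like_excel_date <- function(symbols) {
  months <- "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"
  pattern <- paste0(
    "^((", months, ")-[0-9]{1,2}|[0-9]{1,2}-(", months, "))$"
  )
  grepl(pattern, symbols, ignore.case = TRUE)
}

symbols <- c("MT-TF", "MT-RNR1", "MT-TV", "MT-RNR2", "MT-TL1", "MT-ND1", "Mar-08")

# Subset to the names matching the date-like pattern.
suspect <- symbols[looks_like_excel_date(symbols)]
suspect
```

A check like this only flags suspicious names; mapping them back to the intended symbols still requires a curated lookup, which is what the function attempts.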