How do I replace values in a vector when matching values exist in another vector in R?

I have a character vector that looks like this; let's call it gene_list:

"ENSMPUG00000000002" "ENSMPUG00000000003" "ENSMPUG00000000004" 
"ENSMPUG00000000005" "ENSMPUG00000000006" "ENSMPUG00000000007"
....
32057 items.

I also have the following data frame; let's call it t1:

 hgnc_symbol        ensembl_gene_id   
 Length:32057       Length:32057      
 Class :character   Class :character  
 Mode  :character   Mode  :character 

The head of t1 looks like:

hgnc_symbol    ensembl_gene_id
1             ENSMPUG00000000002
2             ENSMPUG00000000003
3             ENSMPUG00000000004
4             ENSMPUG00000000005
5             ENSMPUG00000000006
6      MAP2K3 ENSMPUG00000000007
....

What I want to do is replace each item in gene_list when it matches a value in the second column of t1 (ensembl_gene_id). Note that many entries in the hgnc_symbol column are empty. I only want to replace an item when a match is found in the second column and a value exists in the first.

So in some R pseudocode, maybe something like

if t1$ensembl_gene_id[i] %in% gene_list 
    gene_list[i] = t1$hgnc_symbol[i]

or

gene_list = gene_list[which(gene_list == t1$ensembl_gene_id)]

I know these aren't correct; I'm just trying to convey what I want to achieve. I know I could do this with a loop, but I'm fairly sure there is a straightforward, idiomatic R way to do it in a line or two, and I'm trying to improve my R style. I appreciate any input. Thanks.

You can use a named vector as a lookup table that maps the old values to new values, and replace only the entries for which a new value exists.

gene_list <- c("ENSMPUG00000000002", "ENSMPUG00000000003", "ENSMPUG00000000004", 
               "ENSMPUG00000000005", "ENSMPUG00000000006", "ENSMPUG00000000007")

t1 <- read.csv(text='hgnc_symbol,ensembl_gene_id
,ENSMPUG00000000002
,ENSMPUG00000000003
,ENSMPUG00000000004
,ENSMPUG00000000005
,ENSMPUG00000000006
MAP2K3,ENSMPUG00000000007', stringsAsFactors = FALSE, na.strings = "")

# Create a named vector: values = new names (symbols), names = old names (IDs)
has_symbol <- !is.na(t1$hgnc_symbol)
lookup <- t1$hgnc_symbol[has_symbol]
names(lookup) <- t1$ensembl_gene_id[has_symbol]

# Use the named vector as a hash lookup; IDs without a symbol come back as NA
# and keep their original value
mapped <- lookup[gene_list]
new_gene_list <- ifelse(is.na(mapped), gene_list, mapped)

# Drop the names from the resulting vector
unname(new_gene_list)

Results:

> unname(new_gene_list)
[1] "ENSMPUG00000000002" "ENSMPUG00000000003" "ENSMPUG00000000004" "ENSMPUG00000000005" "ENSMPUG00000000006"
[6] "MAP2K3"    
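
As an alternative sketch (not part of the original answer), the same replacement can be done with base R's match(), which returns the row of t1 for each ID in gene_list. The sample data is the six rows shown above, rebuilt here with data.frame(); the variable names idx, symbol and has_symbol are made up for illustration. It assumes missing symbols are stored as NA or as empty strings, and it handles both.

```r
gene_list <- c("ENSMPUG00000000002", "ENSMPUG00000000003", "ENSMPUG00000000004",
               "ENSMPUG00000000005", "ENSMPUG00000000006", "ENSMPUG00000000007")

# Same six rows as the sample above; missing symbols stored as NA
t1 <- data.frame(hgnc_symbol     = c(NA, NA, NA, NA, NA, "MAP2K3"),
                 ensembl_gene_id = gene_list,
                 stringsAsFactors = FALSE)

# Row of t1 for each ID in gene_list (NA if the ID is not in t1)
idx <- match(gene_list, t1$ensembl_gene_id)

# Symbol for each ID; NA when the ID is missing or has no symbol
symbol <- t1$hgnc_symbol[idx]

# Replace only where a non-missing, non-empty symbol exists
has_symbol <- !is.na(symbol) & nzchar(symbol)
new_gene_list <- gene_list
new_gene_list[has_symbol] <- symbol[has_symbol]

new_gene_list
```

Because new_gene_list starts as a copy of gene_list and only the matched positions are overwritten, the result carries no names and needs no unname() call. IDs that do not appear in t1 at all are left unchanged.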
