
Finding matching values in two vectors of different lengths in R

I have two vectors of species names that follow two different naming methods. Some names are the same, others differ, and the two vectors are sorted differently. An example:

List 1: c(Homo sapiens sapiens, Homo sapiens neanderthalensis, Homo erectus,...,n)
List 2: c(Homo erectus, Homo sapiens, Homo neanderthalensis,...,n+1)

I write n and n+1 to denote that these lists have different lengths.

I would like to create a new list built as follows: where there is a match between the two vectors (e.g. Homo erectus), I want the name from List 2 at the position that name has in List 1; where there is no match, I want a "0" at that position in List 1. So in this case the new list would be newlist: c(0, 0, Homo erectus,...)

For this I have written the following code, but it does not work.

data<-read.table("species.txt",sep="\t",header=TRUE)
list1<-as.vector(data$Species1)
list2<-as.vector(data$Species2)
newlist<-as.character(rep(0,length(list1)))

for (i in 1:length(list1)){
for (j in 1:length(list2)){
if(list1[i] == list2[j]){newlist[i]<- list2[j]}else {newlist[i]= 0}
}
}

I hope this is clear.

Thanks for any help!

Take this reproducible example:

set.seed(1)
list1 <- letters[1:10]
list2 <- letters[sample(1:10, 10)]

You can avoid a loop using ifelse, which compares the two vectors position by position:

newlist <- ifelse(list1==list2, list2, 0)

The issue is that you did not declare newname; did you mean newlist?

If you want to use a loop, a single loop is enough instead of two, because in this example length(list1) equals length(list2) and the elements are compared position by position:

for (i in seq_along(list1)) {
    # keep the name where both vectors agree at position i, otherwise "0"
    if (list1[i] == list2[i]) newlist[i] <- list2[i] else newlist[i] <- "0"
}

In general if you want to match elements in vectors you can use match like this:

> list1
 [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
> list2
 [1] "c" "d" "e" "g" "b" "h" "i" "f" "j" "a"
> match(list1, list2)
 [1] 10  5  1  2  3  8  4  6  7  9

As you can see, match returns, for each element of list1, the index of the equal element in list2. This is useful when you have another table, data2, and want to fetch a column of data2 for the rows whose data2$list3 value corresponds to data$list1:

data <- data.frame(list1, list2)
list3 <- list2
columntoget <- 1:length(list2)
data2 <- data.frame(list3, columntoget)
data$mynewcolumn <- data2$columntoget[match(data$list1, data2$list3)]
> data$mynewcolumn
 [1] 10  5  1  2  3  8  4  6  7  9
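For vectors of different lengths, as in the question, the same idea might look like the sketch below. The species vectors are made up for illustration (the fourth entry in list2 is an assumption, added only so that the lengths differ). match returns NA where a name from list1 is absent from list2, and those positions are set to "0".

```r
# Hypothetical example vectors of different lengths
list1 <- c("Homo sapiens sapiens", "Homo sapiens neanderthalensis", "Homo erectus")
list2 <- c("Homo erectus", "Homo sapiens", "Homo neanderthalensis", "Homo habilis")

# Position in list2 of each element of list1; NA when there is no match
idx <- match(list1, list2)

# Take the name from list2 where a match exists, "0" otherwise
newlist <- ifelse(is.na(idx), "0", list2[idx])
newlist
```

The result has the length and order of list1, regardless of how long list2 is.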

I'm not completely certain that I understand what you're trying to achieve, but I think this does what you're after.

list1 <- c("Homo sapiens sapiens","Homo sapiens neanderthalensis","Homo erectus")
list2 <- c("Homo erectus","Homo sapiens","Homo neanderthalensis")

# For each name in list1, return the matching name from list2, or "0" if it is absent
sapply(list1, function(x) ifelse(x %in% list2, list2[match(x, list2)], "0"))
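Since %in% is itself vectorised, the element-by-element call may not be needed. A minimal sketch, assuming an exact string match is what counts as "the same name" (the vectors are illustrative, and the fourth entry in list2 is an assumption added to make the lengths differ):

```r
# Hypothetical example vectors of different lengths
list1 <- c("Homo sapiens sapiens", "Homo sapiens neanderthalensis", "Homo erectus")
list2 <- c("Homo erectus", "Homo sapiens", "Homo neanderthalensis", "Homo habilis")

# TRUE where a name from list1 also occurs anywhere in list2
found <- list1 %in% list2

# A matched name is the same string in both vectors, so list1 can supply it
newlist <- ifelse(found, list1, "0")
newlist
```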

The inner for loop uses newname[i] where it should be newlist[i]. With your code, newlist[i] is overwritten on every pass of the inner loop with either 0 or a species name, so only the comparison with the last element of list2 survives. This is probably not what you want.
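If you want to keep the two nested loops, one possible fix is to start from a vector of "0" and only write to it when a match is found, so earlier matches are never overwritten. This is a sketch with made-up vectors (the fourth entry in list2 is an assumption, added so that the lengths differ), not the only way to do it:

```r
# Hypothetical example vectors of different lengths
list1 <- c("Homo sapiens sapiens", "Homo sapiens neanderthalensis", "Homo erectus")
list2 <- c("Homo erectus", "Homo sapiens", "Homo neanderthalensis", "Homo habilis")

# Start with "0" everywhere; entries change only when a match is found
newlist <- rep("0", length(list1))

for (i in seq_along(list1)) {
  for (j in seq_along(list2)) {
    if (list1[i] == list2[j]) {
      newlist[i] <- list2[j]
      break  # stop scanning list2 once the name has been found
    }
  }
}
newlist
```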
