Smartest way to check if an observation in data.frame(x) also exists in data.frame(y) and populate a new column according to the result
I have two data frames:
x <- data.frame(numbers=c('1','2','3','4','5','6','7','8','9'), coincidence="NA")
and
y <- data.frame(numbers=c('1','3','10'))
How can I check whether the observations in y (1, 3 and 10) also exist in x, and fill the column x["coincidence"] accordingly (for example with YES|NO, TRUE|FALSE...)?
In Excel I would do this with a formula combining IFERROR and VLOOKUP, but I don't know how to do the same in R.
Note: I am open to changing the data.frames to tables or using libraries. The data frame with the numbers to check (y) will never have more than 10-20 observations, while the other one (x) will never have more than 1K observations. Therefore, I could also iterate with an if, if necessary.
We can create a vector matching the desired output with a set membership test that returns TRUE or FALSE for each element. The %in% operator is a binary operator that checks whether each value on the left-hand side occurs in the set of values on the right:
x$coincidence <- x$numbers %in% y$numbers
# numbers coincidence
# 1 1 TRUE
# 2 2 FALSE
# 3 3 TRUE
# 4 4 FALSE
# 5 5 FALSE
# 6 6 FALSE
# 7 7 FALSE
# 8 8 FALSE
# 9 9 FALSE
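If YES/NO labels are preferred over logical values, as the question suggests, the logical vector can be mapped with ifelse(). This is a minimal sketch using the question's example data; the "YES" and "NO" labels are just the ones mentioned in the question, and the whole coincidence column is overwritten.

```r
# Same example data as in the question
x <- data.frame(numbers = c('1','2','3','4','5','6','7','8','9'), coincidence = "NA")
y <- data.frame(numbers = c('1','3','10'))

# %in% yields one TRUE/FALSE per row of x; ifelse() turns that into labels
x$coincidence <- ifelse(x$numbers %in% y$numbers, "YES", "NO")

print(x)
```

Because the column is replaced as a whole, every row ends up with a label and no placeholder "NA" strings remain.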
Do the numbers have to be factors, as you've set them up? (They're not numbers but character strings.) If not, it's easy:
x <- data.frame(numbers=c('1','2','3','4','5','6','7','8','9'), coincidence="NA", stringsAsFactors=FALSE)
y <- data.frame(numbers=c('1','3','10'), stringsAsFactors=FALSE)
x$coincidence[x$numbers %in% y$numbers] <- TRUE
> x
numbers coincidence
1 1 TRUE
2 2 NA
3 3 TRUE
4 4 NA
5 5 NA
6 6 NA
7 7 NA
8 8 NA
9 9 NA
If they need to be factors, then you'll need to either set common levels or use as.character().
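The as.character() route could look like the sketch below. It assumes both numbers columns are factors (forced here with stringsAsFactors=TRUE) and that it is acceptable to replace the coincidence column wholesale with a logical vector rather than assigning into an existing column.

```r
# Example data with the numbers stored as factors (assumption for this sketch)
x <- data.frame(numbers = c('1','2','3','4','5','6','7','8','9'), stringsAsFactors = TRUE)
y <- data.frame(numbers = c('1','3','10'), stringsAsFactors = TRUE)

# Compare the labels as plain character strings, so the two factors'
# differing level sets do not matter
x$coincidence <- as.character(x$numbers) %in% as.character(y$numbers)

print(x)
```

Creating the column from scratch also avoids assigning TRUE into a pre-existing factor column, which would not accept a value outside its levels.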