Simple lookup to insert values in an R data frame
This is a seemingly simple R question, but I don't see an exact answer here. I have a data frame (alldata) that looks like this:
Case zip market
1 44485 NA
2 44488 NA
3 43210 NA
There are over 3.5 million records.
Then, I have a second data frame, 'zipcodes'.
market zip
1 44485
1 44486
1 44488
... ... (100 zips in market 1)
2 43210
2 43211
... ... (100 zips in market 2, etc.)
I want to find the correct value for alldata$market for each case, based on alldata$zip matching the appropriate value in the zipcodes data frame. I'm just looking for the right syntax, and assistance is much appreciated, as usual.
Since you don't care about the market column in alldata, you can first strip it off by subsetting, and then join alldata and zipcodes on the zip column using merge:
merge(alldata[, c("Case", "zip")], zipcodes, by="zip")
The by parameter specifies the key columns, so if you have a compound key, you could do something like by=c("zip", "otherfield").
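One caveat with the merge approach is that its default is an inner join, so any case whose zip is missing from zipcodes is dropped, and the result comes back sorted by the key. The sketch below uses toy data modelled on the question, plus an invented zip (99999) that has no market, to show the difference that all.x = TRUE makes. It is an illustration under those assumptions, not a benchmark for 3.5 million rows.

```r
# Toy data mirroring the question, plus one zip (99999) with no market.
# The 99999 row is made up for illustration.
alldata <- data.frame(Case = 1:4, zip = c(44485, 44488, 43210, 99999))
zipcodes <- data.frame(market = c(1, 1, 1, 2, 2),
                       zip = c(44485, 44486, 44488, 43210, 43211))

# Default merge() is an inner join: Case 4 has no match and disappears.
inner <- merge(alldata, zipcodes, by = "zip")

# all.x = TRUE keeps every row of alldata and fills market with NA.
left <- merge(alldata, zipcodes, by = "zip", all.x = TRUE)

# merge() sorts on the key, so restore the original order if it matters.
left <- left[order(left$Case), ]
```

If every zip in alldata is guaranteed to appear in zipcodes, the two calls return the same rows and the plain merge above is enough.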
Another option that worked for me and is very simple:
alldata$market<-with(zipcodes, market[match(alldata$zip, zip)])
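To see what that one-liner does, it can help to split it into two steps: match returns, for each zip in alldata, the row position of that zip in zipcodes, and those positions then index the market column. A minimal sketch on the question's sample data (the variable name pos is only for illustration):

```r
alldata <- data.frame(Case = 1:3, zip = c(44485, 44488, 43210), market = NA)
zipcodes <- data.frame(market = c(1, 1, 1, 2, 2),
                       zip = c(44485, 44486, 44488, 43210, 43211))

# Step 1: position of each alldata zip within zipcodes$zip
# (NA where a zip is not found).
pos <- match(alldata$zip, zipcodes$zip)

# Step 2: use those positions to pull the corresponding market values.
alldata$market <- zipcodes$market[pos]
```

Unlike merge, this keeps alldata in its original row order and leaves NA for any zip that has no entry in zipcodes.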
With such a large data set you may want the speed of an environment lookup. You can use the lookup function from the qdapTools package as follows:
library(qdapTools)
alldata$market <- lookup(alldata$zip, zipcodes[, 2:1])
Or
alldata$zip %l% zipcodes[, 2:1]
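For readers curious what an "environment lookup" means, here is a rough base R sketch of the idea: store each zip as a name in a hashed environment and fetch the values by name. This is only an illustration of the concept and is not claimed to be how qdapTools implements lookup internally. The names zip_env, keys and found are made up for the example.

```r
alldata <- data.frame(Case = 1:3, zip = c(44485, 44488, 43210), market = NA)
zipcodes <- data.frame(market = c(1, 1, 1, 2, 2),
                       zip = c(44485, 44486, 44488, 43210, 43211))

# Build a hashed environment with one entry per zip.
# Environment keys are names, so the zips are converted to character.
zip_env <- list2env(
  setNames(as.list(zipcodes$market), as.character(zipcodes$zip)),
  hash = TRUE
)

# Look up every zip by name; zips that are not present come back as NA.
keys <- as.character(alldata$zip)
found <- mget(keys, envir = zip_env, ifnotfound = NA)
alldata$market <- unname(unlist(found))
```

Whether this beats match on 3.5 million rows depends on the data, so it is worth timing both on a sample before committing to one.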
Here's the dplyr way of doing it:
library(tidyverse)
alldata %>%
select(-market) %>%
left_join(zipcodes, by="zip")
which, on my machine, performs roughly the same as lookup.
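One thing worth checking before any join-style approach (merge or left_join): if a zip appears more than once in zipcodes, the join returns more rows than alldata has. A small base R sanity check, assuming each zip is supposed to belong to exactly one market. The bad table below is invented purely to show what a failure looks like.

```r
zipcodes <- data.frame(market = c(1, 1, 1, 2, 2),
                       zip = c(44485, 44486, 44488, 43210, 43211))

# anyDuplicated() returns 0 when every zip appears exactly once,
# otherwise the index of the first repeated value.
dup_index <- anyDuplicated(zipcodes$zip)

# Hypothetical broken table: zip 43210 is assigned to a second market.
bad <- rbind(zipcodes, data.frame(market = 3, zip = 43210))
bad_index <- anyDuplicated(bad$zip)

if (dup_index > 0) {
  stop("zip ", zipcodes$zip[dup_index], " appears more than once in zipcodes")
}
```

With the sample table the check passes silently, while bad_index is non-zero and points at the offending row.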
The syntax of match is a bit clumsy. You might find the lookup package easier to use.
library(lookup)
alldata <- data.frame(Case=1:3, zip=c(44485,44488,43210), market=c(NA,NA,NA))
zipcodes <- data.frame(market=c(1,1,1,2,2), zip=c(44485,44486,44488,43210,43211))
alldata$market <- lookup(alldata$zip, zipcodes$zip, zipcodes$market)
alldata
## Case zip market
## 1 1 44485 1
## 2 2 44488 1
## 3 3 43210 2