Simple lookup to insert values in an R data frame

Question

This is a seemingly simple R question, but I don't see an exact answer here. I have a data frame (alldata) that looks like this:

Case     zip     market
1        44485   NA
2        44488   NA
3        43210   NA

There are over 3.5 million records.

Then, I have a second data frame, 'zipcodes'.

market    zip
1         44485
1         44486
1         44488
...       ... (100 zips in market 1)
2         43210
2         43211
...       ... (100 zips in market 2, etc.)

I want to find the correct value for alldata$market for each case, based on alldata$zip matching the appropriate value in the zipcodes data frame. I'm just looking for the right syntax, and assistance is much appreciated, as usual.

Answer 1

Since you don't care about the market column in alldata, you can first drop it by subsetting with [ and then join alldata and zipcodes on the zip column using merge:

merge(alldata[, c("Case", "zip")], zipcodes, by="zip")

The by parameter specifies the key columns, so if you have a compound key, you could do something like by=c("zip", "otherfield").
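One thing worth checking with merge is what happens to zips that have no entry in zipcodes. The following is a minimal sketch on toy data (Case 4 with zip 99999 is a made-up row added for illustration, not part of the question): by default merge keeps only matching rows, while all.x=TRUE keeps every row of alldata and fills market with NA.

```r
# Toy versions of the data frames from the question. Case 4 / zip 99999 is a
# hypothetical row with no entry in zipcodes.
alldata <- data.frame(Case = 1:4, zip = c(44485, 44488, 43210, 99999), market = NA)
zipcodes <- data.frame(market = c(1, 1, 1, 2, 2),
                       zip = c(44485, 44486, 44488, 43210, 43211))

# Default merge is an inner join: the unmatched case is dropped.
inner <- merge(alldata[, c("Case", "zip")], zipcodes, by = "zip")

# all.x = TRUE keeps every row of alldata; unmatched zips get market = NA.
left <- merge(alldata[, c("Case", "zip")], zipcodes, by = "zip", all.x = TRUE)

nrow(inner)  # 3
nrow(left)   # 4
```

merge does not promise to keep the original row order of alldata (by default it sorts on the by columns), so reorder by Case afterwards if the order matters.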

Answer 2

Another option that worked for me and is very simple:

alldata$market<-with(zipcodes, market[match(alldata$zip, zip)])
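To see why the one-liner works, it may help to split it into two steps. This sketch uses the toy data from the question; the intermediate variable pos is just an illustrative name.

```r
# Toy versions of the data frames from the question.
alldata <- data.frame(Case = 1:3, zip = c(44485, 44488, 43210), market = NA)
zipcodes <- data.frame(market = c(1, 1, 1, 2, 2),
                       zip = c(44485, 44486, 44488, 43210, 43211))

# Step 1: for each alldata$zip, match() gives the position of its first
# occurrence in zipcodes$zip (NA if the zip is not present).
pos <- match(alldata$zip, zipcodes$zip)
pos  # 1 3 4

# Step 2: index the market column by those positions.
alldata$market <- zipcodes$market[pos]
alldata$market  # 1 1 2
```

Because match returns NA for a zip that is absent from zipcodes, such cases simply end up with market set to NA, and the row order of alldata is untouched.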

Answer 3

With such a large data set you may want the speed of an environment lookup. You can use the lookup function from the qdapTools package as follows:

library(qdapTools)
alldata$market <- lookup(alldata$zip, zipcodes[, 2:1])

Or

alldata$zip %l% zipcodes[, 2:1]
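For readers wondering what an "environment lookup" means, here is a rough base R sketch of the general idea: store the key/value pairs in a hashed environment and fetch them by name. This is only an illustration of the concept under my own assumptions, not a description of how qdapTools implements lookup; the names zip_env and zips are made up, and zip 99999 is a hypothetical unmatched value.

```r
zipcodes <- data.frame(market = c(1, 1, 1, 2, 2),
                       zip = c(44485, 44486, 44488, 43210, 43211))
zips <- c(44485, 44488, 43210, 99999)  # 99999 has no entry in zipcodes

# Build a hashed environment mapping zip (as a character key) to market.
zip_env <- list2env(
  setNames(as.list(zipcodes$market), as.character(zipcodes$zip)),
  envir = new.env(hash = TRUE)
)

# mget() fetches many keys at once; ifnotfound supplies the value for misses.
market <- unlist(
  mget(as.character(zips), envir = zip_env, ifnotfound = list(NA_real_)),
  use.names = FALSE
)
market  # 1 1 2 NA
```

Keys in an environment are character strings, which is why the zips are converted with as.character on both sides.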

Answer 4

Here's the dplyr way of doing it:

library(tidyverse)
alldata %>%
  select(-market) %>%
  left_join(zipcodes, by="zip")

which, on my machine, performs roughly the same as lookup.
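A small sketch of the join on toy data, to show how unmatched zips behave (Case 4 with zip 99999 is a made-up row added for illustration). It loads only dplyr rather than the whole tidyverse, which should be enough for these verbs.

```r
library(dplyr)

# Toy versions of the data frames from the question, plus one hypothetical
# case whose zip has no entry in zipcodes.
alldata <- data.frame(Case = 1:4, zip = c(44485, 44488, 43210, 99999), market = NA)
zipcodes <- data.frame(market = c(1, 1, 1, 2, 2),
                       zip = c(44485, 44486, 44488, 43210, 43211))

# A left join returns one row per alldata row only if each zip appears at
# most once in the lookup table, so check that first.
stopifnot(anyDuplicated(zipcodes$zip) == 0)

result <- alldata %>%
  select(-market) %>%
  left_join(zipcodes, by = "zip")

result$market  # 1 1 2 NA
```

left_join keeps every row of alldata in its original order and fills market with NA where there is no match.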

Answer 5

The syntax of match is a bit clumsy. You might find the lookup package easier to use.

library(lookup)  # the lookup package, not qdapTools::lookup from Answer 3
alldata <- data.frame(Case=1:3, zip=c(44485,44488,43210), market=c(NA,NA,NA))
zipcodes <- data.frame(market=c(1,1,1,2,2), zip=c(44485,44486,44488,43210,43211))
alldata$market <- lookup(alldata$zip, zipcodes$zip, zipcodes$market)
alldata
##   Case   zip market
## 1    1 44485      1
## 2    2 44488      1
## 3    3 43210      2

5 answers

Answer 1
14 2013-07-24 20:48:21

Answer 2
9 2017-07-28 13:56:27

Answer 3
3 2013-07-24 22:13:17

Answer 4
3 2017-05-18 10:14:42

Answer 5
0 2021-04-14 16:31:45
