Matching multiple columns in two dataframes in R using the merge or match function

I have two data frames that look something like this:

Date        Shop    Item    ProductKey     Price
2014-09-01  Asda    Apple   0f-7c-32-9c65  2.00
2014-09-01  Tesco   Pear    7c-e9-a0-a11c  1.50

And so on, for different dates, shops, items, product keys and prices. The second data frame has the same structure but covers the following year.

What I want to do is match items with the same date, shop, item and product key in the two data frames (call them September2014 and September2015) and, where they match on all variables, create a price relative (i.e. divide the 2015 price by the 2014 price).

I have tried various if statements and the match function but don't seem to be getting anywhere. I know there must be a simple way to do this that I am completely missing. I have also looked at examples of the merge function, but I don't think it would be useful in my case. I have gone through many questions on the site involving matching and tried some of the suggested code, but none seems relevant to my case. Any help would be greatly appreciated.

Reconsider the merge approach:

# FIRST DATAFRAME (2014)
txt='Date        Shop    Item    ProductKey     Price
2014-09-01  Asda    Apple   0f-7c-32-9c65  2.00
2014-09-01  Tesco   Pear    7c-e9-a0-a11c  1.50'

df1 <- read.table(text=txt, header=TRUE)
df1$Date <- as.Date(df1$Date)                # CONVERT TO DATE
df1$Month <- format(df1$Date, "%m")          # EXTRACT MONTH (CAN ADJUST FOR MM/DD)

# SECOND DATAFRAME (2015)
txt='Date        Shop    Item    ProductKey     Price
2015-09-01  Asda    Apple   0f-7c-32-9c65  2.25
2015-09-01  Tesco   Pear    7c-e9-a0-a11c  1.75'

df2 <- read.table(text=txt, header=TRUE)
df2$Date <- as.Date(df2$Date)                 # CONVERT TO DATE
df2$Month <- format(df2$Date, "%m")           # EXTRACT MONTH (CAN ADJUST FOR MM/DD)

# MERGE AND TRANSFORM FOR NEW COLUMN
finaldf <- transform(merge(df1, df2, by=c("Month", "Shop", "Item", "ProductKey"), suffixes=c("_14", "_15")), 
                     PriceRelative = Price_15 / Price_14)    
finaldf
#   Month  Shop  Item    ProductKey    Date_14 Price_14    Date_15 Price_15 PriceRelative
# 1    09  Asda Apple 0f-7c-32-9c65 2014-09-01      2.0 2015-09-01     2.25      1.125000
# 2    09 Tesco  Pear 7c-e9-a0-a11c 2014-09-01      1.5 2015-09-01     1.75      1.166667
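
The month-only key above can be tightened to month and day, and merge can also keep 2015 rows that have no 2014 counterpart by passing all.y = TRUE. A minimal sketch, assuming the same column layout as above; the names d14, d15 and MonthDay, and the extra Orange row with its product key, are made up for illustration:

```r
# Hypothetical data: same layout as above, plus one 2015-only row
d14 <- data.frame(Date = as.Date(c("2014-09-01", "2014-09-01")),
                  Shop = c("Asda", "Tesco"),
                  Item = c("Apple", "Pear"),
                  ProductKey = c("0f-7c-32-9c65", "7c-e9-a0-a11c"),
                  Price = c(2.00, 1.50), stringsAsFactors = FALSE)
d15 <- data.frame(Date = as.Date(c("2015-09-01", "2015-09-01", "2015-09-01")),
                  Shop = c("Asda", "Tesco", "Asda"),
                  Item = c("Apple", "Pear", "Orange"),
                  ProductKey = c("0f-7c-32-9c65", "7c-e9-a0-a11c", "zz-00-00-0000"),
                  Price = c(2.25, 1.75, 3.00), stringsAsFactors = FALSE)

# Key on month and day rather than month alone
d14$MonthDay <- format(d14$Date, "%m-%d")
d15$MonthDay <- format(d15$Date, "%m-%d")

# all.y = TRUE keeps 2015 rows with no 2014 match (NA in the _14 columns)
res <- merge(d14, d15, by = c("MonthDay", "Shop", "Item", "ProductKey"),
             suffixes = c("_14", "_15"), all.y = TRUE)
res$PriceRelative <- res$Price_15 / res$Price_14
res
```

With the default (all.y = FALSE) the unmatched Orange row would simply be dropped, which is the behaviour of the merge call above.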

Here are example prices for 2014 and 2015. Note that the 2015 data contains an item that has no match in 2014. First build a composite key by concatenating the fields to match on, then use match to bring each item's 2014 price into the 2015 data frame, then divide:

df2014 <- data.frame(Date = as.Date(c("2014-09-01", "2014-09-01")),
                     Shop = c("Asda", "Tesco"),
                     Item = c("Apple", "Pear"),
                     ProductKey = c("0f-7c-32-9c65","7c-e9-a0-a11c"),
                     Price = c(2.00, 1.50), stringsAsFactors = FALSE)

df2015 <- data.frame(Date = as.Date(c("2015-09-01", "2015-09-01", "2015-09-01")),
                     Shop = c("Asda", "Tesco", "foo"),
                     Item = c("Apple", "Pear", "Orange"),
                     ProductKey = c("0f-7c-32-9c65","7c-e9-a0-a11c", "blah"),
                     Price = c(2.20, 1.70, 3.00), stringsAsFactors = FALSE)

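# Build a composite key from month, day, shop, item and product key.
# A separator keeps adjacent fields from running together ambiguously.
df2014$key <- paste(strftime(df2014$Date, "%m"),
                    strftime(df2014$Date, "%d"),
                    df2014$Shop,
                    df2014$Item,
                    df2014$ProductKey, sep = "|")

df2015$key <- paste(strftime(df2015$Date, "%m"),
                    strftime(df2015$Date, "%d"),
                    df2015$Shop,
                    df2015$Item,
                    df2015$ProductKey, sep = "|")

# Look up each 2015 key in 2014; rows with no match (here "foo") get NA
df2015$price_2014 <- df2014$Price[match(df2015$key, df2014$key)]
df2015$price_ratio <- df2015$Price / df2015$price_2014
df2015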
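
One caveat with this approach: match returns only the position of the first hit, so a duplicated key in the 2014 data would be ignored silently. A minimal sketch of a guard, using made-up key strings and prices (key_2014, price_2014, idx and has_dupes are illustrative names, not part of the code above):

```r
# Hypothetical 2014 keys, with one accidental duplicate of the Asda apple
key_2014   <- c("0901|Asda|Apple|0f-7c-32-9c65",
                "0901|Tesco|Pear|7c-e9-a0-a11c",
                "0901|Asda|Apple|0f-7c-32-9c65")
price_2014 <- c(2.00, 1.50, 2.10)

# match() gives the first matching position only, so the 2.10 row is never used
idx <- match("0901|Asda|Apple|0f-7c-32-9c65", key_2014)
price_2014[idx]

# anyDuplicated() returns 0 when every key is unique, otherwise the index
# of the first duplicate, so this flags the problem before matching
has_dupes <- anyDuplicated(key_2014) > 0
has_dupes
```

If duplicates are expected, merge may be the safer choice, since it returns one row per matching pair instead of picking the first.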
