Merge on two criteria, one is a list
I have two data frames that look something like this. Let's call the first data frame "Master":
row  ID  Color
1    1   c("blue", "green")
2    1   red
3    2   red
4    3   c("pink", "blue", "purple")
Let's call the second data frame "Detail":
row  ID  Color  Year
1    1   blue   2004
2    1   red    2000
3    1   green  2005
4    2   red    2005
5    3   pink   1999
6    3   brown  2008
7    3   blue   1997
8    3   pink   2007
I would like to add a column to Master that is the mean of the Year values in Detail when two criteria are met: the ID in Detail matches the ID in Master, and the Color in Detail is one of the colors listed in that Master row.
I have figured out that the command...
which(Detail$Color == Master$Color)
...will identify the Color pattern, but I have not managed to apply that command in either a merge or an apply statement.
The result should look like this:
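As a side note on why the `which()` call is hard to build on: `==` compares two vectors position by position (recycling the shorter one), whereas the second criterion is a membership test, which is what `%in%` does. The sketch below uses two made-up plain character vectors, `detail_color` and `wanted`, purely for illustration; they are not the Master and Detail columns themselves.

```r
# Illustrative vectors only; these are not the question's data frames
detail_color <- c("red", "blue", "green", "red")
wanted <- c("blue", "red")

# `==` pairs elements up by position, recycling the shorter vector
elementwise <- detail_color == wanted

# `%in%` asks, for each element, whether it occurs anywhere in `wanted`
membership <- detail_color %in% wanted

print(which(elementwise))  # position 4 only
print(which(membership))   # positions 1, 2 and 4
```

With `==`, only the last "red" lines up with a "red" in the recycled `wanted`, so the other matches are missed.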
row  ID  Color                        Mean_Year
1    1   c("blue", "green")           2004.5
2    1   red                          2000
3    2   red                          2005
4    3   c("pink", "blue", "purple")  2001
My real data has 10,000 rows in Master and 8,000,000 rows in Detail, if that makes a difference.
I'm not sure what your data input file for the data frame 'Master' looks like, but let's assume that it contains one numeric vector (ID) and one character vector (Color), where each cell of the character vector contains one or more colours separated by commas. If you import this into R, you should get a data frame that looks like this:
row  ID  Color
1    1   "blue, green"
2    1   "red"
3    2   "red"
4    3   "pink, blue, purple"
I realise this is different from what's shown above for the data frame Master. Anyway, assuming that your input is similar to what I've just presented, the first thing to do once it is imported is to get rid of the white space in the vector master$Color:
master$Color <- gsub(" ", "", master$Color)
Once that is done, it is relatively straightforward to do what you want:
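To see why the white space matters, here is a small sketch on a single made-up cell value `x` in the assumed "comma plus space" format. Splitting the raw string leaves a leading space on every colour after the first, so those colours would not match the values in detail$Color. The last line shows one possible alternative, splitting on a comma followed by optional white space, which may be preferable if colour names can themselves contain spaces (an assumption; the data shown here has none).

```r
# Hypothetical cell value in the assumed import format
x <- "pink, blue, purple"

# Splitting the raw string keeps the space after each comma
raw_parts <- strsplit(x, ",", fixed = TRUE)[[1]]

# Removing the spaces first, as in the answer, gives clean colour names
clean_parts <- strsplit(gsub(" ", "", x), ",", fixed = TRUE)[[1]]

# Alternative: split on a comma plus any following white space
regex_parts <- strsplit(x, ",\\s*")[[1]]

print(raw_parts)
print(clean_parts)
```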
ID.empty <- numeric(nrow(master))
for (i in seq_len(nrow(master))) {
  cols <- strsplit(master$Color[i], ",", fixed = TRUE)[[1]]  # colours for this row
  ID.empty[i] <- mean(detail$Year[detail$ID %in% master$ID[i] & detail$Color %in% cols])
}
print(ID.empty)
[1] 2004.5 2000.0 2005.0 2001.0
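The loop scans every row of detail once per row of master, which may become slow at the sizes mentioned in the question (10,000 and 8,000,000 rows). One possible alternative, sketched here in base R under the same assumed input format, is to expand master to one row per colour, join it to detail on both ID and Color, and average Year per original master row. The names `long`, `joined` and `agg` are illustrative, and the data frames are rebuilt by hand from the tables above so the example runs on its own.

```r
# Data rebuilt from the tables above (assumed comma-separated Color format)
master <- data.frame(
  ID = c(1, 1, 2, 3),
  Color = c("blue, green", "red", "red", "pink, blue, purple"),
  stringsAsFactors = FALSE
)
detail <- data.frame(
  ID = c(1, 1, 1, 2, 3, 3, 3, 3),
  Color = c("blue", "red", "green", "red", "pink", "brown", "blue", "pink"),
  Year = c(2004, 2000, 2005, 2005, 1999, 2008, 1997, 2007),
  stringsAsFactors = FALSE
)

# One list element per master row, holding that row's colours
cols <- strsplit(gsub(" ", "", master$Color), ",", fixed = TRUE)

# Long form: one row per (master row, colour) pair
long <- data.frame(
  row = rep(seq_len(nrow(master)), lengths(cols)),
  ID = rep(master$ID, lengths(cols)),
  Color = unlist(cols),
  stringsAsFactors = FALSE
)

# Inner join on both criteria, then average Year per master row
joined <- merge(long, detail, by = c("ID", "Color"))
agg <- aggregate(Year ~ row, data = joined, FUN = mean)

# Master rows with no matching detail rows get NA
master$Mean_Year <- agg$Year[match(seq_len(nrow(master)), agg$row)]
print(master$Mean_Year)  # 2004.5 2000.0 2005.0 2001.0
```

On the sample data this gives the same four means as the loop. For the full 8,000,000-row data the join may need a fair amount of memory, so it is worth timing both approaches on a subset first.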