Merge on two criteria, one is a list

I have two data frames that look something like this. Let's call the first data frame "Master":

row ID  Color
1   1   c("blue", "green")
2   1   red
3   2   red
4   3   c("pink", "blue", "purple")

Let's call the second data frame "Detail":

row ID  Color   Year
1   1   blue    2004
2   1   red     2000
3   1   green   2005
4   2   red     2005
5   3   pink    1999
6   3   brown   2008
7   3   blue    1997
8   3   pink    2007

I would like to add a column to Master that holds the mean of the Year values in Detail for the rows that meet two criteria:

  1. The ID matches (this is easy).
  2. Detail$Color is found in the list Master$Color (this has proven to be difficult).

I have figured out that the command...

which(Detail$Color == Master$Color)

...identifies the matching Color values, but I have not been able to use it within either a merge or an apply call.

The result should look like this:

row ID  Color                       Mean_Year
1   1   c("blue", "green")          2004.5
2   1   red                         2000
3   2   red                         2005
4   3   c("pink", "blue", "purple") 2001

My real data has 10,000 rows in Master and 8,000,000 rows in Detail, if that makes a difference.
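For reference, the example above can be reproduced in base R roughly as follows. This is a minimal sketch that assumes Master$Color is stored as a list column (one character vector per row), which is what the printed `c("blue", "green")` entries suggest; the `mapply` call is just one possible way to apply both criteria row by row, not necessarily the fastest.

```r
# Example data from the question; Master$Color is assumed to be a list column
Master <- data.frame(ID = c(1, 1, 2, 3))
Master$Color <- list(c("blue", "green"), "red", "red",
                     c("pink", "blue", "purple"))

Detail <- data.frame(
  ID    = c(1, 1, 1, 2, 3, 3, 3, 3),
  Color = c("blue", "red", "green", "red", "pink", "brown", "blue", "pink"),
  Year  = c(2004, 2000, 2005, 2005, 1999, 2008, 1997, 2007),
  stringsAsFactors = FALSE
)

# For each Master row: mean Year over Detail rows with the same ID
# whose Color appears in that row's colour vector
Master$Mean_Year <- mapply(
  function(id, cols) mean(Detail$Year[Detail$ID == id & Detail$Color %in% cols]),
  Master$ID, Master$Color
)

print(Master$Mean_Year)
```

The key point is that `%in%` is applied to one list element at a time, so each Detail colour is tested against the colours of a single Master row rather than against the whole list.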

I'm not sure what the input file for the data frame 'Master' looks like, but let's assume it contains one numeric vector (ID) and one character vector (Color), where each cell of the character vector holds one or more colours separated by commas. If you import this into R, you should get a data frame that looks like this:

row     ID     Color       
1       1      "blue, green"    
2       1      "red"    
3       2      "red"     
4       3      "pink, blue, purple"

I realise this is different from what is shown above for the data frame Master. Anyway, assuming your input is similar to what I've just presented, the first thing to do once it is imported is to get rid of the white space in the vector master$Color:

master$Color<-gsub(" ", "", master$Color)
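To illustrate why the white space matters, here is a small sketch on a single value (the variable names `raw` and `cleaned` are only for illustration). Without the `gsub` step, splitting on the comma leaves a leading space on every colour after the first, so those colours would never match the values in detail$Color.

```r
raw <- "pink, blue, purple"

# Splitting the raw string keeps the space that follows each comma
with_spaces <- strsplit(raw, ",")[[1]]
print(with_spaces)

# After removing the spaces, the pieces are clean colour names
cleaned <- gsub(" ", "", raw)
without_spaces <- strsplit(cleaned, ",")[[1]]
print(without_spaces)
```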

Once that is done, it is relatively straightforward to do what you want:

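# Preallocate the result instead of growing it from NULL
ID.empty <- numeric(nrow(master))

for (i in seq_len(nrow(master))) {
  # Colours for this master row, e.g. "blue,green" -> c("blue", "green")
  colours <- strsplit(master$Color[i], ",")[[1]]
  # Mean Year over detail rows with the same ID and one of those colours
  ID.empty[i] <- mean(detail$Year[detail$ID == master$ID[i] & detail$Color %in% colours])
}

print(ID.empty)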

[1] 2004.5 2000.0 2005.0 2001.0
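With 10,000 rows in master and 8,000,000 rows in detail, the loop scans the whole of detail once per master row. A possible alternative, sketched here in base R and not benchmarked, is to expand master to one row per colour, join it to detail once, and then average per original row. The names `long`, `joined` and `means` are illustrative, and the data are the example values from the question in the comma-separated form assumed above.

```r
master <- data.frame(
  ID    = c(1, 1, 2, 3),
  Color = c("blue,green", "red", "red", "pink,blue,purple"),
  stringsAsFactors = FALSE
)
detail <- data.frame(
  ID    = c(1, 1, 1, 2, 3, 3, 3, 3),
  Color = c("blue", "red", "green", "red", "pink", "brown", "blue", "pink"),
  Year  = c(2004, 2000, 2005, 2005, 1999, 2008, 1997, 2007),
  stringsAsFactors = FALSE
)

# One row per (master row, colour) pair
colours <- strsplit(master$Color, ",", fixed = TRUE)
long <- data.frame(
  row   = rep(seq_len(nrow(master)), lengths(colours)),
  ID    = rep(master$ID, lengths(colours)),
  Color = unlist(colours),
  stringsAsFactors = FALSE
)

# A single inner join on both criteria, then the mean Year per master row
joined <- merge(long, detail, by = c("ID", "Color"))
means  <- tapply(joined$Year, joined$row, mean)

# Master rows with no matching detail rows get NA
idx <- match(seq_len(nrow(master)), as.integer(names(means)))
master$Mean_Year <- as.numeric(means[idx])

print(master$Mean_Year)
```

On the example data this gives the same four means as the loop. For data of the size mentioned in the question, a package built for large joins may be worth considering, but that goes beyond this sketch.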
