Different results applying merge in R over individual data frames and over one list of data frames

Hi everybody. I am working with a list of data frames in R and I want to merge them one by one. One solution I found is to use Reduce() with merge(), but I don't get the same result as when I merge the data frames one at a time. My list of data frames is global and it has the following structure (the dput() version of the list is at the end):

global
$a1
   ID Value Products z1
1 001     1        3  1
2 002     2        2  1
3 003     3        0  1
4 004     4        1  1
5 005     5        1  1
6 006     6        6  1
7 007     7        7  1
8 009     8        1  1
9 010     9        1  1

$a2
    ID Value Products z2
1  001     1        3  2
2  002     2        2  2
3  003     3        0  2
4  004     4        1  2
5  005     5        1  2
6  006     6        6  2
7  011    10        5  2
8  012    11        5  2
9  007     7        7  2
10 009     8        1  2
11 010     9        1  2

$a3
    ID Value Products z3
1  001     1        3  3
2  002     2        2  3
3  012    11        5  3
4  013    11        1  3
5  014    11        2  3
6  003     3        0  3
7  004     4        1  3
8  005     5        1  3
9  006     6        6  3
10 007     7        7  3
11 009     8        1  3
12 010     9        1  3
13 011    10        5  3

$a4
    ID Value Products z4
1  001     1        3  4
2  002     2        2  4
3  012    11        5  4
4  013    11        1  4
5  014    11        2  4
6  003     3        0  4
7  004     4        1  4
8  005     5        1  4
9  006     6        6  4
10 007     7        7  4
11 009     8        1  4
12 010     9        1  4
13 011    10        5  4
14 015    12        3  4
15 016    12        3  4

$a5
    ID Value Products z5
1  001     1        3  5
2  002     2        2  5
3  003     3        0  5
4  004     4        1  5
5  016    12        3  5
6  017    14        2  5
7  005     5        1  5
8  006     6        6  5
9  007     7        7  5
10 009     8        1  5
11 010     9        1  5
12 011    10        5  5
13 012    11        5  5
14 013    11        1  5
15 014    11        2  5
16 015    12        3  5
17 018    14        2  5

I am merging each data frame with the data frames that precede it in global. For this I used the following code to create a new list named listag:

listag=Reduce(function(x, y) merge(x,y[,c(1,4)],by=intersect(names(x)[1],names(y)[1]),all.x=TRUE),global,accumulate=TRUE)

I used the argument all.x=TRUE in merge() because I want each data frame to keep its original number of rows (a1 = 9, a2 = 11, a3 = 13, a4 = 15, a5 = 17). After this I split global into individual data frames to check that the code above works, and I found differences. To separate the data frames I used this code:

list2env(global, envir=.GlobalEnv)

I got my five data frames. Now I will show what I want with data frames a4 and a5. First I used the following code to merge a4 with a1, a2, a3 and a4:

Final41=merge(a4,a1[,c(1,4)],by=intersect(names(a4)[1],names(a1)[1]),all.x=TRUE)
Final42=merge(Final41,a2[,c(1,4)],by=intersect(names(Final41)[1],names(a2)[1]),all.x=TRUE)
Final43=merge(Final42,a3[,c(1,4)],by=intersect(names(Final42)[1],names(a3)[1]),all.x=TRUE)
Final4=merge(Final43,a4[,c(1,4)],by=intersect(names(Final43)[1],names(a4)[1]),all.x=TRUE)

The result of this code is:

Final4

    ID Value Products z4.x z1 z2 z3 z4.y
1  001     1        3    4  1  2  3    4
2  002     2        2    4  1  2  3    4
3  003     3        0    4  1  2  3    4
4  004     4        1    4  1  2  3    4
5  005     5        1    4  1  2  3    4
6  006     6        6    4  1  2  3    4
7  007     7        7    4  1  2  3    4
8  009     8        1    4  1  2  3    4
9  010     9        1    4  1  2  3    4
10 011    10        5    4 NA  2  3    4
11 012    11        5    4 NA  2  3    4
12 013    11        1    4 NA NA  3    4
13 014    11        2    4 NA NA  3    4
14 015    12        3    4 NA NA NA    4
15 016    12        3    4 NA NA NA    4

Here the argument all.x=TRUE works as intended, because the original number of observations in a4 (15) is kept. When I extract the 4th element of listag I get this:

f4l=listag[[4]]
f4l

   ID Value Products z1 z2 z3 z4
1 001     1        3  1  2  3  4
2 002     2        2  1  2  3  4
3 003     3        0  1  2  3  4
4 004     4        1  1  2  3  4
5 005     5        1  1  2  3  4
6 006     6        6  1  2  3  4
7 007     7        7  1  2  3  4
8 009     8        1  1  2  3  4
9 010     9        1  1  2  3  4

In the Reduce() call, merge() also uses all.x=TRUE, but I don't get the same result and the number of observations is wrong. After combining Reduce() and merge() I would like to get the result of Final4. The same applies to the other data frames in listag after applying Reduce() and merge() over global. I would like to get this result for each data frame in listag (for the 4th data frame it would be):

    ID Value Products z1 z2 z3 z4
1  001     1        3  1  2  3  4
2  002     2        2  1  2  3  4
3  003     3        0  1  2  3  4
4  004     4        1  1  2  3  4
5  005     5        1  1  2  3  4
6  006     6        6  1  2  3  4
7  007     7        7  1  2  3  4
8  009     8        1  1  2  3  4
9  010     9        1  1  2  3  4
10 011    10        5 NA  2  3  4
11 012    11        5 NA  2  3  4
12 013    11        1 NA NA  3  4
13 014    11        2 NA NA  3  4
14 015    12        3 NA NA NA  4
15 016    12        3 NA NA NA  4

I don't know what is wrong in my code when I combine Reduce() and merge(). I am using the same all.x=TRUE as when I merge the data frames one by one. Could you help me with this? Maybe I have to add another argument to the combination of Reduce() and merge() to get my result, or there is another way, such as using lapply, or llply from the plyr package, over global. The dput() version of global is:

structure(list(a1 = structure(list(ID = c("001", "002", "003", 
"004", "005", "006", "007", "009", "010"), Value = c(1, 2, 3, 
4, 5, 6, 7, 8, 9), Products = c(3, 2, 0, 1, 1, 6, 7, 1, 1), z1 = c(1, 
1, 1, 1, 1, 1, 1, 1, 1)), .Names = c("ID", "Value", "Products", 
"z1"), row.names = c(NA, 9L), class = "data.frame"), a2 = structure(list(
    ID = c("001", "002", "003", "004", "005", "006", "011", "012", 
    "007", "009", "010"), Value = c(1, 2, 3, 4, 5, 6, 10, 11, 
    7, 8, 9), Products = c(3, 2, 0, 1, 1, 6, 5, 5, 7, 1, 1), 
    z2 = c(2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2)), .Names = c("ID", 
"Value", "Products", "z2"), row.names = c(NA, 11L), class = "data.frame"), 
    a3 = structure(list(ID = c("001", "002", "012", "013", "014", 
    "003", "004", "005", "006", "007", "009", "010", "011"), 
        Value = c(1, 2, 11, 11, 11, 3, 4, 5, 6, 7, 8, 9, 10), 
        Products = c(3, 2, 5, 1, 2, 0, 1, 1, 6, 7, 1, 1, 5), 
        z3 = c(3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3)), .Names = c("ID", 
    "Value", "Products", "z3"), row.names = c(NA, 13L), class = "data.frame"), 
    a4 = structure(list(ID = c("001", "002", "012", "013", "014", 
    "003", "004", "005", "006", "007", "009", "010", "011", "015", 
    "016"), Value = c(1, 2, 11, 11, 11, 3, 4, 5, 6, 7, 8, 9, 
    10, 12, 12), Products = c(3, 2, 5, 1, 2, 0, 1, 1, 6, 7, 1, 
    1, 5, 3, 3), z4 = c(4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 
    4, 4)), .Names = c("ID", "Value", "Products", "z4"), row.names = c(NA, 
    15L), class = "data.frame"), a5 = structure(list(ID = c("001", 
    "002", "003", "004", "016", "017", "005", "006", "007", "009", 
    "010", "011", "012", "013", "014", "015", "018"), Value = c(1, 
    2, 3, 4, 12, 14, 5, 6, 7, 8, 9, 10, 11, 11, 11, 12, 14), 
        Products = c(3, 2, 0, 1, 3, 2, 1, 6, 7, 1, 1, 5, 5, 1, 
        2, 3, 2), z5 = c(5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 
        5, 5, 5, 5, 5)), .Names = c("ID", "Value", "Products", 
    "z5"), row.names = c(NA, 17L), class = "data.frame")), .Names = c("a1", 
"a2", "a3", "a4", "a5")) 

Many thanks for your help.

Several things:

First, it is normal that your Reduce-based merge and your manual merge give different results, since they are not performed in the same order. Reduce() processes elements 1:4 in order, whereas your manual merges, for a reason I do not quite understand, run in the order 4, 1, 2, 3, 4.
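To see why the order matters with all.x = TRUE, here is a minimal sketch on two hypothetical toy frames (small and large are made-up names standing in for a1 and a4; they are not the data from the question):

```r
# Hypothetical toy frames: `small` plays the role of a1, `large` the role of a4
small <- data.frame(ID = c("001", "002"), z1 = 1, stringsAsFactors = FALSE)
large <- data.frame(ID = c("001", "002", "015"), z4 = 4, stringsAsFactors = FALSE)

# Starting from the small table (the order Reduce() uses):
# all.x = TRUE keeps only the rows of `small`, so ID "015" is dropped
from_small <- merge(small, large, by = "ID", all.x = TRUE)

# Starting from the large table (the order of the manual merges):
# all.x = TRUE keeps every row of `large`, with NA in z1 for ID "015"
from_large <- merge(large, small, by = "ID", all.x = TRUE)

nrow(from_small)  # 2
nrow(from_large)  # 3
```

With all.x = TRUE the row count is driven by whichever table is passed first, which is why the two orders disagree.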

Second, the difference you observe arises because the a4 table has additional IDs, and they are lost in the Reduce-based merge: you use all.x=TRUE, but a4 enters as the "y" table. So you should use all=TRUE instead:

listag <- Reduce(function(x, y) merge(x, y[, c(1, 4)],
          by = intersect(names(x)[1], names(y)[1]), all = TRUE), global)
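As a rough check of the above, the sketch below runs the all = TRUE merge on a hypothetical three-element list (toy, a made-up stand-in for global with the same column layout). It keeps accumulate = TRUE, as in the question, so that every intermediate result stays available. Note that because only columns 1 and 4 of each y are merged, IDs that are absent from the first frame get NA in Value and Products. If that is not wanted, one possible alternative (an assumption about the desired output, not part of the answer above) is to start each merge from the k-th frame by passing it as the init argument of Reduce(); add_z and per_frame are illustrative names.

```r
# Hypothetical toy stand-in for `global`: same column layout, fewer rows
toy <- list(
  a1 = data.frame(ID = c("001", "002"), Value = c(1, 2),
                  Products = c(3, 2), z1 = 1, stringsAsFactors = FALSE),
  a2 = data.frame(ID = c("001", "011", "002"), Value = c(1, 10, 2),
                  Products = c(3, 5, 2), z2 = 2, stringsAsFactors = FALSE),
  a3 = data.frame(ID = c("001", "012", "002", "011"), Value = c(1, 11, 2, 10),
                  Products = c(3, 5, 2, 5), z3 = 3, stringsAsFactors = FALSE)
)

# Helper (illustrative name): join the ID and z columns of y onto x
add_z <- function(x, y, ...) merge(x, y[, c(1, 4)], by = "ID", ...)

# Full outer merge on ID, keeping every intermediate result
outer_list <- Reduce(function(x, y) add_z(x, y, all = TRUE),
                     toy, accumulate = TRUE)

# Alternative: for each k, start from the k-th frame (passed as `init`)
# and left-join z1..zk onto it, so its rows and values are preserved
per_frame <- lapply(seq_along(toy), function(k) {
  base <- toy[[k]][, c("ID", "Value", "Products")]
  Reduce(function(x, y) add_z(x, y, all.x = TRUE), toy[seq_len(k)], base)
})

outer_list[[3]]  # 4 rows; Value/Products are NA for IDs missing from a1
per_frame[[3]]   # 4 rows; Value/Products come from a3
```

In the second variant each element of per_frame has exactly as many rows as the corresponding input frame, which matches the row counts the question asks for.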
