Different results applying merge in R over individual data frames and over one list of data frames
Hi everybody, I am working with a list of data frames in R and I want to merge them one by one. One solution I found is to use the Reduce() function with merge(), but I don't get the same result as when I merge the data frames one by one. My list of data frames is global and it has the following structure (I include the dput() version of my list at the end):
global
$a1
ID Value Products z1
1 001 1 3 1
2 002 2 2 1
3 003 3 0 1
4 004 4 1 1
5 005 5 1 1
6 006 6 6 1
7 007 7 7 1
8 009 8 1 1
9 010 9 1 1
$a2
ID Value Products z2
1 001 1 3 2
2 002 2 2 2
3 003 3 0 2
4 004 4 1 2
5 005 5 1 2
6 006 6 6 2
7 011 10 5 2
8 012 11 5 2
9 007 7 7 2
10 009 8 1 2
11 010 9 1 2
$a3
ID Value Products z3
1 001 1 3 3
2 002 2 2 3
3 012 11 5 3
4 013 11 1 3
5 014 11 2 3
6 003 3 0 3
7 004 4 1 3
8 005 5 1 3
9 006 6 6 3
10 007 7 7 3
11 009 8 1 3
12 010 9 1 3
13 011 10 5 3
$a4
ID Value Products z4
1 001 1 3 4
2 002 2 2 4
3 012 11 5 4
4 013 11 1 4
5 014 11 2 4
6 003 3 0 4
7 004 4 1 4
8 005 5 1 4
9 006 6 6 4
10 007 7 7 4
11 009 8 1 4
12 010 9 1 4
13 011 10 5 4
14 015 12 3 4
15 016 12 3 4
$a5
ID Value Products z5
1 001 1 3 5
2 002 2 2 5
3 003 3 0 5
4 004 4 1 5
5 016 12 3 5
6 017 14 2 5
7 005 5 1 5
8 006 6 6 5
9 007 7 7 5
10 009 8 1 5
11 010 9 1 5
12 011 10 5 5
13 012 11 5 5
14 013 11 1 5
15 014 11 2 5
16 015 12 3 5
17 018 14 2 5
I am merging each data frame with all the previous data frames in global, and for this I used the following code to create a new list named listag:
listag=Reduce(function(x, y) merge(x,y[,c(1,4)],by=intersect(names(x)[1],names(y)[1]),all.x=TRUE),global,accumulate=TRUE)
I used the argument all.x=TRUE in merge() because I want to keep the original number of rows in each data frame (a1=9, a2=11, a3=13, a4=15, a5=17). After this I separated global into individual data frames to check that the code above works correctly, and I found differences. To separate the data frames I used this code:
list2env(global, envir=.GlobalEnv)
I got my five data frames. Now I am going to show what I want using data frames a4 and a5. First I used the following code to merge a4 with a1, a2, a3 and a4:
Final41=merge(a4,a1[,c(1,4)],by=intersect(names(a4)[1],names(a1)[1]),all.x=TRUE)
Final42=merge(Final41,a2[,c(1,4)],by=intersect(names(Final41)[1],names(a2)[1]),all.x=TRUE)
Final43=merge(Final42,a3[,c(1,4)],by=intersect(names(Final42)[1],names(a3)[1]),all.x=TRUE)
Final4=merge(Final43,a4[,c(1,4)],by=intersect(names(Final43)[1],names(a4)[1]),all.x=TRUE)
The result of this code is:
Final4
ID Value Products z4.x z1 z2 z3 z4.y
1 001 1 3 4 1 2 3 4
2 002 2 2 4 1 2 3 4
3 003 3 0 4 1 2 3 4
4 004 4 1 4 1 2 3 4
5 005 5 1 4 1 2 3 4
6 006 6 6 4 1 2 3 4
7 007 7 7 4 1 2 3 4
8 009 8 1 4 1 2 3 4
9 010 9 1 4 1 2 3 4
10 011 10 5 4 NA 2 3 4
11 012 11 5 4 NA 2 3 4
12 013 11 1 4 NA NA 3 4
13 014 11 2 4 NA NA 3 4
14 015 12 3 4 NA NA NA 4
15 016 12 3 4 NA NA NA 4
Here the argument all.x=TRUE works as expected, because I keep the original number of observations in a4 (15). When I extract the 4th element of listag I get this:
f4l=listag[[4]]
f4l
ID Value Products z1 z2 z3 z4
1 001 1 3 1 2 3 4
2 002 2 2 1 2 3 4
3 003 3 0 1 2 3 4
4 004 4 1 1 2 3 4
5 005 5 1 1 2 3 4
6 006 6 6 1 2 3 4
7 007 7 7 1 2 3 4
8 009 8 1 1 2 3 4
9 010 9 1 1 2 3 4
For merge() inside Reduce() I am also using all.x=TRUE, but I don't get the same result and the number of observations is wrong. After applying the combination of Reduce() and merge(), I would like to get the result of Final4. The same happens for the rest of the data frames in listag after applying Reduce() and merge() over global. I would like to get this result for each data frame in listag (in this case, for the 4th data frame, it would be):
ID Value Products z1 z2 z3 z4
1 001 1 3 1 2 3 4
2 002 2 2 1 2 3 4
3 003 3 0 1 2 3 4
4 004 4 1 1 2 3 4
5 005 5 1 1 2 3 4
6 006 6 6 1 2 3 4
7 007 7 7 1 2 3 4
8 009 8 1 1 2 3 4
9 010 9 1 1 2 3 4
10 011 10 5 NA 2 3 4
11 012 11 5 NA 2 3 4
12 013 11 1 NA NA 3 4
13 014 11 2 NA NA 3 4
14 015 12 3 NA NA NA 4
15 016 12 3 NA NA NA 4
I don't know what is wrong in my code when I combine Reduce() and merge(). I am using all.x=TRUE just as I do when I merge the data frames one by one. Could you help me with this? Maybe I have to add another argument to the combination of Reduce() and merge() to get my result, or there is another way, such as using lapply, or llply from the plyr package, over global. The dput() version of global is the following:
structure(list(a1 = structure(list(ID = c("001", "002", "003",
"004", "005", "006", "007", "009", "010"), Value = c(1, 2, 3,
4, 5, 6, 7, 8, 9), Products = c(3, 2, 0, 1, 1, 6, 7, 1, 1), z1 = c(1,
1, 1, 1, 1, 1, 1, 1, 1)), .Names = c("ID", "Value", "Products",
"z1"), row.names = c(NA, 9L), class = "data.frame"), a2 = structure(list(
ID = c("001", "002", "003", "004", "005", "006", "011", "012",
"007", "009", "010"), Value = c(1, 2, 3, 4, 5, 6, 10, 11,
7, 8, 9), Products = c(3, 2, 0, 1, 1, 6, 5, 5, 7, 1, 1),
z2 = c(2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2)), .Names = c("ID",
"Value", "Products", "z2"), row.names = c(NA, 11L), class = "data.frame"),
a3 = structure(list(ID = c("001", "002", "012", "013", "014",
"003", "004", "005", "006", "007", "009", "010", "011"),
Value = c(1, 2, 11, 11, 11, 3, 4, 5, 6, 7, 8, 9, 10),
Products = c(3, 2, 5, 1, 2, 0, 1, 1, 6, 7, 1, 1, 5),
z3 = c(3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3)), .Names = c("ID",
"Value", "Products", "z3"), row.names = c(NA, 13L), class = "data.frame"),
a4 = structure(list(ID = c("001", "002", "012", "013", "014",
"003", "004", "005", "006", "007", "009", "010", "011", "015",
"016"), Value = c(1, 2, 11, 11, 11, 3, 4, 5, 6, 7, 8, 9,
10, 12, 12), Products = c(3, 2, 5, 1, 2, 0, 1, 1, 6, 7, 1,
1, 5, 3, 3), z4 = c(4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4)), .Names = c("ID", "Value", "Products", "z4"), row.names = c(NA,
15L), class = "data.frame"), a5 = structure(list(ID = c("001",
"002", "003", "004", "016", "017", "005", "006", "007", "009",
"010", "011", "012", "013", "014", "015", "018"), Value = c(1,
2, 3, 4, 12, 14, 5, 6, 7, 8, 9, 10, 11, 11, 11, 12, 14),
Products = c(3, 2, 0, 1, 3, 2, 1, 6, 7, 1, 1, 5, 5, 1,
2, 3, 2), z5 = c(5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5)), .Names = c("ID", "Value", "Products",
"z5"), row.names = c(NA, 17L), class = "data.frame")), .Names = c("a1",
"a2", "a3", "a4", "a5"))
Many thanks for your help.
Several things:
First, it is normal that your Reduce merge and your manual merge give different results, since they are not performed in the same order. Reduce processes the data frames in the order 1 to 4, whereas your manual merges go 4, 1, 2, 3, 4 (for a reason I do not quite understand).
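To make the point about order concrete, here is a minimal sketch with two made-up data frames (x and y are hypothetical, not taken from the question). With all.x=TRUE, merge() keeps the rows of whichever frame is passed first, so swapping the arguments changes the number of rows.

```r
# Two small illustrative frames (hypothetical data, not from the question)
x <- data.frame(ID = c("001", "002"), zx = 1, stringsAsFactors = FALSE)
y <- data.frame(ID = c("001", "002", "003"), zy = 2, stringsAsFactors = FALSE)

# Left join: only the IDs present in the first argument survive
xy <- merge(x, y, by = "ID", all.x = TRUE)  # rows come from x
yx <- merge(y, x, by = "ID", all.x = TRUE)  # rows come from y

# Full join: IDs from both frames survive, whatever the order
full <- merge(x, y, by = "ID", all = TRUE)

print(nrow(xy))
print(nrow(yx))
print(nrow(full))
```

In the Reduce call the accumulated result is always the first argument, so with all.x=TRUE the set of IDs can never grow beyond those in the first list element.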
Second, the difference that you observe is that the a4 table has additional IDs, and they are lost in the Reduce merge because you use all.x=TRUE while the a4 table comes in as the "y" table. So you should use all=TRUE instead:
listag <- Reduce(function(x, y) merge(x, y[, c(1, 4)],
by = intersect(names(x)[1], names(y)[1]), all = TRUE), global)
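Two caveats may be worth checking. Without accumulate=TRUE, Reduce() returns a single merged data frame, not a list. And because only columns 1 and 4 of y are merged, IDs that first appear in a later frame get NA in Value and Products. The sketch below is one possible way to get the per-frame layout shown in the question, not necessarily the only one. It uses a shortened, made-up version of global, and the helper merge_with_previous is a hypothetical name. It starts from each frame's own first three columns and left-joins the z column of every frame up to and including that one.

```r
# Shortened, illustrative stand-in for `global` (same column layout, fewer rows)
global <- list(
  a1 = data.frame(ID = c("001", "002", "003"), Value = c(1, 2, 3),
                  Products = c(3, 2, 0), z1 = 1, stringsAsFactors = FALSE),
  a2 = data.frame(ID = c("001", "002", "003", "011"), Value = c(1, 2, 3, 10),
                  Products = c(3, 2, 0, 5), z2 = 2, stringsAsFactors = FALSE),
  a3 = data.frame(ID = c("001", "002", "012", "003", "011"),
                  Value = c(1, 2, 11, 3, 10), Products = c(3, 2, 5, 0, 5),
                  z3 = 3, stringsAsFactors = FALSE)
)

# all = TRUE with accumulate = TRUE keeps every ID, but Value/Products
# are NA for IDs that are missing from the first frame
acc <- Reduce(function(x, y) merge(x, y[, c(1, 4)], by = "ID", all = TRUE),
              global, accumulate = TRUE)

# Hypothetical helper: use frame i as the base (ID, Value, Products),
# then left-join the z column of frames 1..i onto it
merge_with_previous <- function(dfs, i) {
  base <- dfs[[i]][, 1:3]
  Reduce(function(x, y) merge(x, y[, c(1, 4)], by = "ID", all.x = TRUE),
         dfs[seq_len(i)], base)
}

listag <- lapply(seq_along(global), function(i) merge_with_previous(global, i))
names(listag) <- names(global)

print(acc[[3]])
print(listag[[3]])
```

Since each element starts from its own frame, it keeps that frame's row count and its own Value and Products, while the z columns of earlier frames are NA wherever an ID did not exist yet.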