
Differences between merge and match functions in R

Hi everybody, I removed my last post so I could make a reproducible example of my problem. I am working with the following two data frames. First, a1 (dput structure):

structure(list(r04_numero_operacion = c("0050475725", "0050490602", 
"0050491033", "0050496386", "0050518985", "0050630090", "0050631615", 
"0060235906", "0060238732", "0060241333", "0060244391", "0060245813", 
"0060260056", "0060266356", "0800041441", "0800054041", "0800055382", 
"0800058554", "2020200062", "2020200073", "CAR1010001706000", 
"CAR1010001795000", "CAR1010001803000", "CAR1010001871000", "CAR1010001962000", 
"CAR1010002002000", "CAR1010002120000", "CAR1010002189000", "CAR1010002215000", 
"CAR1010002250000"), perdida3 = c(523.12, 265.43, 8371.66, 5242.13, 
4960.51, 8473.27, 3743.45, 1283.32, 2229.25, 8001.27, 8653.94, 
3670.13, 4536.02, 8216.55, 2481.36, 288.94, 1637.28, 4566.89, 
1573.63, 11217.92, 0, 0, 0, 0, 0, 0, 0, 0, 9633.9, 0), Saldo = c(1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 288.94, 1637.28, 4566.89, 
1, 1, 481.59, 299.52, 258.13, 603.84, 231.61, 631.68, 220.6, 
210.54, 1, 1224.44), Bvencida = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 603.84, 0, 631.68, 
0, 0, 0, 0), Cvencida = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1224.44), 
    Dvencida = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), vencida = c(1, 
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 288.94, 1637.28, 
    4566.89, 1, 1, 0, 0, 0, 603.84, 0, 631.68, 0, 0, 1, 1224.44
    ), V1 = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)), .Names = c("r04_numero_operacion", 
"perdida3", "Saldo", "Bvencida", "Cvencida", "Dvencida", "vencida", 
"V1"), codepage = 1252L, row.names = c(NA, 30L), class = "data.frame")

And the a2 data frame (dput structure):

structure(list(r04_numero_operacion = c("0050475725", "0050490602", 
"0050491033", "0050496386", "0050518985", "0050630090", "0050631615", 
"0060235906", "0060238732", "0060241333", "0060244391", "0060245813", 
"0060260056", "0060266356", "0800041441", "0800054041", "0800055382", 
"0800058554", "2020200073", "CAR1010002002000", "CAR1010002189000", 
"CAR1010002215000", "CAR1010002250000", "CAR1010002264000", "CAR1010002297000", 
"CAR1010002401000", "CAR1010002412000", "CAR1010002436000", "CAR1010002529000", 
"CAR1010002709000"), perdida3 = c(523.12, 265.43, 8371.66, 5242.13, 
4960.51, 8473.27, 3743.45, 1283.32, 2229.25, 8001.27, 8653.94, 
3670.13, 4536.02, 8216.55, 2481.36, 288.94, 1637.28, 4566.89, 
11217.92, 0, 0, 9633.9, 0, 0, 0, 0, 0, 0, 0, 0), Saldo = c(1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 288.94, 1637.28, 4566.89, 
1, 317.72, 210.54, 1, 868.93, 242.91, 298.78, 120.63, 255.01, 
357.68, 284.08, 308.83), Bvencida = c(0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 317.72, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0), Cvencida = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 868.93, 0, 0, 0, 0, 0, 0, 0), Dvencida = c(0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0), vencida = c(1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 288.94, 1637.28, 4566.89, 1, 317.72, 0, 
1, 868.93, 0, 0, 0, 0, 0, 0, 0), V2 = c(2, 2, 2, 2, 2, 2, 2, 
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
2, 2)), .Names = c("r04_numero_operacion", "perdida3", "Saldo", 
"Bvencida", "Cvencida", "Dvencida", "vencida", "V2"), class = "data.frame", row.names = c(NA, 
30L))

My problem arises when I use the merge() and match() functions. merge() is more flexible than match() for adding new variables by a common key, but when I use merge() I don't get the same result as with match(). First I used merge() with a2 and a1 to create DF with the following code:

DF=merge(a2,a1,all.x=TRUE)

It added the V1 variable from a1 to DF, and I got this summary for DF$V1:

Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
  1       1       1       1       1       1       9 

Then I created a copy of a2 named DF and matched on r04_numero_operacion, using this code to add the V1 variable from a1 to a2:

a2$V1<-a1[match(a2$r04_numero_operacion,a1$r04_numero_operacion),"V1"]

It added V1 to DF, but the result differs from the merge() approach. I got this summary for DF$V1 with the match() solution:

Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
  1       1       1       1       1       1       7 

I want to get the same result I got with match(), but using merge(), because that function is more powerful than match(). Thanks for your help.

With match(a2$r04_numero_operacion, a1$r04_numero_operacion), the a2$r04_numero_operacion values are matched only against the corresponding column in a1. With merge(a2, a1, all.x=TRUE), every column of a1 whose name also appears in a2 is used for matching, so a row pairs up only when all of those shared columns agree. If you merge on the first column only, the NA counts match up:

summary( merge(a2,a1,by=1,all.x=TRUE)$V1 )
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
      1       1       1       1       1       1       7 
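
The difference can be reproduced on a much smaller case. The sketch below uses two toy data frames, x and y, as hypothetical stand-ins for a2 and a1 (the column id plays the role of r04_numero_operacion; none of these values come from the question). It is meant only to illustrate that merge() without by joins on all shared column names, while match() looks at the key alone:

```r
# Toy stand-ins for a2 and a1 (hypothetical data, not the frames from the question)
x <- data.frame(id = c("A", "B", "C"), Saldo = c(1, 2, 3), V2 = 2,
                stringsAsFactors = FALSE)
y <- data.frame(id = c("A", "B", "D"), Saldo = c(1, 99, 4), V1 = 1,
                stringsAsFactors = FALSE)

# With no `by`, merge() joins on every column name the two frames share
shared <- intersect(names(x), names(y))

# "B" exists in both frames but its Saldo differs, so it finds no partner here
m_all <- merge(x, y, all.x = TRUE)
na_all <- sum(is.na(m_all$V1))

# match() compares the key only, so "B" does find a partner
v1_match <- y[match(x$id, y$id), "V1"]
na_match <- sum(is.na(v1_match))

# Joining on the key alone, and taking only the key and V1 from y,
# gives the same NA count as match() and avoids duplicated .x/.y columns
m_key <- merge(x, y[, c("id", "V1")], by = "id", all.x = TRUE)
na_key <- sum(is.na(m_key$V1))
```

Applied to the data in the question, the same idea would presumably read merge(a2, a1[, c("r04_numero_operacion", "V1")], by="r04_numero_operacion", all.x=TRUE). Note that merge() sorts the result by the key columns by default, so the row order can differ from the match() version even when the values agree.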
