Differences between merge and match functions in R
I deleted my previous post so I could reproduce my problem properly. I am working with the data frame a1, whose content (dput structure) is:
structure(list(r04_numero_operacion = c("0050475725", "0050490602",
"0050491033", "0050496386", "0050518985", "0050630090", "0050631615",
"0060235906", "0060238732", "0060241333", "0060244391", "0060245813",
"0060260056", "0060266356", "0800041441", "0800054041", "0800055382",
"0800058554", "2020200062", "2020200073", "CAR1010001706000",
"CAR1010001795000", "CAR1010001803000", "CAR1010001871000", "CAR1010001962000",
"CAR1010002002000", "CAR1010002120000", "CAR1010002189000", "CAR1010002215000",
"CAR1010002250000"), perdida3 = c(523.12, 265.43, 8371.66, 5242.13,
4960.51, 8473.27, 3743.45, 1283.32, 2229.25, 8001.27, 8653.94,
3670.13, 4536.02, 8216.55, 2481.36, 288.94, 1637.28, 4566.89,
1573.63, 11217.92, 0, 0, 0, 0, 0, 0, 0, 0, 9633.9, 0), Saldo = c(1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 288.94, 1637.28, 4566.89,
1, 1, 481.59, 299.52, 258.13, 603.84, 231.61, 631.68, 220.6,
210.54, 1, 1224.44), Bvencida = c(0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 603.84, 0, 631.68,
0, 0, 0, 0), Cvencida = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1224.44),
Dvencida = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), vencida = c(1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 288.94, 1637.28,
4566.89, 1, 1, 0, 0, 0, 603.84, 0, 631.68, 0, 0, 1, 1224.44
), V1 = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)), .Names = c("r04_numero_operacion",
"perdida3", "Saldo", "Bvencida", "Cvencida", "Dvencida", "vencida",
"V1"), codepage = 1252L, row.names = c(NA, 30L), class = "data.frame")
and with the data frame a2 (dput structure):
structure(list(r04_numero_operacion = c("0050475725", "0050490602",
"0050491033", "0050496386", "0050518985", "0050630090", "0050631615",
"0060235906", "0060238732", "0060241333", "0060244391", "0060245813",
"0060260056", "0060266356", "0800041441", "0800054041", "0800055382",
"0800058554", "2020200073", "CAR1010002002000", "CAR1010002189000",
"CAR1010002215000", "CAR1010002250000", "CAR1010002264000", "CAR1010002297000",
"CAR1010002401000", "CAR1010002412000", "CAR1010002436000", "CAR1010002529000",
"CAR1010002709000"), perdida3 = c(523.12, 265.43, 8371.66, 5242.13,
4960.51, 8473.27, 3743.45, 1283.32, 2229.25, 8001.27, 8653.94,
3670.13, 4536.02, 8216.55, 2481.36, 288.94, 1637.28, 4566.89,
11217.92, 0, 0, 9633.9, 0, 0, 0, 0, 0, 0, 0, 0), Saldo = c(1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 288.94, 1637.28, 4566.89,
1, 317.72, 210.54, 1, 868.93, 242.91, 298.78, 120.63, 255.01,
357.68, 284.08, 308.83), Bvencida = c(0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 317.72, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0), Cvencida = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 868.93, 0, 0, 0, 0, 0, 0, 0), Dvencida = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0), vencida = c(1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 288.94, 1637.28, 4566.89, 1, 317.72, 0,
1, 868.93, 0, 0, 0, 0, 0, 0, 0), V2 = c(2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2)), .Names = c("r04_numero_operacion", "perdida3", "Saldo",
"Bvencida", "Cvencida", "Dvencida", "vencida", "V2"), class = "data.frame", row.names = c(NA,
30L))
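My problem arises when I use the merge() and match() functions. merge() is more powerful than match(), which only adds a new variable by matching on a common variable, yet merge() does not give me the same result as match(). First, I used merge() on a2 and a1 to create DF with the following code: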
DF=merge(a2,a1,all.x=TRUE)
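For reference, when by is not supplied, merge() joins on every column name the two data frames share (by = intersect(names(x), names(y))). The sketch below uses hypothetical toy data frames x and y, not the poster's data, to illustrate that a row whose key matches but whose other shared column differs is treated as unmatched.

```r
# Hypothetical toy data: key "id" is shared, and "val" is also shared
# but differs for id "b"
x <- data.frame(id = c("a", "b", "c"), val = c(1, 2, 3))
y <- data.frame(id = c("a", "b"), val = c(1, 99), V1 = c(1, 1))

# Columns merge() joins on by default: c("id", "val")
common <- intersect(names(x), names(y))
print(common)

# Default merge joins on both "id" and "val", so "b" finds no partner
m_default <- merge(x, y, all.x = TRUE)
print(sum(is.na(m_default$V1)))  # 2 ("b" and "c")
```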
This adds the V1 variable of a1 to DF, and I get the following summary of DF$V1:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
      1       1       1       1       1       1       9
Next, I created a copy of a2 named DF and added the V1 variable from a1 to a2, matching on r04_numero_operacion, with the following code:
a2$V1<-a1[match(a2$r04_numero_operacion,a1$r04_numero_operacion),"V1"]
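By contrast, match() looks up only the key column, so it should behave like merge() restricted to the key. The sketch below checks that on hypothetical toy data (x, y, id, val are illustrative names only). Because merge() sorts its result by the key by default, the match() result is reordered by id before comparing.

```r
# Hypothetical toy data; "val" differs for id "b", which match() ignores
x <- data.frame(id = c("c", "a", "b"), val = c(3, 1, 2))
y <- data.frame(id = c("a", "b"), val = c(1, 99), V1 = c(10, 20))

# match()-based lookup: position of each x$id in y$id (NA when absent)
x$V1 <- y[match(x$id, y$id), "V1"]

# merge() restricted to the key column only
m_key <- merge(x[, c("id", "val")], y, by = "id", all.x = TRUE)

# Sort the match() result by key so both are in the same row order
x_sorted <- x[order(x$id), ]
print(identical(x_sorted$V1, m_key$V1))  # TRUE
```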
It adds V1 to DF, but the result differs from the merge() approach. With the match() solution I get this summary of DF$V1:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
      1       1       1       1       1       1       7
What I want is to get the same result as with match(), but using the merge() function, because it is more powerful than match(). Thanks for your help.
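With match(a2$r04_numero_operacion, a1$r04_numero_operacion), the values of a2$r04_numero_operacion are matched only against the corresponding column of a1, whereas merge(a2, a1, all.x = TRUE) matches on every column whose name appears in both a1 and a2 (the default is by = intersect(names(x), names(y))). Here that means r04_numero_operacion plus perdida3, Saldo, Bvencida, Cvencida, Dvencida and vencida, so an ID present in both data frames still gets NA when any of those values differ (CAR1010002002000 and CAR1010002250000, whose Saldo and vencida values differ, account for the two extra NAs). If you match on the first column only, the NA counts agree: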
summary( merge(a2,a1,by=1,all.x=TRUE)$V1 )
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
      1       1       1       1       1       1       7
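Another option is to pass merge() only the key and the columns you want to bring over from the second data frame, so there are no other shared names to join on and no .x/.y suffixes in the result. Applied to the question, that would be merge(a2, a1[, c("r04_numero_operacion", "V1")], by = "r04_numero_operacion", all.x = TRUE). A minimal sketch on hypothetical toy data (x, y, id, val are illustrative names only):

```r
# Hypothetical toy data: "val" is shared but differs for id "b"
x <- data.frame(id = c("a", "b", "c"), val = c(1, 2, 3))
y <- data.frame(id = c("a", "b"), val = c(1, 99), V1 = c(1, 1))

# Keep only the key and the column to add, so merge() joins on "id" alone
m <- merge(x, y[, c("id", "V1")], by = "id", all.x = TRUE)

print(names(m))          # "id" "val" "V1": no val.x / val.y suffixes
print(sum(is.na(m$V1)))  # 1: only "c" is unmatched
```

Naming the key explicitly (by = "r04_numero_operacion") is equivalent to by = 1 here but does not depend on column order.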