How can I merge two data frames in R by the same values in different columns using base R or dplyr?
Let's say that I have two data frames in R.
The first one:
library(dplyr)  # provides tibble() and left_join()
cat = rep("xx", 5); cat
first = c("a", "a", "a", "h", "h")
second = c("b", "c", "d", "b", "c")
val1 = c(1, 2, 3, 10, 20)
A = tibble(cat, first, second, val1); A
which looks like this:
# A tibble: 5 x 4
  cat   first second  val1
  <chr> <chr> <chr>  <dbl>
1 xx    a     b          1
2 xx    a     c          2
3 xx    a     d          3
4 xx    h     b         10
5 xx    h     c         20
and a second one:
cat = rep("xx", 5); cat
first = c("a", "a", "a", "b", "c")
second = c("b", "c", "d", "h", "h")
val2 = c(100, 200, 300, 400, 500)
B = tibble(cat, first, second, val2); B
that is:
# A tibble: 5 x 4
  cat   first second  val2
  <chr> <chr> <chr>  <dbl>
1 xx    a     b        100
2 xx    a     c        200
3 xx    a     d        300
4 xx    b     h        400
5 xx    c     h        500
I want to merge them, but when I do I get:
left_join(A, B, by = c("cat", "first", "second"))
# A tibble: 5 x 5
  cat   first second  val1  val2
  <chr> <chr> <chr>  <dbl> <dbl>
1 xx    a     b          1   100
2 xx    a     c          2   200
3 xx    a     d          3   300
4 xx    h     b         10    NA
5 xx    h     c         20    NA
because h is in a different column in each data frame when I try to join them: the combinations are the same, but in a different order.
Ideally I want the result to look like this:
cat | first | second | val1 | val2
---|---|---|---|---
xx | a | b | 1 | 100
xx | a | c | 2 | 200
xx | a | d | 3 | 300
xx | h | b | 10 | 400
xx | h | c | 20 | 500
How can I do that? Any help?
You can do this simply by binding the val2 column to the first data frame (tibble) using base R's cbind function. cbind(A, B$val2) would do the job. Note that cbind matches rows purely by position, so this works only because the rows of A and B happen to be in corresponding order.
You are having problems because the pairs (h, b) and (h, c) are absent from tibble B, yet you expect R to place the values B$val2[c(4,5)] where the first and second keys of tibbles A and B conflict. You cannot expect R to magically bind data where there are conflicting keys. If you want to keep all keys you can use dplyr::full_join(), but it will introduce NAs where the keys conflict. (By keys I mean the columns you pass to the by argument!)
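To see what the full join does with the example data, here is a minimal sketch. It rebuilds A and B from the question (with the columns spelled first and second); the name both is only an illustrative choice.

```r
library(dplyr)

# Example data from the question
A <- tibble(
  cat    = rep("xx", 5),
  first  = c("a", "a", "a", "h", "h"),
  second = c("b", "c", "d", "b", "c"),
  val1   = c(1, 2, 3, 10, 20)
)
B <- tibble(
  cat    = rep("xx", 5),
  first  = c("a", "a", "a", "b", "c"),
  second = c("b", "c", "d", "h", "h"),
  val2   = c(100, 200, 300, 400, 500)
)

# Keep every key combination that appears in either table
both <- full_join(A, B, by = c("cat", "first", "second"))
both
# 7 rows: 3 matched, 2 found only in A (val2 is NA), 2 found only in B (val1 is NA)
```

The (h, b) and (b, h) rows stay separate here because the join compares first with first and second with second.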
It is also frowned upon to use the equals sign = for assignment in R. Try using <- instead.
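If the intent is to treat a pair such as (h, b) as the same key as (b, h), one possible approach is to put each pair into a canonical order before joining. This is only a sketch and not part of the answer above: it assumes the order within a pair carries no meaning and that each unordered pair occurs at most once in B. The helper columns k1 and k2 and the names A_keyed, B_keyed and result are illustrative.

```r
library(dplyr)

# Example data from the question
A <- tibble(
  cat    = rep("xx", 5),
  first  = c("a", "a", "a", "h", "h"),
  second = c("b", "c", "d", "b", "c"),
  val1   = c(1, 2, 3, 10, 20)
)
B <- tibble(
  cat    = rep("xx", 5),
  first  = c("a", "a", "a", "b", "c"),
  second = c("b", "c", "d", "h", "h"),
  val2   = c(100, 200, 300, 400, 500)
)

# Order-independent keys: k1 is the smaller of the two values, k2 the larger
A_keyed <- A %>%
  mutate(k1 = pmin(first, second), k2 = pmax(first, second))

B_keyed <- B %>%
  mutate(k1 = pmin(first, second), k2 = pmax(first, second)) %>%
  select(cat, k1, k2, val2)  # drop B's own first/second so A's are kept

# Join on the canonical keys, then remove the helper columns
result <- A_keyed %>%
  left_join(B_keyed, by = c("cat", "k1", "k2")) %>%
  select(-k1, -k2)
result
```

Unlike cbind, this does not depend on the row order of either table. If B contained both (b, h) and (h, b), the join would return more than one row for the matching row of A, so that case would need to be resolved first.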