[英]Merge 2 dataframes based on condition in R
I have the following 2 data frames that I want to merge: 我有以下两个要合并的数据框:
x <- data.frame(a= 1:11, b =3:13, c=2:12, d=7:17, invoice = 1:11)
x =
a b c d invoice
1 3 2 7 1
2 4 3 8 2
3 5 4 9 3
4 6 5 10 4
5 7 6 11 5
6 8 7 12 6
7 9 8 13 7
8 10 9 14 8
9 11 10 15 9
10 12 11 16 10
11 13 12 17 11
y <- data.frame(nr = 100:125, invoice = 1)
y$invoice[12:26] <- 2
> y
nr invoice
100 1
101 1
102 1
103 1
104 1
105 1
106 1
107 1
108 1
109 1
110 1
111 2
112 2
113 2
114 2
115 2
116 2
117 2
I want to merge the letters from dataframe X with dataframe Y when the invoice number is the same. 我想在发票编号相同时将数据框X中的字母与数据框Y中的字母合并。 It should start with merging the value from letter A, then B ect.
它应该首先合并字母A中的值,然后再合并B ect中的值。 This should be happening until the invoice number is not the same anymore and then choose the numbers from invoice nr 2.
这应该一直发生,直到发票编号不再相同为止,然后从发票编号2中选择编号。
the output should be like this: 输出应该是这样的:
> output
nr invoice letter_count
100 1 1
101 1 3
102 1 2
103 1 7
104 1 1
105 1 3
106 1 2
107 1 7
108 1 1
109 1 2
110 1 7
111 2 2
112 2 4
113 2 3
114 2 8
115 2 2
116 2 4
I tried to use the merge
function with the by
argument but this created an error that the number of rows is not the same. 我尝试将
merge
功能与by
参数一起使用by
但这会产生一个错误,即行数不相同。 Any help I will appreciate. 任何帮助,我将不胜感激。
Here is a solution using the purrr
package. 这是使用
purrr
软件包的解决方案。
# Prepare the data frames
x <- data.frame(a = 1:11, b = 3:13, c = 2:12, d = 7:17, invoice = 1:11)
y <- data.frame(nr = 100:125, invoice = 1)
y$invoice[12:26] <- 2
# Load package
library(purrr)
# Split the data based on invoice
y_list <- split(y, f = y$invoice)
# Design a function to transfer data
trans_fun <- function(main_df, letter_df = x){
# Get the invoice number
temp_num<- unique(main_df$invoice)
# Extract letter_count information from x
add_vec <- unlist(letter_df[letter_df$invoice == temp_num, 1:4])
# Get the remainder of nrow(main_df) and length(add_vec)
reamin_num <- nrow(main_df) %% length(add_vec)
# Get the multiple of nrow(main_df) and length(add_vec)
multiple_num <- nrow(main_df) %/% length(add_vec)
# Create the entire sequence to add
add_seq <- rep(add_vec, multiple_num + 1)
add_seq2 <- add_seq[1:(length(add_seq) - (length(add_vec) - reamin_num))]
# Add new column, add_seq2, to main_df
main_df$letter_count <- add_seq2
return(main_df)
}
# Apply the trans_fun function using map_df
output <- map_df(y_list, .f = trans_fun)
# See the result
output
nr invoice letter_count
1 100 1 1
2 101 1 3
3 102 1 2
4 103 1 7
5 104 1 1
6 105 1 3
7 106 1 2
8 107 1 7
9 108 1 1
10 109 1 3
11 110 1 2
12 111 2 2
13 112 2 4
14 113 2 3
15 114 2 8
16 115 2 2
17 116 2 4
18 117 2 3
19 118 2 8
20 119 2 2
21 120 2 4
22 121 2 3
23 122 2 8
24 123 2 2
25 124 2 4
26 125 2 3
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.