简体   繁体   English

根据R中的条件合并2个数据帧

[英]Merge 2 dataframes based on condition in R

I have the following 2 data frames that I want to merge: 我有以下两个要合并的数据框:

x <- data.frame(a= 1:11, b =3:13, c=2:12, d=7:17, invoice = 1:11)
x =
   a  b  c  d invoice
   1  3  2  7       1
   2  4  3  8       2
   3  5  4  9       3
   4  6  5 10       4
   5  7  6 11       5
   6  8  7 12       6
   7  9  8 13       7
   8 10  9 14       8
   9 11 10 15       9
  10 12 11 16      10
  11 13 12 17      11


y <- data.frame(nr = 100:125, invoice = 1)
y$invoice[12:26] <- 2

> y
    nr invoice
   100       1
   101       1
   102       1
   103       1
   104       1
   105       1
   106       1
   107       1
   108       1
   109       1
   110       1
   111       2
   112       2
   113       2
   114       2
   115       2
   116       2
   117       2

I want to merge the letters from dataframe X with dataframe Y when the invoice number is the same. 我想在发票编号相同时将数据框X中的字母与数据框Y中的字母合并。 It should start with merging the value from letter A, then B ect. 它应该首先合并字母A中的值,然后再合并B ect中的值。 This should be happening until the invoice number is not the same anymore and then choose the numbers from invoice nr 2. 这应该一直发生,直到发票编号不再相同为止,然后从发票编号2中选择编号。

the output should be like this: 输出应该是这样的:

> output
    nr invoice  letter_count
   100       1    1
   101       1    3
   102       1    2
   103       1    7
   104       1    1
   105       1    3
   106       1    2
   107       1    7
   108       1    1
   109       1    2
   110       1    7
   111       2    2
   112       2    4
   113       2    3
   114       2    8
   115       2    2
   116       2    4

I tried to use the merge function with the by argument but this created an error that the number of rows is not the same. 我尝试将merge功能与by参数一起使用by但这会产生一个错误,即行数不相同。 Any help I will appreciate. 任何帮助,我将不胜感激。

Here is a solution using the purrr package. 这是使用purrr软件包的解决方案。

# Prepare the data frames
x <- data.frame(a = 1:11, b = 3:13, c = 2:12, d = 7:17, invoice = 1:11)
y <- data.frame(nr = 100:125, invoice = 1)
y$invoice[12:26] <- 2

# Load package
library(purrr)

# Split the data based on invoice
y_list <- split(y, f = y$invoice)

# Design a function to transfer data
trans_fun <- function(main_df, letter_df = x){

  # Get the invoice number
  temp_num<- unique(main_df$invoice)
  # Extract letter_count information from x
  add_vec <- unlist(letter_df[letter_df$invoice == temp_num, 1:4])
  # Get the remainder of nrow(main_df) and length(add_vec)
  reamin_num <- nrow(main_df) %% length(add_vec)
  # Get the multiple of nrow(main_df) and length(add_vec)
  multiple_num <- nrow(main_df) %/% length(add_vec)
  # Create the entire sequence to add
  add_seq <- rep(add_vec, multiple_num + 1)
  add_seq2 <- add_seq[1:(length(add_seq) - (length(add_vec) - reamin_num))]
  # Add new column, add_seq2, to main_df
  main_df$letter_count <- add_seq2

  return(main_df)
}

# Apply the trans_fun function using map_df 
output <- map_df(y_list, .f = trans_fun)

# See the result
output
    nr invoice letter_count
1  100       1            1
2  101       1            3
3  102       1            2
4  103       1            7
5  104       1            1
6  105       1            3
7  106       1            2
8  107       1            7
9  108       1            1
10 109       1            3
11 110       1            2
12 111       2            2
13 112       2            4
14 113       2            3
15 114       2            8
16 115       2            2
17 116       2            4
18 117       2            3
19 118       2            8
20 119       2            2
21 120       2            4
22 121       2            3
23 122       2            8
24 123       2            2
25 124       2            4
26 125       2            3

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM