简体   繁体   中英

Add row to dataframe, based on condition (matching ID)

I have two dataframes. I want to add to df1 a row out of df2, based on conditions. The conditions are basically the ID. Imagine you have df1 as so..

    ID   Val1     Val2       
1   1       1        3          
2   2       3        2             
3   3       1        2             
4   4       3        4               
5   5       2        3            

and to that df1 you want to add a specific row out of df2, which looks like this..

    ID      A        B      
1   2       4        9          
2   1       5        7             
3   3       2        6             
4   5       4        9               
5   4       2        8           

So the new dataframe should be automatically matching/merging the df1 and df2, based on ID, and keep in mind, I can't simply use cbind, because the order is different.

    ID   Val1     Val2      A      B     
1   1       1        3      5      7
2   2       3        2      4      9       
3   3       1        2      2      6       
4   4       3        4      4      9          
5   5       2        3      2      8

I have used semi_join, as such

df1 %>% 
  bind_rows(semi_join(df2, df1, by = "ID"))

but I get NA's for df2. The dataframes are connteced by ID, but where the values of A and B should be, there is only NA, without any exception. What could I be doing wrong, is there maybe another better alternative solution to this?

We can use merge which will take care of rows order

merge(df1 , df2 , by = "ID")

  • output
  ID Val1 Val2 A B
1  1    1    3 5 7
2  2    3    2 4 9
3  3    1    2 2 6
4  4    3    4 2 8
5  5    2    3 4 9

Since you're using dplyr for your example, here is a solution using left_join . That is the kind of join in the situation like yours, where you want to add information to df1 from df2 while preserving the format of df1 .

When you use joins, there is no need to use bind_rows , and a semi_join is a so-called filtering join that won't do what you're after here.

library(dplyr)
df1 |> 
  left_join(df2)

Output:

Joining, by = "ID"
# A tibble: 5 × 5
     ID  Val1  Val2     A     B
  <dbl> <dbl> <dbl> <dbl> <dbl>
1     1     1     3     5     7
2     2     3     2     4     9
3     3     1     2     2     6
4     4     3     4     2     8
5     5     2     3     4     9

Data:

library(readr)

df1 <- read_table("ID   Val1     Val2
1       1        3          
2       3        2             
3       1        2             
4       3        4               
5       2        3")

df2 <- read_table("ID      A        B
2       4        9          
1       5        7             
3       2        6             
5       4        9               
4       2        8")

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM