R: Overwrite column values with non-NA values from a column in a separate dataframe

I have a dataframe 'df1' with a lot of columns, but the ones of interest are:

Number  Code
1
2
3
10
11      AMRO
4
277
2100    BLPH

And I have another dataframe 'df2' with a lot of columns, but the ones of interest are:

Number  Code
1       AMCR
2       AMCR
3       BANO
10      BAEA
12      AMRO
4       NA
277     NA
2100    NA

Where the 'Number' values in 'df1' and 'df2' match, I want the 'Code' value from 'df2' to overwrite the 'Code' value in 'df1', as long as the 'df2' value is not NA. The final 'df1' should look like:

Number  Code
1       AMCR
2       AMCR
3       BANO
10      BAEA
11      AMRO
4
277
2100    BLPH

Thank you for your help!

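We can use power_left_join from the powerjoin package and resolve the conflicting 'Code' columns by coalescing them:

library(powerjoin)
# coalesce_yx takes the df2 value first and falls back to df1 when df2 is NA
power_left_join(df1, df2, by = "Number", conflict = coalesce_yx)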

-output

Number Code
1      1 AMCR
2      2 AMCR
3      3 BANO
4     10 BAEA
5     11 AMRO
6      4 <NA>
7    277 <NA>
8   2100 BLPH

Or, to overwrite df1 in place, use an update join in data.table:

library(data.table)
# i.Code is the value from df2; it wins unless it is NA, in which case df1's Code is kept
setDT(df1)[df2, Code := fcoalesce(i.Code, Code), on = .(Number)]

-output

> df1
   Number   Code
    <int> <char>
1:      1   AMCR
2:      2   AMCR
3:      3   BANO
4:     10   BAEA
5:     11   AMRO
6:      4   <NA>
7:    277   <NA>
8:   2100   BLPH

data

df1 <- structure(list(Number = c(1L, 2L, 3L, 10L, 11L, 4L, 277L, 2100L
), Code = c(NA, NA, NA, NA, "AMRO", NA, NA, "BLPH")), 
class = "data.frame", row.names = c(NA, 
-8L))

df2 <- structure(list(Number = c(1L, 2L, 3L, 10L, 12L, 4L, 277L, 2100L
), Code = c("AMCR", "AMCR", "BANO", "BAEA", "AMRO", NA, NA, NA
)), class = "data.frame", row.names = c(NA, -8L))

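Here is an alternative approach using bind_cols. Note that it pairs rows by position rather than by matching 'Number', so it only gives the right answer when the two data frames line up row for row: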

library(dplyr)

bind_cols(df1, df2) %>% 
  mutate(Code = coalesce(Code...2, Code...4)) %>% 
  select(Number = Number...1, Code)

  Number Code
1      1 AMCR
2      2 AMCR
3      3 BANO
4     10 BAEA
5     11 AMRO
6      4 <NA>
7    277 <NA>
8   2100 BLPH
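
Because bind_cols pairs rows by position, a key-based lookup may be safer when the row order of the two data frames differs. A minimal base R sketch, assuming the df1 and df2 from the data section (recreated here so the block runs on its own) and that 'Number' is unique in df2; the helper names idx, new_code and use_new are only illustrative:

```r
df1 <- data.frame(
  Number = c(1L, 2L, 3L, 10L, 11L, 4L, 277L, 2100L),
  Code = c(NA, NA, NA, NA, "AMRO", NA, NA, "BLPH"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Number = c(1L, 2L, 3L, 10L, 12L, 4L, 277L, 2100L),
  Code = c("AMCR", "AMCR", "BANO", "BAEA", "AMRO", NA, NA, NA),
  stringsAsFactors = FALSE
)

# Row position in df2 of each df1$Number (NA when df2 has no such Number)
idx <- match(df1$Number, df2$Number)
# Candidate replacement taken from df2, aligned to the rows of df1
new_code <- df2$Code[idx]
# Overwrite only where df2 actually supplies a non-NA Code
use_new <- !is.na(new_code)
df1$Code[use_new] <- new_code[use_new]
df1
```

This keeps the row order of df1 and leaves rows without a match in df2 (here Number 11) untouched.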

Here is a solution that combines dplyr's full_join and inner_join. Note that the result is sorted by 'Number' rather than kept in the original row order of df1:

library(dplyr)
df1 %>% 
  full_join(df2) %>% na.omit() %>% 
  full_join(df1 %>% inner_join(df2)) %>% 
  filter(Number %in% df1$Number) %>%
  arrange(Number)

Output

#>   Number Code
#> 1      1 AMCR
#> 2      2 AMCR
#> 3      3 BANO
#> 4      4 <NA>
#> 5     10 BAEA
#> 6     11 AMRO
#> 7    277 <NA>
#> 8   2100 BLPH
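
For comparison, dplyr also offers rows_update(), which modifies matching rows by key and keeps the row order of df1. This is only a sketch, assuming dplyr 1.1.0 or later (needed for the unmatched argument), unique 'Number' values in df2, and the df1 and df2 from the data section (recreated here so the block runs on its own):

```r
library(dplyr)

df1 <- data.frame(
  Number = c(1L, 2L, 3L, 10L, 11L, 4L, 277L, 2100L),
  Code = c(NA, NA, NA, NA, "AMRO", NA, NA, "BLPH"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Number = c(1L, 2L, 3L, 10L, 12L, 4L, 277L, 2100L),
  Code = c("AMCR", "AMCR", "BANO", "BAEA", "AMRO", NA, NA, NA),
  stringsAsFactors = FALSE
)

# Keep only the df2 rows that carry a real Code, so NA never overwrites df1
updates <- df2 %>% filter(!is.na(Code))

# Update df1 by key; df2 rows whose Number is absent from df1 (here 12) are skipped
result <- rows_update(df1, updates, by = "Number", unmatched = "ignore")
result
```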
