Add rows to a df in R that are missing and fill with NA using dplyr

I have a df like this:

ID  store        price
1   Walmart      1.00
2   WholeFoods   2.33
3   Footlocker   2.55
4   Denny's      1.09
5   Walgreens    .99
6   CVS          7.00

After some manipulation it becomes:

ID  store        price  varA   varB  varC
2   WholeFoods   2.33   D      56    A
3   Footlocker   2.55   TT     302   B
6   CVS          7.00   A      122   C

My goal is a df that contains all the IDs from the original df, with NA in the new columns for the rows that were dropped. So basically:

ID  store        price  varA   varB  varC
1   Walmart      1.00   NA     NA    NA
2   WholeFoods   2.33   D      56    A
3   Footlocker   2.55   TT     302   B
4   Denny's      1.09   NA     NA    NA
5   Walgreens    .99    NA     NA    NA
6   CVS          7.00   A      122   C

You can use dplyr or, maybe better, a base R solution.

dplyr

In your particular case it can be done using full_join() from the dplyr package:

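library(dplyr)

# quote = "" stops read.table() from treating the apostrophe in "Denny's"
# as the start of a quoted string
a <- read.table(header = TRUE, quote = "", text = "
ID  store        price
1   Walmart      1.00
2   WholeFoods   2.33
3   Footlocker   2.55
4   Denny's      1.09
5   Walgreens    .99
6   CVS          7.00
")

b <- read.table(header = TRUE, quote = "", text = "
ID  store        price  varA   varB  varC
2   WholeFoods   2.33   D      56    A
3   Footlocker   2.55   TT     302   B
6   CVS          7.00   A      122   C
")

# With no `by` argument, full_join() joins on all shared columns
# (ID, store, price)
full_join(a, b)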

Result:

  ID      store price varA varB varC
1  1    Walmart  1.00 <NA>   NA <NA>
2  2 WholeFoods  2.33    D   56    A
3  3 Footlocker  2.55   TT  302    B
4  4    Denny's  1.09 <NA>   NA <NA>
5  5  Walgreens  0.99 <NA>   NA <NA>
6  6        CVS  7.00    A  122    C
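
Joining on every shared column, as full_join(a, b) does, also matches on price, which is a floating-point column. A possibly more robust variant is sketched below, assuming ID alone identifies a row (the question does not state this): keep only ID and the new columns from b, then left-join on ID. The name new_cols is introduced here for illustration only. left_join() keeps exactly the rows of a, whereas full_join() would also keep any row of b that has no match in a.

```r
library(dplyr)

# Original data frame; quote = "" keeps the apostrophe in "Denny's" literal.
a <- read.table(header = TRUE, quote = "", text = "
ID  store        price
1   Walmart      1.00
2   WholeFoods   2.33
3   Footlocker   2.55
4   Denny's      1.09
5   Walgreens    .99
6   CVS          7.00
")

# Manipulated data frame: fewer rows, three extra columns.
b <- read.table(header = TRUE, quote = "", text = "
ID  store        price  varA   varB  varC
2   WholeFoods   2.33   D      56    A
3   Footlocker   2.55   TT     302   B
6   CVS          7.00   A      122   C
")

# Assumption: ID is a unique key. Keep only the key and the new columns
# from b so that store and price are taken from a alone.
new_cols <- select(b, ID, varA, varB, varC)

# Every row of a is kept; IDs missing from b get NA in varA, varB, varC.
result <- left_join(a, new_cols, by = "ID")
result
```

Because by is given explicitly, dplyr does not print the "Joining" message.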

Base R solution

It can easily be done with base R's merge() function:

merge(a, b, all = TRUE)
#  ID      store price varA varB varC
#1  1    Walmart  1.00 <NA>   NA <NA>
#2  2 WholeFoods  2.33    D   56    A
#3  3 Footlocker  2.55   TT  302    B
#4  4    Denny's  1.09 <NA>   NA <NA>
#5  5  Walgreens  0.99 <NA>   NA <NA>
#6  6        CVS  7.00    A  122    C

which was also faster than dplyr's full_join() in a benchmark on these small data frames (100 evaluations each):

Unit: milliseconds
                 expr    min       lq      mean   median       uq     max neval
 merge(a, b, all = T) 1.3881  2.42335  3.259999  2.96615  4.01390  8.9954   100
      full_join(a, b) 6.2017 10.17300 12.653397 12.36170 14.46095 34.0763   100
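
The answer does not show how the timings were obtained. The output format resembles that of the microbenchmark package, so the sketch below assumes that package was used; this is a guess, not something the answer confirms. The data frames are rebuilt with data.frame() to keep the example self-contained, and absolute timings will differ by machine and package version.

```r
library(dplyr)
library(microbenchmark)  # assumed tool; not named in the original answer

a <- data.frame(
  ID    = 1:6,
  store = c("Walmart", "WholeFoods", "Footlocker", "Denny's", "Walgreens", "CVS"),
  price = c(1.00, 2.33, 2.55, 1.09, 0.99, 7.00)
)

b <- data.frame(
  ID    = c(2L, 3L, 6L),
  store = c("WholeFoods", "Footlocker", "CVS"),
  price = c(2.33, 2.55, 7.00),
  varA  = c("D", "TT", "A"),
  varB  = c(56L, 302L, 122L),
  varC  = c("A", "B", "C")
)

# Time both approaches 100 times each; `by` is spelled out so that
# full_join() does not print a message on every evaluation.
timings <- microbenchmark(
  merge(a, b, all = TRUE),
  full_join(a, b, by = c("ID", "store", "price")),
  times = 100
)
print(timings)
```

On data this small the measurement is dominated by fixed per-call overhead, so it says little about how the two functions compare on large tables.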

You can left-join the second data frame onto the first, where df1 is the original data frame and df2 is the one produced by the manipulation:

dplyr::left_join(df1, df2)

This will produce the expected output:

Joining, by = c("ID", "store", "price")
  ID      store price varA varB varC
1  1    Walmart  1.00 <NA>   NA <NA>
2  2 WholeFoods  2.33    D   56    A
3  3 Footlocker  2.55   TT  302    B
4  4    Denny's  1.09 <NA>   NA <NA>
5  5  Walgreens  0.99 <NA>   NA <NA>
6  6        CVS  7.00    A  122    C
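
The call above relies on df1 and df2 already existing. A minimal self-contained sketch follows, assuming df1 and df2 hold the data from the question. The by argument is spelled out here, which is an addition to the original call, so that the join keys are explicit and no "Joining" message is printed.

```r
library(dplyr)

# Assumed contents of df1 (original data) and df2 (after manipulation).
df1 <- data.frame(
  ID    = 1:6,
  store = c("Walmart", "WholeFoods", "Footlocker", "Denny's", "Walgreens", "CVS"),
  price = c(1.00, 2.33, 2.55, 1.09, 0.99, 7.00)
)

df2 <- data.frame(
  ID    = c(2L, 3L, 6L),
  store = c("WholeFoods", "Footlocker", "CVS"),
  price = c(2.33, 2.55, 7.00),
  varA  = c("D", "TT", "A"),
  varB  = c(56L, 302L, 122L),
  varC  = c("A", "B", "C")
)

# Keep every row of df1; rows without a match in df2 get NA in varA..varC.
result <- left_join(df1, df2, by = c("ID", "store", "price"))

# IDs whose new columns were filled with NA: 1, 4, 5
result$ID[is.na(result$varA)]
```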
