How to fill NA rows by conditions from columns in R

Here is an example:

df<-data.frame(v1=rep(1:2, 4), 
               v2=rep(c("a", "b"), each=4), 
               v3=paste0(rep(1:2, each=4), rep(c("m", "n", "o", "p"), each=2)), 
               v4=c(1,2, NA, NA, 3,4, NA,NA),
               v5=c(5,6, NA, NA, 7,8, NA,NA),
               v6=c(9,10, NA, NA, 11,12, NA,NA))

df
  v1 v2 v3 v4 v5 v6
1  1  a 1m  1  5  9
2  2  a 1m  2  6 10
3  1  a 1n NA NA NA
4  2  a 1n NA NA NA
5  1  b 2o  3  7 11
6  2  b 2o  4  8 12
7  1  b 2p NA NA NA
8  2  b 2p NA NA NA

What I want is this: if columns v1, v2, and v3 match once the last letter of v3 is ignored, fill the NAs from the matching rows that are not NA. In this case, row 3's NAs should be filled from row 1, because both rows share the key 1a1 once the trailing letter (m or n) is ignored. The desired output would be:

  v1 v2 v3 v4 v5 v6
1  1  a 1m  1  5  9
2  2  a 1m  2  6 10
3  1  a 1n  1  5  9
4  2  a 1n  2  6 10
5  1  b 2o  3  7 11
6  2  b 2o  4  8 12
7  1  b 2p  3  7 11
8  2  b 2p  4  8 12
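
To make the matching rule concrete, here is a minimal base R sketch that builds the key described above (v1, v2, and v3 without its last character). The names v3_prefix and key are made up for illustration, and only v4 is included to keep the example short.

```r
df <- data.frame(
  v1 = rep(1:2, 4),
  v2 = rep(c("a", "b"), each = 4),
  v3 = paste0(rep(1:2, each = 4), rep(c("m", "n", "o", "p"), each = 2)),
  v4 = c(1, 2, NA, NA, 3, 4, NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: v3 with its last character removed ("1m" becomes "1")
v3_prefix <- substr(df$v3, 1, nchar(df$v3) - 1)

# Rows with the same key should share their v4:v6 values
key <- paste0(df$v1, df$v2, v3_prefix)
key
```

Rows 1 and 3 both get the key 1a1, which is why row 3 should be filled from row 1.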

I'm not certain, but I think this is a simpler way of producing your results:

library(tidyverse)
df %>% 
  group_by(v1,v2) %>% 
  fill(v4:v6)

Adding the v3 logic:

df %>%
  mutate(v7 = v3 %>% as.character() %>% parse_number()) %>% 
  group_by(v1, v2, v7) %>% 
  fill(v4:v6) %>% 
  ungroup() %>%   # drop the grouping first; otherwise select() keeps v7
  select(-v7)
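
One caveat worth sketching: fill() fills downward by default, so it only helps when the non-NA row comes before the NA rows in each group, as it does in the example data. If the order might be reversed, the `.direction = "downup"` argument of tidyr's fill() should cover both cases. The toy data below is made up for illustration, and sub() is used here in place of parse_number() to strip the trailing letter.

```r
library(dplyr)
library(tidyr)

# Toy data (not from the original post): the NA row comes first in its group
toy <- data.frame(
  v1 = c(1, 1),
  v2 = c("a", "a"),
  v3 = c("1n", "1m"),
  v4 = c(NA, 1),
  stringsAsFactors = FALSE
)

filled <- toy %>%
  mutate(v7 = sub("[A-Za-z]$", "", v3)) %>%  # v3 without its last letter
  group_by(v1, v2, v7) %>%
  fill(v4, .direction = "downup") %>%        # fill down first, then up
  ungroup() %>%
  select(-v7)

filled$v4
```

With the default direction, the first row of toy would stay NA because there is nothing above it to carry forward.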

Here is a solution that recodes v3 into a variable that only takes the numeric part into account.

library(dplyr)
library(stringr)
library(tidyr)  # provides fill()

# Extract the numeric part of the string in v3
df$v7 <- str_extract(df$v3, "[[:digit:]]+")

df %>%
  group_by(v1, v2, v7) %>% 
  fill(v4:v6)
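
Note that a digits-only key and "ignore the last letter" agree on the example data but are not the same rule in general. The sketch below compares the two; the value "12ab" is made up for illustration and does not occur in the original data.

```r
library(stringr)

x <- c("1m", "12ab")  # "12ab" is a hypothetical value

# Key used in the answer above: the first run of digits
digits_only <- str_extract(x, "[[:digit:]]+")

# Literal reading of the question: drop exactly one trailing character
drop_last <- sub(".$", "", x)

digits_only
drop_last
```

For "1m" both give "1"; for "12ab" the first gives "12" and the second "12a", so which one to use depends on what v3 can contain.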

Using na.locf from zoo:

library(zoo)
library(data.table)
setDT(df)[, na.locf(.SD),.(v1, v2)]
#    v1 v2 v3 v4 v5 v6
#1:  1  a 1m  1  5  9
#2:  1  a 1n  1  5  9
#3:  2  a 1m  2  6 10
#4:  2  a 1n  2  6 10
#5:  1  b 2o  3  7 11
#6:  1  b 2p  3  7 11
#7:  2  b 2o  4  8 12
#8:  2  b 2p  4  8 12

If we want to add the condition on 'v3':

setDT(df)[, names(df)[4:6] := na.locf(.SD),.(v1, v2, sub("\\D+", "", v3))][]
#   v1 v2 v3 v4 v5 v6
#1:  1  a 1m  1  5  9
#2:  2  a 1m  2  6 10
#3:  1  a 1n  1  5  9
#4:  2  a 1n  2  6 10
#5:  1  b 2o  3  7 11
#6:  2  b 2o  4  8 12
#7:  1  b 2p  3  7 11
#8:  2  b 2p  4  8 12
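
As a zoo-free variant of the same idea, data.table's own nafill() can carry the last observation forward. This is only a sketch, not the answer author's method: it assumes a reasonably recent data.table (nafill() was, as far as I know, added in 1.12.4) and numeric columns v4:v6. The names cols and v3_prefix are made up for illustration.

```r
library(data.table)

df <- data.frame(
  v1 = rep(1:2, 4),
  v2 = rep(c("a", "b"), each = 4),
  v3 = paste0(rep(1:2, each = 4), rep(c("m", "n", "o", "p"), each = 2)),
  v4 = c(1, 2, NA, NA, 3, 4, NA, NA),
  v5 = c(5, 6, NA, NA, 7, 8, NA, NA),
  v6 = c(9, 10, NA, NA, 11, 12, NA, NA),
  stringsAsFactors = FALSE
)

cols <- c("v4", "v5", "v6")  # columns to fill (hypothetical helper name)

# Group by v1, v2 and v3 without its trailing non-digits, then fill forward
setDT(df)[, (cols) := lapply(.SD, nafill, type = "locf"),
          by = .(v1, v2, v3_prefix = sub("\\D+$", "", v3)),
          .SDcols = cols]
df
```

Because := updates by reference, the rows stay in their original order.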

Here's a solution using data.table and zoo which ignores the last letter of the v3 column:

library(data.table)
setDT(df)[, match_cols := paste0(v1, v2, substr(v3, 1, nchar(as.character(v3)) - 1))
  ][, id := .GRP, by = match_cols
  ][, v4 := zoo::na.locf(v4, na.rm = FALSE), by = id
  ][, v5 := zoo::na.locf(v5, na.rm = FALSE), by = id
  ][, v6 := zoo::na.locf(v6, na.rm = FALSE), by = id
  ][, c("match_cols", "id") := NULL]
df

#    v1 v2 v3 v4 v5 v6
#1:  1  a 1m  1  5  9
#2:  2  a 1m  2  6 10
#3:  1  a 1n  1  5  9
#4:  2  a 1n  2  6 10
#5:  1  b 2o  3  7 11
#6:  2  b 2o  4  8 12
#7:  1  b 2p  3  7 11
#8:  2  b 2p  4  8 12
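
The `na.rm = FALSE` argument matters when a group starts with NA. By default na.locf() drops leading NAs, which shortens the result and would not line up with the group's rows in a `:=` assignment. A minimal sketch with a made-up vector:

```r
library(zoo)

x <- c(NA, 1, NA)  # toy vector with a leading NA

default_fill <- na.locf(x)                 # leading NA is dropped
kept_fill    <- na.locf(x, na.rm = FALSE)  # leading NA is kept

length(default_fill)
length(kept_fill)
```

In the example data every group starts with a non-NA value, so both forms give the same result there.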
