How to fill NA rows by conditions from columns in R
Here is an example:
df<-data.frame(v1=rep(1:2, 4),
v2=rep(c("a", "b"), each=4),
v3=paste0(rep(1:2, each=4), rep(c("m", "n", "o", "p"), each=2)),
v4=c(1,2, NA, NA, 3,4, NA,NA),
v5=c(5,6, NA, NA, 7,8, NA,NA),
v6=c(9,10, NA, NA, 11,12, NA,NA))
df
v1 v2 v3 v4 v5 v6
1 1 a 1m 1 5 9
2 2 a 1m 2 6 10
3 1 a 1n NA NA NA
4 2 a 1n NA NA NA
5 1 b 2o 3 7 11
6 2 b 2o 4 8 12
7 1 b 2p NA NA NA
8 2 b 2p NA NA NA
What I want is: if columns v1, v2 and v3 are the same (ignoring the last letter of v3), fill the NAs from the rows that are not NA. In this case, row 3's NAs should be filled from row 1, because both rows share the key 1a1 once the trailing letter of v3 is ignored. So the desired output would be:
v1 v2 v3 v4 v5 v6
1 1 a 1m 1 5 9
2 2 a 1m 2 6 10
3 1 a 1n 1 5 9
4 2 a 1n 2 6 10
5 1 b 2o 3 7 11
6 2 b 2o 4 8 12
7 1 b 2p 3 7 11
8 2 b 2p 4 8 12
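To make the matching rule concrete, here is a minimal base R sketch that builds the grouping key by dropping the last character of v3. The names `v3_prefix` and `key` are introduced only for illustration and are not part of the question.

```r
df <- data.frame(v1 = rep(1:2, 4),
                 v2 = rep(c("a", "b"), each = 4),
                 v3 = paste0(rep(1:2, each = 4), rep(c("m", "n", "o", "p"), each = 2)),
                 v4 = c(1, 2, NA, NA, 3, 4, NA, NA),
                 v5 = c(5, 6, NA, NA, 7, 8, NA, NA),
                 v6 = c(9, 10, NA, NA, 11, 12, NA, NA))

# Drop the last character of v3 (as.character guards against factor columns)
v3_prefix <- sub(".$", "", as.character(df$v3))

# Combine v1, v2 and the shortened v3 into one key per row
key <- paste0(df$v1, df$v2, v3_prefix)
key

# Row numbers that share a key; NAs in v4:v6 should be filled within each set
rows_by_key <- split(seq_len(nrow(df)), key)
rows_by_key
```

Rows 1 and 3 both get the key "1a1", which is why row 3 should take its values from row 1.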
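I think this is a simpler way of producing your results:
library(tidyverse)
# Group by v1 and v2 only; this is sufficient for the example data
df %>%
  group_by(v1,v2) %>%
  fill(v4:v6)
# To also match on v3 without its last letter, group on its numeric part
df %>%
  mutate(v7 = v3 %>% as.character() %>% parse_number()) %>%
  group_by(v1,v2,v7) %>%
  fill(v4:v6) %>%
  ungroup() %>% # drop the grouping so that v7 can actually be removed
  select(-v7)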
Here is a solution that recodes v3 into a variable that only takes into account the numeric part.
library(dplyr)
library(tidyr) # provides fill()
library(stringr)
# Extract the numeric part of the string in v3
df$v7<-str_extract(df$v3,"[[:digit:]]+")
df %>%
  group_by(v1,v2,v7) %>%
  fill(v4:v6)
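Both dplyr/tidyr answers rely on fill() carrying values downward, which works here because the complete rows come before the NA rows within each group. As a hedged sketch, assuming tidyr 1.0.0 or later (where `.direction = "downup"` is available): if the complete row could come after the NA row, filling in both directions covers that case. The data frame `toy` and its columns `g` and `val` are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Toy data: in group "x" the NA comes first, in group "y" it comes last
toy <- data.frame(g = c("x", "x", "y", "y"),
                  val = c(NA, 1, 2, NA))

# Default direction ("down") cannot fill an NA at the start of a group
down_only <- toy %>%
  group_by(g) %>%
  fill(val) %>%
  ungroup()

# "downup" fills downward first, then upward for whatever is still NA
both_ways <- toy %>%
  group_by(g) %>%
  fill(val, .direction = "downup") %>%
  ungroup()

down_only$val
both_ways$val
```

With the question's data the default direction is enough, so this only matters if the row order is not guaranteed.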
Using na.locf from zoo:
library(zoo)
library(data.table)
setDT(df)[, na.locf(.SD),.(v1, v2)]
# v1 v2 v3 v4 v5 v6
#1: 1 a 1m 1 5 9
#2: 1 a 1n 1 5 9
#3: 2 a 1m 2 6 10
#4: 2 a 1n 2 6 10
#5: 1 b 2o 3 7 11
#6: 1 b 2p 3 7 11
#7: 2 b 2o 4 8 12
#8: 2 b 2p 4 8 12
If we also want to add the condition on 'v3':
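# .SDcols restricts .SD to v4:v6, so the three filled columns line up with the three assigned names
setDT(df)[, names(df)[4:6] := na.locf(.SD), by = .(v1, v2, sub("\\D+", "", v3)), .SDcols = 4:6][]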
# v1 v2 v3 v4 v5 v6
#1: 1 a 1m 1 5 9
#2: 2 a 1m 2 6 10
#3: 1 a 1n 1 5 9
#4: 2 a 1n 2 6 10
#5: 1 b 2o 3 7 11
#6: 2 b 2o 4 8 12
#7: 1 b 2p 3 7 11
#8: 2 b 2p 4 8 12
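A note on the grouping expression: `sub("\\D+", "", v3)` removes only the first run of non-digit characters, which is enough for values shaped like "1m". As a small base R sketch (the strings "a1b" and "12xy" are hypothetical and do not occur in the question's data), `gsub()` and an explicit last-character pattern behave differently when v3 has another shape:

```r
v3 <- c("1m", "2p", "a1b", "12xy")

# Removes the first run of non-digits only
first_run <- sub("\\D+", "", v3)

# Removes every non-digit character
all_runs <- gsub("\\D+", "", v3)

# Removes exactly the last character, whatever it is
last_char <- sub(".$", "", v3)

first_run
all_runs
last_char
```

Which variant matches "ignoring the last letter of v3" depends on how the real v3 values look; for the example data all three give the same groups.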
Here's a solution using data.table and zoo which ignores the last letter of the v3 column:
library(data.table)
# Build a key from v1, v2 and v3 without its last letter, then fill v4:v6 within each key
setDT(df)[, match_cols := paste0(v1, v2, substr(v3, 1, nchar(as.character(v3)) - 1))
  ][, id := .GRP, by = match_cols
  ][, v4 := zoo::na.locf(v4, na.rm = FALSE), by = id
  ][, v5 := zoo::na.locf(v5, na.rm = FALSE), by = id
  ][, v6 := zoo::na.locf(v6, na.rm = FALSE), by = id
  ][, c("match_cols", "id") := NULL]
df
# v1 v2 v3 v4 v5 v6
#1: 1 a 1m 1 5 9
#2: 2 a 1m 2 6 10
#3: 1 a 1n 1 5 9
#4: 2 a 1n 2 6 10
#5: 1 b 2o 3 7 11
#6: 2 b 2o 4 8 12
#7: 1 b 2p 3 7 11
#8: 2 b 2p 4 8 12
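The `na.rm = FALSE` argument matters when `na.locf()` is combined with `:=`. By default `zoo::na.locf()` drops leading NAs, so its result can be shorter than the group it came from. A minimal sketch with a made-up vector `x`:

```r
library(zoo)

x <- c(NA, 1, NA, NA, 2)

# Default (na.rm = TRUE): the leading NA is dropped, so the length changes
dropped <- na.locf(x)

# na.rm = FALSE: the leading NA is kept, so the length matches the input
kept <- na.locf(x, na.rm = FALSE)

length(dropped)
length(kept)
```

In the example data every group starts with a complete row, so both settings would give the same values there; keeping the length fixed is the safer choice for column assignment.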