How to fill NA values using two other columns
I want to fill the NA values in my dataset. I am not sure whether the following is possible:
I have 3 columns, and I want to fill in the NA values of distance:
duration distance mode
15 7 car
20 6 walk
13 NA car
20 8 car
18 NA walk
30 10 walk
For each mode, I want to find the closest duration and use its distance to fill the NA.
For mode car, the closest duration to 13 is 15, so the first NA becomes 7. For the second NA (which is walk mode), the closest duration to 18 is 20, so that NA becomes 6.
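The rule above can be checked by hand in R. This small sketch hard-codes the non-NA rows of each mode from the table (the variable names are just for illustration): take the absolute differences between the NA row's duration and the known durations, and use which.min to pick the matching distance.

```r
# Known (non-NA) rows for each mode, copied from the table above
car_duration  <- c(15, 20); car_distance  <- c(7, 8)
walk_duration <- c(20, 30); walk_distance <- c(6, 10)

# car: |15 - 13| = 2 and |20 - 13| = 7, so duration 15 is closest
car_fill <- car_distance[which.min(abs(car_duration - 13))]
# walk: |20 - 18| = 2 and |30 - 18| = 12, so duration 20 is closest
walk_fill <- walk_distance[which.min(abs(walk_duration - 18))]

print(c(car = car_fill, walk = walk_fill))
```

Note that which.min returns the first minimum, so on a tie the earlier row wins.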
Here's a data.table solution:
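library(data.table)
# Fill only the rows where distance is missing
DT[is.na(distance),
   distance := {
     # Self-join: match each NA row (.SD) with the non-NA rows of the same mode
     DT[!is.na(distance)
        ][.SD,
          on = .(mode),
          # per NA row, take the distance whose duration is closest
          distance[which.min(abs(duration - i.duration))],
          by = .EACHI]$V1
   }
]
DT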
# duration distance mode
#1: 15 7 car
#2: 20 6 walk
#3: 13 7 car
#4: 20 8 car
#5: 18 6 walk
#6: 30 10 walk
#7: 35 10 walk
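It:
- subsets the data.table to only the rows where distance is NA, and
- self-joins them with the non-NA rows based on the mode of transportation; by = .EACHI evaluates the lookup once per NA row, returning the distance with the closest duration.

Data: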
library(data.table)
DT <- fread('duration distance mode
15 7 car
20 6 walk
13 NA car
20 8 car
18 NA walk
30 10 walk
35 NA walk')
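A different technique that should give the same result here, offered only as a sketch: a data.table rolling join with roll = "nearest". The join is exact on mode and rolls to the nearest value on duration, the last column in on. The helper names known, na_rows and fill are illustrative; I have not checked whether tie-breaking on equidistant durations matches which.min, so treat ties with care.

```r
library(data.table)

# Same data as above (7 rows, including the extra 35-minute walk)
DT <- data.table(duration = c(15, 20, 13, 20, 18, 30, 35),
                 distance = c(7, 6, NA, 8, NA, 10, NA),
                 mode     = c("car", "walk", "car", "car", "walk", "walk", "walk"))

known   <- DT[!is.na(distance)]   # rows that can donate a distance
na_rows <- DT[is.na(distance)]    # rows that need a distance

# Exact match on mode, nearest match on duration (the last `on` column).
# x.distance takes distance from `known`; rows come back in na_rows order.
fill <- known[na_rows, on = .(mode, duration), roll = "nearest", x.distance]

DT[is.na(distance), distance := fill]
DT
```

With roll = "nearest", values before the first or after the last duration in a mode are also matched (13 car goes to 15, 35 walk goes to 30).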
A way in base R could be to separate the NA and non-NA groups. For every value in NA_group, we find the closest duration in non_NA_group with the same mode and return the corresponding distance.
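# Example data from the question
df <- data.frame(duration = c(15, 20, 13, 20, 18, 30),
                 distance = c(7, 6, NA, 8, NA, 10),
                 mode = c("car", "walk", "car", "car", "walk", "walk"))
# Split rows into those that need filling and those that can donate a value
NA_group <- subset(df, is.na(distance))
non_NA_group <- subset(df, !is.na(distance))
# For each NA row, take the distance of the same-mode row with the closest duration
df$distance[is.na(df$distance)] <- mapply(function(x, y) {
  temp <- subset(non_NA_group, mode == y)
  temp$distance[which.min(abs(x - temp$duration))]
}, NA_group$duration, NA_group$mode)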
df
# duration distance mode
#1 15 7 car
#2 20 6 walk
#3 13 7 car
#4 20 8 car
#5 18 6 walk
#6 30 10 walk
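The same base R idea can be wrapped in a reusable function. This is a sketch: fill_nearest and its arguments are hypothetical names, not from any package. It also assumes you want an NA left untouched when its group has no known values at all (the mapply version above would return numeric(0) for such a row).

```r
# Hypothetical helper: fill NA in column `value` with the value from the row in
# the same `group` whose `key` is closest; ties go to the first row (which.min).
fill_nearest <- function(df, value, key, group) {
  out <- df[[value]]
  known <- !is.na(out)                 # computed once, so filled rows never donate
  for (i in which(!known)) {
    # Candidate donors: originally non-NA rows in the same group
    cand <- which(known & df[[group]] == df[[group]][i])
    if (length(cand) > 0) {
      out[i] <- out[cand][which.min(abs(df[[key]][cand] - df[[key]][i]))]
    }
    # If the group has no known values, the NA is left as is
  }
  df[[value]] <- out
  df
}

df <- data.frame(duration = c(15, 20, 13, 20, 18, 30),
                 distance = c(7, 6, NA, 8, NA, 10),
                 mode = c("car", "walk", "car", "car", "walk", "walk"))
filled <- fill_nearest(df, "distance", "duration", "mode")
filled$distance
```

A plain loop is used instead of mapply so that each fill is a scalar assignment and an empty group cannot change the result type.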