How to fill NA with 2 columns

I want to fill the NA values in my dataset. I am not sure whether it is possible to do the following:

I have 3 columns, and I want to fill in the NA values of distance:

         duration    distance    mode
               15           7    car
               20           6    walk
               13          NA    car
               20           8    car
               18          NA    walk
               30          10    walk

For each mode, I want to find the closest duration and use its distance to fill the NA.

For mode car, the closest duration to 13 is 15, so the first NA becomes 7. For the second NA (walk mode), the closest duration to 18 is 20, so that NA becomes 6.
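The rule above boils down to taking which.min() of the absolute duration gaps within one mode. A tiny base R check of the car case; the car data frame here is just the two known car rows, typed in by hand for illustration:

```r
# The two car rows with a known distance (from the question's data)
car <- data.frame(duration = c(15, 20), distance = c(7, 8))

# Absolute gap between each known duration and the NA row's duration (13)
gaps <- abs(13 - car$duration)
gaps
# [1] 2 7

# which.min() gives the index of the smallest gap (the first one on ties)
car$distance[which.min(gaps)]
# [1] 7
```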

Here's a data.table solution:

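library(data.table)

# Only the rows whose distance is NA are updated
DT[is.na(distance),
   distance := {
     # Join the non-NA rows (x) to the NA rows (.SD, used as i) on mode and,
     # for each NA row, take the distance whose duration is closest to i.duration
     DT[!is.na(distance)
        ][.SD,
          on = .(mode),
          distance[which.min(abs(duration - i.duration))],
          by = .EACHI]$V1
   }
   ]

DT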

#   duration distance mode
#1:       15        7  car
#2:       20        6 walk
#3:       13        7  car
#4:       20        8  car
#5:       18        6 walk
#6:       30       10 walk
#7:       35       10 walk

It:

  1. Subsets the table to the rows where distance is NA.
  2. Self-joins those rows with the non-NA rows, matching on mode (the mode of transportation).
  3. For each NA row, picks the distance of the matching row whose duration is closest, i.e. the smallest abs(duration - i.duration).
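The three steps can be mirrored in base R, with merge() playing the role of the self-join on mode. This is a sketch of the same idea, not the answer's code; the trips data frame and the helper columns target_duration and gap are names made up here.

```r
# Same 7 rows as the answer's data; three of them have NA distance
trips <- data.frame(
  duration = c(15, 20, 13, 20, 18, 30, 35),
  distance = c(7, 6, NA, 8, NA, 10, NA),
  mode     = c("car", "walk", "car", "car", "walk", "walk", "walk")
)

# Step 1: the rows to fill, remembering their original row numbers
na_rows <- which(is.na(trips$distance))
targets <- data.frame(row = na_rows,
                      mode = trips$mode[na_rows],
                      target_duration = trips$duration[na_rows])

# Step 2: pair every target with every non-NA row of the same mode
known <- trips[!is.na(trips$distance), c("mode", "duration", "distance")]
pairs <- merge(targets, known, by = "mode")

# Step 3: per target row, keep the candidate with the smallest duration gap
pairs$gap <- abs(pairs$duration - pairs$target_duration)
pairs <- pairs[order(pairs$row, pairs$gap), ]
best <- pairs[!duplicated(pairs$row), ]

trips$distance[best$row] <- best$distance
trips$distance
# [1]  7  6  7  8  6 10 10
```

merge() builds every (NA row, known row) pair within a mode, which is fine for small data but grows quickly when groups are large.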

Data:

library(data.table)
DT <- fread(text = 'duration    distance       mode
15            7            car
20           6             walk
13            NA             car
20           8             car
18           NA            walk
30           10            walk
35            NA            walk')

A way in base R is to separate the NA and non-NA groups. For every value in NA_group, we find the closest duration in non_NA_group within the same mode and return the corresponding distance.

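df <- read.table(header = TRUE, text = "duration distance mode
15 7  car
20 6  walk
13 NA car
20 8  car
18 NA walk
30 10 walk")

NA_group <- subset(df, is.na(distance))
non_NA_group <- subset(df, !is.na(distance))

# For each NA row, take the distance of the same-mode row with the closest duration
df$distance[is.na(df$distance)] <- mapply(function(x, y) {
    temp <- subset(non_NA_group, mode == y)
    temp$distance[which.min(abs(x - temp$duration))]
}, NA_group$duration, NA_group$mode)

df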
#  duration distance mode
#1       15        7  car
#2       20        6 walk
#3       13        7  car
#4       20        8  car
#5       18        6 walk
#6       30       10 walk
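One case the mapply() version does not cover: if some mode has no non-NA distance at all, which.min() receives an empty vector, the function returns numeric(0) for that row, and mapply() can no longer simplify its result to a plain vector. A minimal sketch of a guarded variant, assuming it is acceptable to leave such rows as NA; fill_nearest() and trips are hypothetical names, and vapply() is used so every row must return exactly one number.

```r
# Hypothetical helper: fill NA distance with the distance of the same-mode row
# whose duration is closest; modes with no known distance are left as NA
fill_nearest <- function(df) {
  known <- df[!is.na(df$distance), ]
  missing <- which(is.na(df$distance))
  df$distance[missing] <- vapply(missing, function(i) {
    cand <- known[known$mode == df$mode[i], ]
    if (nrow(cand) == 0) return(NA_real_)  # nothing to borrow from
    cand$distance[which.min(abs(df$duration[i] - cand$duration))]
  }, numeric(1))
  df
}

# Usage: the question's data plus a "bike" row that has no known distance
trips <- data.frame(
  duration = c(15, 20, 13, 20, 18, 30, 12),
  distance = c(7, 6, NA, 8, NA, 10, NA),
  mode     = c("car", "walk", "car", "car", "walk", "walk", "bike")
)
fill_nearest(trips)$distance
# [1]  7  6  7  8  6 10 NA
```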
