I want to fill the NA values in my dataset, but I am not sure whether the following is possible.
I have three columns and want to fill in the NA values of `distance`:
```
duration distance mode
15       7        car
20       6        walk
13       NA       car
20       8        car
18       NA       walk
30       10       walk
```
For each mode, I want to find the row with the closest duration and use its distance to fill the NA.
For mode car, the closest duration to 13 is 15, so the first NA becomes 7. For the second NA (mode walk), the closest duration to 18 is 20, so that NA becomes 6.
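As a quick check of the arithmetic in the two examples above, here is a small base R sketch. The `car` and `walk` data frames and the `nearest()` helper are hypothetical names; each data frame holds only the non-NA rows of that mode from the question.

```r
# Non-NA rows from the question's example, split by mode
car  <- data.frame(duration = c(15, 20), distance = c(7, 8))
walk <- data.frame(duration = c(20, 30), distance = c(6, 10))

# Hypothetical helper: index of the reference row whose duration is closest
nearest <- function(target, ref) which.min(abs(ref$duration - target))

car$distance[nearest(13, car)]    # 7  (|15 - 13| = 2 beats |20 - 13| = 7)
walk$distance[nearest(18, walk)]  # 6  (|20 - 18| = 2 beats |30 - 18| = 12)
```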
Here's a data.table solution:
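```r
library(data.table)

# For the rows where distance is NA, join them (.SD) to the non-NA rows
# on mode; for each NA row (by = .EACHI), take the distance of the non-NA
# row whose duration is closest (i.duration is the NA row's duration)
dt[is.na(distance),
   distance := {dt[!is.na(distance)
                   ][.SD,
                     on = .(mode),
                     distance[which.min(abs(duration - i.duration))],
                     by = .EACHI]$V1
   }
   ]
dt
#    duration distance mode
# 1:       15        7  car
# 2:       20        6 walk
# 3:       13        7  car
# 4:       20        8  car
# 5:       18        6 walk
# 6:       30       10 walk
# 7:       35       10 walk
```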
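It:
1. takes the rows where `distance` is NA,
2. joins them to the non-NA rows based on the mode of transportation, and
3. for each NA row, picks the distance of the non-NA row with the closest duration.

Data: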
```r
library(data.table)
dt <- fread('duration distance mode
15 7 car
20 6 walk
13 NA car
20 8 car
18 NA walk
30 10 walk
35 NA walk')
```
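The join above can also be mimicked in base R with `merge()`, which may help show what `on = .(mode)` and `by = .EACHI` are doing: every NA row is paired with all non-NA rows of the same mode, and only the pair with the smallest duration gap is kept. This is a sketch, not part of the original answer; `row_id`, `na_rows`, `ref_rows`, `cand` and `best` are hypothetical names, and ties in the gap may be resolved differently than by `which.min()`.

```r
df <- read.table(header = TRUE, text = "
duration distance mode
15 7 car
20 6 walk
13 NA car
20 8 car
18 NA walk
30 10 walk
35 NA walk")

# Hypothetical row id so each NA row can be matched back after the join
df$row_id <- seq_len(nrow(df))
na_rows  <- df[is.na(df$distance), c("row_id", "duration", "mode")]
ref_rows <- df[!is.na(df$distance), c("duration", "distance", "mode")]

# Pair every NA row with all non-NA rows of the same mode (like on = .(mode))
cand <- merge(na_rows, ref_rows, by = "mode", suffixes = c(".na", ".ref"))
cand$gap <- abs(cand$duration.na - cand$duration.ref)

# Keep the smallest-gap candidate per NA row (like by = .EACHI + which.min)
cand <- cand[order(cand$row_id, cand$gap), ]
best <- cand[!duplicated(cand$row_id), ]

df$distance[best$row_id] <- best$distance
df$row_id <- NULL
df
```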
A way in base R could be to separate the NA and non-NA groups. For every value in `NA_group`, we find the closest `duration` in `non_NA_group` within the same `mode` and return the corresponding `distance`.
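```r
# df holds the question's data (columns duration, distance, mode)
NA_group <- subset(df, is.na(distance))
non_NA_group <- subset(df, !is.na(distance))

# For each NA row, look only at the non-NA rows of the same mode and
# return the distance of the one whose duration is closest
df$distance[is.na(df$distance)] <- mapply(function(x, y) {
  temp <- subset(non_NA_group, mode == y)
  temp$distance[which.min(abs(x - temp$duration))]
}, NA_group$duration, NA_group$mode)
df
#   duration distance mode
# 1       15        7  car
# 2       20        6 walk
# 3       13        7  car
# 4       20        8  car
# 5       18        6 walk
# 6       30       10 walk
```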
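The `mapply()` version assumes every mode that has an NA also has at least one non-NA row; otherwise `which.min()` returns a zero-length result and the output no longer simplifies to a plain vector. Below is a hedged sketch of a reusable variant that leaves such rows as NA. The `fill_nearest()` helper and the extra `bike` row are hypothetical, added only for illustration.

```r
# Hypothetical helper: fill each NA distance with the distance of the
# non-NA row of the same mode whose duration is closest.
# Rows whose mode has no non-NA distance are left as NA.
fill_nearest <- function(df) {
  ref <- df[!is.na(df$distance), ]
  idx <- which(is.na(df$distance))
  df$distance[idx] <- vapply(idx, function(i) {
    cand <- ref[ref$mode == df$mode[i], ]
    if (nrow(cand) == 0) return(NA_real_)
    # which.min() returns the first index on ties
    cand$distance[which.min(abs(cand$duration - df$duration[i]))]
  }, numeric(1))
  df
}

# Question's data plus an illustrative "bike" row with no reference rows
df <- data.frame(
  duration = c(15, 20, 13, 20, 18, 30, 12),
  distance = c(7, 6, NA, 8, NA, 10, NA),
  mode     = c("car", "walk", "car", "car", "walk", "walk", "bike")
)
out <- fill_nearest(df)
out$distance  # expected: 7, 6, 7, 8, 6, 10, NA
```

Using `vapply()` with `numeric(1)` makes the helper fail loudly if any lookup returns something other than a single number, instead of silently producing a list.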