I have a data table with lots of individuals (id) that have been asked a question (class) n times. Sometimes their answer is 0 or 99 (which are non-answer codes for "refused to answer" and "unknown", respectively); however, when asked later they do answer the question.
How can I replace the 0 or 99 within an id by the valid answer that id gave?
dummy data:
library(data.table)
df <- data.table(
id=rep(1:10,each=4),
class=c(1,1,1,1,1,1,1,99,0,0,0,1,0,2,2,2,99,99,99,
1,3,3,3,0,2,2,0,99,99,99,99,99,1,1,1,1,0,0,0,0))
What I would like to get
res <- data.table(
id=rep(1:10,each=4),
class=c(1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,1,1,1,1,3,3,3,3,
2,2,2,2,99,99,99,99,1,1,1,1,0,0,0,0))
To visualize the example...
> cbind(df, res = res[, !"id"])
id class res.class
1: 1 1 1
2: 1 1 1
3: 1 1 1
4: 1 1 1
5: 2 1 1
6: 2 1 1
7: 2 1 1
8: 2 99 1
9: 3 0 1
10: 3 0 1
11: 3 0 1
12: 3 1 1
13: 4 0 2
14: 4 2 2
15: 4 2 2
16: 4 2 2
17: 5 99 1
18: 5 99 1
19: 5 99 1
20: 5 1 1
21: 6 3 3
22: 6 3 3
23: 6 3 3
24: 6 0 3
25: 7 2 2
26: 7 2 2
27: 7 0 2
28: 7 99 2
29: 8 99 99
30: 8 99 99
31: 8 99 99
32: 8 99 99
33: 9 1 1
34: 9 1 1
35: 9 1 1
36: 9 1 1
37: 10 0 0
38: 10 0 0
39: 10 0 0
40: 10 0 0
id class res.class
In practice I have ~100,000 individuals, which is why I've tagged data.table, though I am open to other (faster) suggestions.
With data.table, this can also be solved by updating while joining: df is joined with a lookup table that holds one valid answer per id, and all class values in df are replaced by the corresponding value of the lookup table.
The lookup table is created by
unique(df[!class %in% c(0,99)], by="id")
   id class
1:  1     1
2:  2     1
3:  3     1
4:  4     2
5:  5     1
6:  6     3
7:  7     2
8:  9     1
The lookup table contains only entries for ids with at least one valid answer. In the subsequent update join, the other ids without any valid answer at all are left untouched.
df[unique(df[!class %in% c(0,99)], by="id"), on = "id", class := i.class][]
    id class
 1:  1     1
 2:  1     1
 3:  1     1
 4:  1     1
 5:  2     1
 6:  2     1
 7:  2     1
 8:  2     1
 9:  3     1
10:  3     1
11:  3     1
12:  3     1
13:  4     2
14:  4     2
15:  4     2
16:  4     2
17:  5     1
18:  5     1
19:  5     1
20:  5     1
21:  6     3
22:  6     3
23:  6     3
24:  6     3
25:  7     2
26:  7     2
27:  7     2
28:  7     2
29:  8    99
30:  8    99
31:  8    99
32:  8    99
33:  9     1
34:  9     1
35:  9     1
36:  9     1
37: 10     0
38: 10     0
39: 10     0
40: 10     0
    id class
# check result
all.equal(df$class, res$class)
[1] TRUE
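To see what the update join does with an id that never gave a valid answer, here is a minimal sketch on a tiny table. The names toy and lookup are made up for this illustration and are not part of the data above.

```r
library(data.table)

# toy data: id 1 has one valid answer (3), id 2 has only non-answer codes
toy <- data.table(id = c(1, 1, 2, 2), class = c(99, 3, 0, 0))

# lookup table: first valid answer per id; id 2 gets no row here
lookup <- unique(toy[!class %in% c(0, 99)], by = "id")

# update join: only rows of toy that find a match in lookup are overwritten
toy[lookup, on = "id", class := i.class]

toy$class
```

Every row of id 1 ends up as 3, while id 2 keeps its zeros because it has no match in the lookup table.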
Here is a simple two-step solution with data.table.
# step 1: smallest valid value per id, excluding 0 and 99 (min() returns Inf, with a warning, for ids without any valid value)
df[, class2 := min(class[class != 0 & class != 99]), by = id]
# step 2: keep the original value where class2 is infinite, i.e. for ids with 0 or 99 only
df[, class_final := ifelse(is.infinite(class2), class, class2)]
all(res$class == df$class_final) # check against the expected result res
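The is.infinite() check works because of how base R's min() treats an empty vector. A small sketch of that behaviour, using made-up vectors x and y rather than the data above:

```r
# answers of an id that never gave a valid answer
x <- c(0, 99, 99)
valid <- x[x != 0 & x != 99]   # nothing left: numeric(0)

# min() of an empty numeric vector is Inf (R also emits a warning,
# which is silenced here)
m <- suppressWarnings(min(valid))
is.infinite(m)

# with several distinct valid answers, min() picks the smallest one,
# which is not necessarily the first one given
y <- c(0, 3, 2)
min(y[y != 0 & y != 99])
```

So this approach agrees with the expected result as long as each id has at most one distinct valid answer, which is the case in the dummy data.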
Rcpp solution:
library(data.table)
library(Rcpp)
df <- data.table(id=rep(1:10,each=4), class=c(1,1,1,1,1,1,1,99,0,0,0,1,0,2,2,2,99,99,99,1,3,3,3,0,2,2,0,99,99,99,99,99,1,1,1,1,0,0,0,0))
cppFunction('std::vector<int> remap_class(std::vector<int> id, std::vector<int> df_class) {
  // first pass: remember a valid answer per id (the last one seen wins)
  std::map<int, int> class_remap;
  for (std::size_t i = 0; i < id.size(); i++) {   // C++ vectors are 0-indexed
    if (df_class[i] != 0 && df_class[i] != 99) {
      class_remap[id[i]] = df_class[i];
    }
  }
  // second pass: overwrite every row of an id that has a valid answer
  for (std::size_t i = 0; i < id.size(); i++) {
    if (class_remap.count(id[i]) != 0) {
      df_class[i] = class_remap[id[i]];
    }
  }
  return df_class;
}')
df$class <- remap_class(df$id, df$class)
Now check that the answer is the same as the expected result you posted:
df2 <- data.table(id=rep(1:10,each=4), class=c(1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,1,1,1,1,3,3,3,3,2,2,2,2,99,99,99,99,1,1,1,1,0,0,0,0))
all(df2$class == df$class)
[1] TRUE
Here's a dplyr + tidyr solution:
library(dplyr) # for mutate, group_by and `%>%`
library(tidyr) # for fill
df %>%
  mutate(class2 = ifelse(class %in% c(0,99),NA,class)) %>% # new column with NAs for the non-answer codes, so that fill can be used
  group_by(id) %>%
  fill(class2,.direction = "up") %>% # fill upwards within each id first...
  fill(class2,.direction = "down") %>% # ...then downwards
  mutate(class2 = ifelse(is.na(class2),class,class2)) # replace remaining NAs by the initial value
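For a quick check of how the fill steps behave, here is a minimal sketch on a toy table. The names toy and out are made up for this illustration, and it assumes a tidyr version in which fill() accepts .direction = "updown" (up first, then down) so that the two fill calls collapse into one.

```r
library(dplyr)
library(tidyr)

# toy data: id 1 has one valid answer (2), id 2 has only non-answer codes
toy <- tibble(id = c(1, 1, 1, 2, 2), class = c(0, 2, 99, 99, 0))

out <- toy %>%
  mutate(class2 = ifelse(class %in% c(0, 99), NA, class)) %>%  # mask non-answer codes
  group_by(id) %>%
  fill(class2, .direction = "updown") %>%                      # fill within each id
  mutate(class2 = ifelse(is.na(class2), class, class2)) %>%    # ids with no valid answer keep their codes
  ungroup()

out$class2
```

Here id 1 becomes 2 in every row, while id 2 keeps its original 99 and 0.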