Say I have a data.table like this:
library(data.table)
set.seed(10)
DT <- data.table(group = rep(c("a","b","c"), each=5), date = rep(1:5,3), value = sample(c(95:105,""),15, replace=TRUE))
Within each group, I would like to check (in a simple way) whether the value column contains an empty string "", or a run of empty strings, that is preceded by a value.
So this is fine: "", 95, 103, ... (the empty string is first within the group), but the patterns below are examples of "missing data" that I would like to detect:
95, "", 103, ... (empty string in the middle)
95, "", "", 103, ... (several empty strings in the middle)
95, 103, "" (empty string at the end)
So, in the output below, I would like to get group a (row 4), and if there are many such groups, I should get all of them (or all of the offending rows).
group date value
1: a 1 105
2: a 2 103
3: a 3 104
4: a 4
5: a 5 101
6: b 1 102
7: b 2 100
8: b 3 101
9: b 4 97
10: b 5 102
11: c 1 104
12: c 2 101
13: c 3 104
14: c 4 96
15: c 5 102
Edit: What I need to do is select the rows that have the wrong pattern (empty string(s) in the middle or at the end), in order to detect whether there are any errors in a large dataset. So for the table in my example, the desired output would be the 4th row, as it has a "missing value" (an empty string in between values):
group date value
1: a 4
(If there were more unwanted rows, of course, I would like to get all of them)
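The rule in the edit can be restated as: an empty string is an error exactly when a non-empty value appears earlier in the same group (in date order). A minimal sketch of that restatement, run on hypothetical toy data covering each pattern listed above; the table `toy` and the column `seen_value` are illustrative names, not part of the original question:

```r
library(data.table)

# Hypothetical toy data, one group per pattern from the question
toy <- data.table(
  group = rep(c("ok", "mid", "run", "end"), each = 4),
  date  = rep(1:4, 4),
  value = c("", "95", "103", "100",   # leading "": fine
            "95", "", "103", "100",   # "" in the middle: error
            "95", "", "", "103",      # run of "" in the middle: errors
            "95", "103", "100", "")   # "" at the end: error
)

setorder(toy, group, date)   # make row order follow date within each group
# TRUE once at least one non-empty value has appeared in the group so far
toy[, seen_value := cumsum(value != "") > 0, by = group]
# an empty string after the first value is a suspected gap
bad <- toy[value == "" & seen_value]
print(bad)
```

Because `cumsum()` counts only non-empty values, a leading run of "" stays `FALSE` and is never reported, while every later "" is.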
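In case your data.table is not sorted by the date column, you can use the following:
# number the rows of each group in date order (1, 2, ..., .N)
DT[order(date), order := seq_len(.N), by = group]
# an empty value anywhere except the first position of its group is flagged
DT[value == "" & order > 1L]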
output:
group date value order
1: a 4 4
Data (same as yours):
library(data.table)
set.seed(10)
DT <- data.table(group = rep(c("a","b","c"), each=5), date = rep(1:5,3),
                 value = sample(c(95:105,""),15, replace=TRUE))
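One detail worth checking: `order > 1L` reports every "" except the first row of its group, so if a leading run of several empty strings should also count as fine, the second one of that run would still be reported. A small check on a hypothetical one-group table named `lead`:

```r
library(data.table)

# Hypothetical group that starts with a run of two empty strings
lead <- data.table(group = "x", date = 1:4, value = c("", "", "95", "103"))

# same two steps as in this answer
lead[order(date), order := seq_len(.N), by = group]
res <- lead[value == "" & order > 1L]
print(res)   # the "" at date 2 is reported even though no value precedes it
```

The run-based approach in the next answer (`rleid`) does not have this issue, because the whole leading run shares the first run id.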
Here is an option:
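# within each group, number the runs of empty / non-empty values (rows assumed sorted by date)
DT[, rw := rleid(value == ""), by = group]
# run 1 is the leading run; any "" in a later run comes after a value
DT[value == "" & rw > 1L]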
output:
group date value rw
1: a 4 2
Data (with an extra group d that starts with an empty string, which should not be flagged):
library(data.table)
set.seed(10)
DT <- data.table(group = rep(c("a","b","c","d"), each=5),
date = rep(1:5,4), value = c(sample(c(95:105,""),15, replace=TRUE), c("",2,3,4,5)))
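The `rleid()` filter assumes the rows are already sorted by date within each group. A minimal sketch that sorts first and then applies the same filter, so it also copes with shuffled rows; `setorder()` plays the role of the `order(date)` step from the first answer, and the table `DT2` is hypothetical, with shuffled dates and a group `y` that begins with a run of two empty strings:

```r
library(data.table)

# Hypothetical data: rows out of date order; group "y" begins with two ""
DT2 <- data.table(
  group = c("x", "x", "x", "x", "y", "y", "y", "y"),
  date  = c(3L, 1L, 4L, 2L, 2L, 4L, 1L, 3L),
  value = c("", "95", "103", "100", "", "101", "", "99")
)

setorder(DT2, group, date)                    # sort by date within each group
DT2[, rw := rleid(value == ""), by = group]   # run ids now follow date order
res <- DT2[value == "" & rw > 1L]             # "" after the first run only
print(res)
```

In date order, group x is 95, 100, "", 103, so only its date-3 row is reported, while group y's leading run shares run id 1 and is left alone. Note that `setorder()` reorders `DT2` by reference, so the table keeps its new row order afterwards.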