R - clean up data based on preceding and following values
I have a table which is later divided into multiple intervals based on several conditions. In some rare cases, one or more rows do not fall into a defined interval, so I'd like to perform some extra clean-up on the data.
For each group (name, location), if the row value in stop == 0, I need to count how many of those rows are in the interval. If that count is less than 3, I need to check how many continuous rows are marked as stop == 1 above and below the interval of zeros. If the count of values with stop == 1 above and below is 1, then I need to change the zero values in that interval to 1.
I hope the picture will make it clearer:
df <- read.table(text="name location stop
John London 1
John London 1
John London 1
John London 1
John London 1
John London 1
John London 1
John London 0
John London 0
John London 1
John London 1
John London 1
John London 1
John London 1
John London 1
John London 0
John New_York 0
John New_York 0
John New_York 0
John New_York 1
John New_York 0
",header = TRUE, stringsAsFactors = FALSE)
You could iterate over the rows, but it seems that all you want to do is replace all instances of 101 with 111 and 1001 with 1111 in stop. You can do this by turning the stop column into a string and then making substitutions using gsub():
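# Collapse the 0/1 column into one string (this ignores the groups).
stopString = paste0(df$stop, collapse = "")
# Fill gaps of one or two zeros that sit between ones.
stopString = gsub("101","111",stopString)
stopString = gsub("1001","1111",stopString)
# Split back into single characters and convert to numbers.
df$stop = as.numeric(unlist(strsplit(stopString,"")))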
> df
name location stop
1 John London 1
2 John London 1
3 John London 1
4 John London 1
5 John London 1
6 John London 1
7 John London 1
8 John London 1
9 John London 1
10 John London 1
11 John London 1
12 John London 1
13 John London 1
14 John London 1
15 John London 1
16 John London 0
17 John New_York 0
18 John New_York 0
19 John New_York 0
20 John New_York 1
21 John New_York 0
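One caveat worth noting: gsub() replaces non-overlapping matches only, so two gaps that share a 1 (as in 10101) are not both filled by the pattern "101". The following is a minimal sketch of one way around that, using PCRE lookarounds so the neighbouring ones are checked but not consumed. The helper name fill_gaps_regex is hypothetical and not part of the original answer.

```r
# Hypothetical helper (not from the original answer): fill gaps of one or
# two zeros that sit between ones, using lookarounds (perl = TRUE).
fill_gaps_regex <- function(x) {
  s <- paste0(x, collapse = "")
  # (?<=1) and (?=1) test the neighbours without consuming them,
  # so adjacent gaps such as "10101" are all matched.
  s <- gsub("(?<=1)0(?=1)", "1", s, perl = TRUE)
  s <- gsub("(?<=1)00(?=1)", "11", s, perl = TRUE)
  as.numeric(strsplit(s, "")[[1]])
}

# Plain pattern: the second gap is missed because the middle 1 was consumed.
plain <- gsub("101", "111", "10101")                                # "11101"
# Lookaround version: both gaps are filled.
fixed <- paste0(fill_gaps_regex(c(1, 0, 1, 0, 1)), collapse = "")   # "11111"
```

For the sample data in the question the two approaches give the same result, since the gaps there do not share a neighbouring 1.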
Edit: grouping by name and location:
df <- read.table(text="name location stop
John London 1
John London 0
John London 1
John New_York 0
John New_York 1
John New_York 0
John New_York 0
John New_York 0
John New_York 1
John New_York 0
",header = TRUE, stringsAsFactors = TRUE)
# Fill gaps of one or two zeros between ones within a single group's vector.
f <- function(x)
{
stopString = paste0(x, collapse = "")
stopString = gsub("101","111",stopString)
stopString = gsub("1001","1111",stopString)
as.numeric(unlist(strsplit(stopString,"")))
}
> df %>% dplyr::group_by(name, location) %>%
dplyr::summarise(stop=stop, s=f(stop))
# A tibble: 10 x 4
# Groups: name, location [2]
name location stop s
<fct> <fct> <int> <dbl>
1 John London 1 1
2 John London 0 1
3 John London 1 1
4 John New_York 0 0
5 John New_York 1 1
6 John New_York 0 0
7 John New_York 0 0
8 John New_York 0 0
9 John New_York 1 1
10 John New_York 0 0
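As an alternative without string conversion or extra packages, the rule from the question can be sketched with rle() and applied per group with ave(). This is only a sketch under assumptions: the helper name fill_short_zero_runs and its max_len parameter are hypothetical, and it reads "stop == 1 above and below" as "the zero run has at least one 1 directly before and directly after it within the same group".

```r
# Hypothetical helper: set runs of fewer than `max_len` zeros to 1 when the
# run has a neighbouring run on both sides (in 0/1 data, those are ones).
fill_short_zero_runs <- function(x, max_len = 3) {
  r <- rle(x)
  n <- length(r$values)
  if (n < 3) return(x)                  # no run can have two neighbours
  inner <- seq_len(n)[-c(1, n)]         # skip the first and last run
  short_zero <- r$values[inner] == 0 & r$lengths[inner] < max_len
  r$values[inner[short_zero]] <- 1
  inverse.rle(r)
}

# Same data as in the edit above, built inline to keep the sketch self-contained.
df <- data.frame(
  name = "John",
  location = rep(c("London", "New_York"), times = c(3, 7)),
  stop = c(1, 0, 1, 0, 1, 0, 0, 0, 1, 0)
)

# ave() applies the helper within each (name, location) group and returns
# a vector aligned with the original rows.
df$s <- ave(df$stop, df$name, df$location, FUN = fill_short_zero_runs)
```

Here the single zero in London is filled, while the run of three zeros in New_York is left alone because it is not shorter than max_len. If you stay with dplyr, note that summarise() returning several rows per group is deprecated as of dplyr 1.1.0 in favour of reframe(); mutate(s = f(stop)) after group_by() is another option that keeps one row per input row.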