Removing trailing 0s and 1s from a dataset in R
I have a dataset that is set up like this:
bird | outcome |
---|---|
a | 0 |
a | 0 |
a | 1 |
a | 1 |
b | 0 |
b | 1 |
b | 0 |
c | 1 |
c | 1 |
c | 1 |
For all birds whose last outcome was 0, I removed all trailing 0s and the last 1 that preceded the trail of 0s. I used the following code:
library(dplyr)

detect <- detect %>%
  group_by(bird) %>%
  mutate(new = cumsum(outcome)) %>%  # running count of 1s within each bird
  filter(if (last(outcome) == 0) new < max(new) else TRUE) %>%
  ungroup() %>%
  select(-new)
This code worked perfectly and produced this output:
bird | outcome |
---|---|
a | 0 |
a | 0 |
a | 1 |
a | 1 |
b | 0 |
c | 1 |
c | 1 |
c | 1 |
Only b was trimmed because it was the only bird whose last observation was 0. I would like to extend the code so that birds whose last observation was 1 have that last 1 trimmed as well. I would like the output to look like this:
bird | outcome |
---|---|
a | 0 |
a | 0 |
a | 1 |
b | 0 |
c | 1 |
c | 1 |
Birds whose last observation was 1 had that last 1 removed, and birds whose last observation was 0 had the trailing 0s and the last 1 preceding them removed. However, I want both kinds of trimming to be applied simultaneously, not one after the other. For example, if I have a bird with outcome 0001100, I would like the trailing 0s and the last 1 removed to produce 0001. I don't want it to be trimmed a second time so that the remaining 1 is removed too.
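For reference, the rule can be stated on a single bird's outcome vector: in both cases everything from the last 1 onward is dropped. Below is a minimal base R sketch of that reading. `trim_tail` is a hypothetical helper name, not something from the question, and the sketch assumes `outcome` contains only 0s and 1s. For a bird with no 1s at all it drops every row, which mirrors what the `cumsum` filter above does but is otherwise an assumption.

```r
# Hypothetical helper: trim one bird's 0/1 outcome vector in a single pass.
trim_tail <- function(outcome) {
  ones <- which(outcome == 1)
  # Assumption: with no 1s, every 0 counts as trailing, so nothing is kept.
  if (length(ones) == 0) return(outcome[0])
  # Ends in 1: the last 1 is the final element, so only it is dropped.
  # Ends in 0: the last 1 and the trailing 0s after it are dropped.
  outcome[seq_len(max(ones) - 1)]
}

# The example from the question: 0001100 is trimmed once.
example_trimmed <- trim_tail(c(0, 0, 0, 1, 1, 0, 0))
print(example_trimmed)

# Applied per bird to the question's data (rebuilt here so the sketch runs alone).
detect_demo <- data.frame(
  bird = c("a", "a", "a", "a", "b", "b", "b", "c", "c", "c"),
  outcome = c(0, 0, 1, 1, 0, 1, 0, 1, 1, 1)
)
per_bird <- lapply(split(detect_demo$outcome, detect_demo$bird), trim_tail)
print(per_bird)
```

Because the trimming is computed from the original vector in one step, the result is never trimmed a second time.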
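library(dplyr)

detect %>%
  group_by(bird) %>%
  mutate(new = cumsum(outcome)) %>%
  # One filter decides both cases from the original data, so a bird such as
  # 0001100 is trimmed once (to 0001) and not a second time.
  filter(if (last(outcome) == 0) new < max(new) else row_number() < n()) %>%
  ungroup() %>%
  select(-new)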
# A tibble: 6 × 2
# bird outcome
# <chr> <int>
# 1 a 0
# 2 a 0
# 3 a 1
# 4 b 0
# 5 c 1
# 6 c 1
Using this data:
detect <- read.table(text = 'bird outcome
a 0
a 0
a 1
a 1
b 0
b 1
b 0
c 1
c 1
c 1', header = TRUE)
You could do:
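library(tidyverse)

# Collapse each bird's outcomes into one string, strip the tail with a
# regex, then split the string back into one row per digit.
detect %>%
  group_by(bird) %>%
  summarise(outcome = str_remove(str_c(outcome, collapse = ""), "(10+$)|(1$)")) %>%
  separate_rows(outcome, sep = "(?<=.)(?=.)", convert = TRUE)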
# A tibble: 6 x 2
bird outcome
<chr> <int>
1 a 0
2 a 0
3 a 1
4 b 0
5 c 1
6 c 1
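To see what the regular expression in this answer does to each bird's collapsed string, here is a small base R sketch. It uses `sub()` in place of `str_remove()`, which is a substitution made for illustration (both remove only the first match). The input strings are the three birds from the question plus the 0001100 example, and the names in `outcomes` are only labels.

```r
# "(10+$)" matches a 1 followed by one or more trailing 0s;
# "(1$)" matches a single 1 at the very end.
pattern <- "(10+$)|(1$)"

outcomes <- c(a = "0011", b = "010", c = "111", example = "0001100")

# sub() replaces only the first match, so each string is trimmed once.
trimmed <- sub(pattern, "", outcomes)
print(trimmed)
```

Since only one match is removed per string, 0001100 becomes 0001 rather than 000, which is the simultaneous behaviour the question asks for.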