R: calculate median and last row in groups for certain rows
I'm working with grouping and medians. I'd like to group a data.frame and get, for each group, the median of certain rows (not all of them) and the last value.
My data look something like this:
test <- data.frame(
id = c('A','A','A','A','A','B','B','B','B','B','C','C','C','C'),
value = c(1,2,3,4,5,3,4,5,1,8,3,4,2,9))
> test
id value
1 A 1
2 A 2
3 A 3
4 A 4
5 A 5
6 B 3
7 B 4
8 B 5
9 B 1
10 B 8
11 C 3
12 C 4
13 C 2
14 C 9
For each id, I need the median of the three central rows (the number may vary; in this case it is three), and then the last value.
I first tried with only one id.
test_a <- test[which(test$id == 'A'),]
> test_a
id value
1 A 1
2 A 2
3 A 3
4 A 4
5 A 5
For this one, the desired output is given by:
median(test_a[(nrow(test_a)-3):(nrow(test_a)-1),]$value) # median of three central values
tail(test_a,1)$value # last value
I used this:
library(tidyverse)
test_a %>% group_by(id) %>%
summarise(m = median(test_a[(nrow(test_a)-3):(nrow(test_a)-1),]$value),
last = tail(test_a,1)$value) %>%
data.frame()
id m last
1 A 3 5
But when I tried to generalize to all ids:
test %>% group_by(id) %>%
summarise(m = median(test[(nrow(test)-3):(nrow(test)-1),]$value),
last = tail(test,1)$value) %>%
data.frame()
id m last
1 A 3 9
2 B 3 9
3 C 3 9
I think the formulas use the full dataset to calculate the last value and the median, but I cannot figure out how to make it work per group. Thanks in advance.
This works:
test %>%
group_by(id) %>%
summarise(m = median(value[(length(value)-3):(length(value)-1)]),
last = value[length(value)])
# A tibble: 3 x 3
id m last
<fctr> <dbl> <dbl>
1 A 3 5
2 B 4 8
3 C 3 9
Within summarise, you just refer to the variable value instead of the whole dataset. Inside a grouped summarise, value holds only the rows of the current group, whereas test always refers to the full data frame.
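To make the scoping point concrete, here is a small sketch that puts per-group and whole-dataset expressions side by side. The result name scope_check and the column names n_group, n_all, last_group and last_all are made up for illustration; last() is the dplyr helper, used here as an equivalent of value[length(value)].

```r
library(dplyr)

test <- data.frame(
  id = c('A','A','A','A','A','B','B','B','B','B','C','C','C','C'),
  value = c(1,2,3,4,5,3,4,5,1,8,3,4,2,9))

scope_check <- test %>%
  group_by(id) %>%
  summarise(n_group    = length(value),        # rows in the current group only
            n_all      = nrow(test),           # rows in the whole data frame
            last_group = last(value),          # last value of the current group
            last_all   = tail(test, 1)$value   # last value of the whole data frame
  ) %>%
  data.frame()

print(scope_check)
```

The n_all and last_all columns come out identical for every id, which is the same symptom as in the question: any expression built on test ignores the grouping.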
Edit: Here's a generalized version.
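test %>%
  group_by(id) %>%
  summarise(
    # one row: the value itself; two rows: their median;
    # three or more rows: median of the three rows around the middle
    m = ifelse(length(value) == 1, value,
        ifelse(length(value) == 2, median(value),
               median(value[(ceiling(length(value)/2)-1):(ceiling(length(value)/2)+1)]))),
    last = value[length(value)])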
If a group has only one row, the value itself is stored in m. If it has only two rows, the median of those two rows is stored in m. If it has three or more rows, the middle three rows are selected dynamically and their median is stored in m. For a group with an even number of rows, the window is centered on row ceiling(n/2), so with four rows it covers rows 1 to 3.
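Since the question mentions that the number of central rows may vary, the same idea can be wrapped in a small helper. This is only a sketch: central_median and its parameter k are hypothetical names, and the choice to fall back to the plain median when a group has k rows or fewer is an assumption rather than something stated above. For k = 3 it picks the same rows as the nested ifelse version.

```r
# Hypothetical helper: median of the k central elements of x.
# Assumption: groups with k or fewer elements fall back to median(x).
central_median <- function(x, k = 3) {
  n <- length(x)
  if (n <= k) return(median(x))
  # The window is anchored on ceiling(n/2), as in the answer above
  start <- ceiling(n / 2) - (k - 1) %/% 2
  median(x[start:(start + k - 1)])
}

test <- data.frame(
  id = c('A','A','A','A','A','B','B','B','B','B','C','C','C','C'),
  value = c(1,2,3,4,5,3,4,5,1,8,3,4,2,9))

# Base R: split the values by id and apply the helper to each group
m_by_id <- sapply(split(test$value, test$id), central_median)
last_by_id <- sapply(split(test$value, test$id), function(x) x[length(x)])

print(data.frame(id = names(m_by_id), m = unname(m_by_id), last = unname(last_by_id)))
```

The helper can also be used directly inside the grouped pipeline, for example summarise(m = central_median(value), last = value[length(value)]), which keeps the summarise call readable when k changes.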