
R calculate median and last row in groups for certain rows

I'm working with grouping and medians. I'd like to group a data.frame and get, for each group, the median of certain rows (not all of them) and the last value.
My data look something like this:

test <- data.frame(
  id = c('A','A','A','A','A','B','B','B','B','B','C','C','C','C'),
  value = c(1,2,3,4,5,3,4,5,1,8,3,4,2,9))
> test
   id value
1   A     1
2   A     2
3   A     3
4   A     4
5   A     5
6   B     3
7   B     4
8   B     5
9   B     1
10  B     8
11  C     3
12  C     4
13  C     2
14  C     9

For each id, I need the median of the central rows (the number may vary; in this case it is three), then the last value.
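"Central rows" can be pinned down for any group size n and window size k with the formula below. This is an assumption for illustration and does not come from the question. When n - k is even the window is exactly centred. Otherwise it leans one row toward the start. For k = 3 it coincides with the question's (n - 3):(n - 1) only when n is 4 or 5. For larger groups that expression stays next to the last row.

```latex
\begin{aligned}
s &= \left\lfloor \frac{n - k}{2} \right\rfloor + 1,
  \qquad \text{central rows} = s,\ s+1,\ \dots,\ s+k-1 \\
n = 5,\ k = 3 &: \quad s = \lfloor 1 \rfloor + 1 = 2
  \;\Rightarrow\; \text{rows } 2, 3, 4 \quad (\text{id A: } 2, 3, 4;\ \text{median } 3) \\
n = 4,\ k = 3 &: \quad s = \lfloor 0.5 \rfloor + 1 = 1
  \;\Rightarrow\; \text{rows } 1, 2, 3 \quad (\text{id C: } 3, 4, 2;\ \text{median } 3)
\end{aligned}
```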
First, I tried with only one id:

test_a <- test[which(test$id == 'A'),]
> test_a
  id value
1  A     1
2  A     2
3  A     3
4  A     4
5  A     5

For this single id, the desired result is the median of the three central values and the last value:

median(test_a[(nrow(test_a)-3):(nrow(test_a)-1),]$value) # median of three central values
tail(test_a,1)$value                                     # last value

I used this:

library(tidyverse)

test_a %>% group_by(id) %>%
  summarise(m = median(test_a[(nrow(test_a)-3):(nrow(test_a)-1),]$value),
            last = tail(test_a,1)$value) %>%
  data.frame()
  id m last
1  A 3    5

But when I tried to generalize to all ids:

test %>% group_by(id) %>%
   summarise(m = median(test[(nrow(test)-3):(nrow(test)-1),]$value),
             last = tail(test,1)$value) %>%
   data.frame()
  id m last
1  A 3    9
2  B 3    9
3  C 3    9

I think the formulas use the full dataset to calculate the last value and the median, but I can't figure out how to make it work. Thanks in advance.
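The base-R sketch below shows why every group gets m = 3 and last = 9. It evaluates the question's expressions outside summarise(). Inside summarise(), the name test still refers to the whole 14-row data frame, so these expressions give the same result for every id. The variable names n_all, m_all and last_all are illustrative only.

```r
# Example data from the question.
test <- data.frame(
  id = c('A','A','A','A','A','B','B','B','B','B','C','C','C','C'),
  value = c(1,2,3,4,5,3,4,5,1,8,3,4,2,9)
)

# `test` is the full data frame, not the current group.
n_all <- nrow(test)                                      # 14, not the group size
m_all <- median(test[(n_all - 3):(n_all - 1), ]$value)   # rows 11:13 -> 3, 4, 2
last_all <- tail(test, 1)$value                          # row 14 -> 9

print(c(n = n_all, m = m_all, last = last_all))
```

These are exactly the m and last values that appear in every row of the grouped output above.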

This works:

test %>% 
  group_by(id) %>%
  summarise(m = median(value[(length(value)-3):(length(value)-1)]),
            last = value[length(value)])

# A tibble: 3 x 3
      id     m  last
  <fctr> <dbl> <dbl>
1      A     3     5
2      B     4     8
3      C     3     9

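Inside summarise(), refer to the column value instead of the whole dataset. In a grouped summarise(), value contains only the current group's values. In contrast, test (or test_a) always refers to the full data frame, which is why every group got the same result.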
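For comparison, here is a base-R sketch of the same per-group computation without dplyr. split() hands each group's value vector to the function, which mirrors what summarise() sees as value in a grouped data frame. It keeps the answer's window (n - 3):(n - 1), so it needs at least four rows per group to pick three values. With three rows the window is 0:2, and index 0 is silently dropped.

```r
# Example data from the question.
test <- data.frame(
  id = c('A','A','A','A','A','B','B','B','B','B','C','C','C','C'),
  value = c(1,2,3,4,5,3,4,5,1,8,3,4,2,9)
)

# One value vector per id, named by id (A, B, C).
by_id <- split(test$value, test$id)

res <- data.frame(
  id = names(by_id),
  # same window as the answer: rows n-3 to n-1 of each group
  m = unname(sapply(by_id, function(v) {
    n <- length(v)
    median(v[(n - 3):(n - 1)])
  })),
  # last element of each group
  last = unname(sapply(by_id, function(v) v[length(v)]))
)
print(res)
```

With the data above this gives m = 3, 4, 3 and last = 5, 8, 9. Inside a dplyr pipeline, n() and dplyr::last(value) are equivalent shorthands for length(value) and value[length(value)].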


Edit: Here's a generalized version.

test %>% 
  group_by(id) %>%
  summarise(m = ifelse(length(value) == 1, value, 
                       ifelse(length(value) == 2, median(value), 
                              median(value[(ceiling(length(value)/2)-1):(ceiling(length(value)/2)+1)])),
            last = value[length(value)])

If a group has only one row, that value itself is stored in m. If it has only two rows, the median of those two rows is stored in m. If it has three or more rows, the middle three rows are chosen dynamically and their median is stored in m. These are rows ceiling(n/2) - 1 through ceiling(n/2) + 1, where n is the group size. When n is even, no three rows are exactly centred, so this window leans one row toward the start. For example, with n = 4 it picks rows 1 to 3.
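The window size is hard-coded to three above. A base-R sketch with a configurable window k follows. The helper name central_median and its fallback rule (take the median of all values when a group has k or fewer rows) are assumptions for illustration. For k = 3 it selects the same rows as the ceiling() formula, because floor((n - 3)/2) + 1 equals ceiling(n/2) - 1 for every n of at least 3.

```r
# Example data from the question.
test <- data.frame(
  id = c('A','A','A','A','A','B','B','B','B','B','C','C','C','C'),
  value = c(1,2,3,4,5,3,4,5,1,8,3,4,2,9)
)

# Hypothetical helper: median of the k central elements of v.
# Groups with k or fewer elements fall back to the median of all of them.
central_median <- function(v, k = 3) {
  n <- length(v)
  if (n <= k) {
    return(median(v))
  }
  start <- floor((n - k) / 2) + 1   # first index of the centred window
  median(v[start:(start + k - 1)])
}

# tapply() applies the function to each id's value vector.
m <- tapply(test$value, test$id, central_median)
last <- tapply(test$value, test$id, function(v) v[length(v)])

res <- data.frame(id = names(m), m = as.vector(m), last = as.vector(last))
print(res)
```

In the dplyr version, the same helper could be called per group, e.g. summarise(m = central_median(value), last = value[length(value)]). Passing k changes the window size.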

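Statement: the technical posts on this site follow the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.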

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM