How to remove rows by date based on a quantile?

Question

My problem is the following: I would like to remove the rows of a data frame whose values are lower than the 50th percentile computed for each date. The following example illustrates my problem.

I have the following data frame:

date <- c("01.02.2011","01.02.2011","01.02.2011","01.02.2011","01.02.2011","01.02.2011",
          "01.02.2011","01.02.2011","01.02.2011","01.02.2011",
          "02.02.2011","02.02.2011","02.02.2011","02.02.2011","02.02.2011","02.02.2011",
          "02.02.2011","02.02.2011","02.02.2011","02.02.2011")
date <- as.Date(date, format="%d.%m.%Y")
ID <- c("A","B","C","D","E","F","G","H","I","J",
        "A","B","C","D","E","F","G","H","I","J")
values <- as.numeric(c("1","8","2","3","5","13","2","4","1","16",
                       "4","2","12","16","8","1","7","11","2","10"))

df <- data.frame(ID, date, values)

It looks like this:

   ID       date values
1   A 2011-02-01      1
2   B 2011-02-01      8
3   C 2011-02-01      2
4   D 2011-02-01      3
5   E 2011-02-01      5
6   F 2011-02-01     13
7   G 2011-02-01      2
8   H 2011-02-01      4
9   I 2011-02-01      1
10  J 2011-02-01     16
11  A 2011-02-02      4
12  B 2011-02-02      2
13  C 2011-02-02     12
14  D 2011-02-02     16
15  E 2011-02-02      8
16  F 2011-02-02      1
17  G 2011-02-02      7
18  H 2011-02-02     11
19  I 2011-02-02      2
20  J 2011-02-02     10

For each date, I would like to delete all rows whose values are below that date's 50th percentile, in order to obtain:

   ID       date values
2   B 2011-02-01      8
5   E 2011-02-01      5
6   F 2011-02-01     13
8   H 2011-02-01      4
10  J 2011-02-01     16
13  C 2011-02-02     12
14  D 2011-02-02     16
15  E 2011-02-02      8
18  H 2011-02-02     11
20  J 2011-02-02     10

If my question needs any editing, do not hesitate to let me know.

Answer 1

There are several ways to do this. Two solutions are shown here, though many more exist. They all apply the same idea: first compute the median by date, then filter the data.
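As a minimal sketch of that two-step idea without any extra package, base R's `ave()` can compute the per-date median and a logical index can do the filtering. The names `med` and `result` are only illustrative, and the example data is rebuilt in a shorter form so the block runs on its own.

```r
# Rebuild the example data from the question.
date <- as.Date(rep(c("01.02.2011", "02.02.2011"), each = 10), format = "%d.%m.%Y")
ID <- rep(LETTERS[1:10], times = 2)
values <- c(1, 8, 2, 3, 5, 13, 2, 4, 1, 16,
            4, 2, 12, 16, 8, 1, 7, 11, 2, 10)
df <- data.frame(ID, date, values)

# Step 1: per-date median, repeated on every row of its group.
# The date is converted to character so ave() groups on plain strings.
med <- ave(df$values, as.character(df$date), FUN = median)

# Step 2: keep only the rows strictly above their date's median.
result <- df[df$values > med, ]
result
```

The question asks to drop rows below the median, so `>=` would be the comparison that also keeps a value exactly equal to it. In this data the two medians are 3.5 and 7.5, which no row matches, so `>` and `>=` give the same result here.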

data.table

With data.table, first add the per-date median to the data by reference using :=, then filter. data.table is a very efficient approach if your dataset is large.

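library(data.table)
setDT(df)  # convert df to a data.table by reference

# Add the per-date median as a helper column
df[, quant := quantile(values, probs = .5), by = "date"]
# Keep the rows strictly above their date's median, then drop the helper column
df2 <- df[values > quant]
df2[, 'quant' := NULL]

df2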
    ID       date values
 1:  B 2011-02-01      8
 2:  E 2011-02-01      5
 3:  F 2011-02-01     13
 4:  H 2011-02-01      4
 5:  J 2011-02-01     16
 6:  C 2011-02-02     12
 7:  D 2011-02-02     16
 8:  E 2011-02-02      8
 9:  H 2011-02-02     11
10:  J 2011-02-02     10
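If the helper column is not wanted at all, one possible variant collects the row numbers to keep with `.I` and subsets once. This is a sketch under the same assumptions as above; `dt`, `keep` and `dt2` are illustrative names, and the data is rebuilt so the block is self-contained.

```r
library(data.table)

# Rebuild the example data as a data.table.
dt <- data.table(
  ID = rep(LETTERS[1:10], times = 2),
  date = as.Date(rep(c("2011-02-01", "2011-02-02"), each = 10)),
  values = c(1, 8, 2, 3, 5, 13, 2, 4, 1, 16,
             4, 2, 12, 16, 8, 1, 7, 11, 2, 10)
)

# .I holds the row numbers of the current group; keep those whose value
# is strictly above the group's median. The result column is named V1.
keep <- dt[, .I[values > quantile(values, probs = 0.5)], by = date]$V1

# Subset the original table by those row numbers; no helper column is created.
dt2 <- dt[keep]
dt2
```

Unlike the `:=` version, this leaves the original table unchanged, at the cost of a slightly less readable expression.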

dplyr

With dplyr, you pipe the operations: compute the quantile by group, then filter.

library(dplyr)
df %>%
   group_by(date) %>%
   mutate(quant = quantile(values, .5)) %>%
   filter(values>quant) %>%
   select(-quant)

Groups:   date [2]
   ID    date       values
   <fct> <date>      <dbl>
 1 B     2011-02-01      8
 2 E     2011-02-01      5
 3 F     2011-02-01     13
 4 H     2011-02-01      4
 5 J     2011-02-01     16
 6 C     2011-02-02     12
 7 D     2011-02-02     16
 8 E     2011-02-02      8
 9 H     2011-02-02     11
10 J     2011-02-02     10
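The same result can probably be reached a little more compactly, because `filter()` evaluates its condition within each group of a grouped data frame, so the temporary `quant` column can be skipped. The sketch below rebuilds the example data and adds `ungroup()` so that the result no longer carries the grouping; `kept` is an illustrative name.

```r
library(dplyr)

# Rebuild the example data from the question.
df <- data.frame(
  ID = rep(LETTERS[1:10], times = 2),
  date = as.Date(rep(c("2011-02-01", "2011-02-02"), each = 10)),
  values = c(1, 8, 2, 3, 5, 13, 2, 4, 1, 16,
             4, 2, 12, 16, 8, 1, 7, 11, 2, 10)
)

kept <- df %>%
  group_by(date) %>%
  # quantile() is computed per date because the data is grouped
  filter(values > quantile(values, 0.5)) %>%
  # drop the grouping so later operations act on the whole table
  ungroup()

kept
```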
