Min, max, and mean of the n distinct highest/lowest values, and plotting them together with the time series data on the same graph in R

Question

I am dealing with a large time series data set (almost 100K records) with Unix timestamps. I need min, mean, max, avg_of_lowest_n, and avg_of_top_n from the value column. I can get min, mean, and max as follows:

tapply(df$value, df$pattern, min)
tapply(df$value, df$pattern, mean)
tapply(df$value, df$pattern, max)

Now I need the mean of the lowest n distinct values and of the top n distinct values, as two more columns, for each pattern (group). The following gives me the mean of the lowest and top n values (say n = 5), but I think the mean is not computed over 5 distinct values within each group (pattern). How can I do that?

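library(data.table)
setDT(df)  # converts df to a data.table by reference
# value[1:5] takes the first 5 rows per group, duplicates included
df[order(value)][, list(mean_of_low_5 = mean(value[1:5])), by = pattern]
df[order(-value)][, list(mean_of_top_5 = mean(value[1:5])), by = pattern]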

Any simple way of doing this would be highly appreciated.

Sample data:

df <- structure(list(pattern = c(462L, 462L, 462L, 462L, 462L, 462L, 
462L, 462L, 462L, 462L, 462L, 463L, 463L, 463L, 463L, 463L, 463L, 
463L, 463L, 463L, 463L, 463L, 463L, 463L, 463L, 464L, 464L, 464L, 
464L, 464L, 464L, 464L, 464L, 464L, 464L, 464L, 464L, 464L, 465L, 
465L, 465L, 465L, 465L, 466L, 466L, 466L, 466L, 466L, 466L, 466L, 
466L, 466L, 466L, 466L, 466L, 961L, 961L, 961L, 961L, 961L, 961L, 
961L), value = c(5.8e+10, 4.35e+10, 3.96e+10, 3.6e+10, 3.48e+10, 
3.3e+10, 3.3e+10, 3.3e+10, 3.3e+10, 3.3e+10, 3.3e+10, 1e+09, 
1e+09, 1e+09, 1e+09, 1e+09, 1e+09, 1e+09, 1e+09, 1e+09, 1e+09, 
1e+09, 1e+09, 1e+09, 1e+09, 3.3e+10, 3.3e+10, 3.3e+10, 3.3e+10, 
3.3e+10, 3.3e+10, 3.3e+10, 3.3e+10, 3.3e+10, 3.3e+10, 3.3e+10, 
3.3e+10, 3.3e+10, 3e+10, 3e+10, 3e+10, 3e+10, 3e+10, 3.3e+10, 
3.3e+10, 3.3e+10, 3.3e+10, 3.3e+10, 3.3e+10, 3.3e+10, 3.3e+10, 
3.3e+10, 3.2e+10, 3.2e+10, 3.2e+10, 2.6e+10, 2.6e+10, 2.6e+10, 
2.6e+10, 2.6e+10, 2.6e+10, 2.6e+10), timestamp = c(1590604157L, 
1590604157L, 1590604157L, 1590604157L, 1590604157L, 1590604157L, 
1590604157L, 1590604157L, 1590604157L, 1590604157L, 1590604157L, 
1590604170L, 1590604170L, 1590604170L, 1590604170L, 1590604170L, 
1590604170L, 1590604170L, 1590604170L, 1590604170L, 1590604170L, 
1590604170L, 1590604170L, 1590604170L, 1590604170L, 1590604213L, 
1590604213L, 1590604213L, 1590604213L, 1590604213L, 1590604213L, 
1590604213L, 1590604213L, 1590604213L, 1590604213L, 1590604213L, 
1590604213L, 1590604213L, 1590604226L, 1590604226L, 1590604226L, 
1590604226L, 1590604226L, 1590604239L, 1590604239L, 1590604239L, 
1590604239L, 1590604239L, 1590604239L, 1590604239L, 1590604239L, 
1590604239L, 1590604239L, 1590604239L, 1590604239L, 1590610895L, 
1590610895L, 1590610895L, 1590610895L, 1590610895L, 1590610895L, 
1590610895L)), class = "data.frame", row.names = c(NA, -62L))

Answer 1

You can do all the calculations in one pipe using dplyr:

library(dplyr)

df %>%
  group_by(pattern) %>%
  summarise(min_val = min(value), 
            max_val = max(value), 
            mean_val = mean(value), 
            lowest_n_val = mean(head(unique(sort(value)), 5)),
            highest_n_val = mean(tail(unique(sort(value)), 5)))

If you have NA in your data, you can add na.rm = TRUE to min(), max(), and mean(). The lowest_n_val and highest_n_val expressions do not need it, because sort() drops NA values by default.
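
To see what the distinct-value expressions compute, here is a minimal base R sketch. The helper names mean_lowest_n and mean_top_n and the small data frame df_small are hypothetical (they are not part of the answer). The values are a subset of the sample data, with one NA added for illustration.

```r
# Hypothetical helpers wrapping the expressions used in the dplyr answer.
mean_lowest_n <- function(x, n = 5) {
  # sort() drops NA by default; unique() keeps one copy of each value
  mean(head(unique(sort(x)), n))
}
mean_top_n <- function(x, n = 5) {
  mean(tail(unique(sort(x)), n))
}

# Subset of the sample data (patterns 462 and 463), plus one NA.
df_small <- data.frame(
  pattern = c(462L, 462L, 462L, 462L, 462L, 462L, 462L, 462L, 463L, 463L),
  value   = c(5.8e10, 4.35e10, 3.96e10, 3.6e10, 3.48e10,
              3.3e10, 3.3e10, NA, 1e9, 1e9)
)

# Per-group means over distinct values, in the same tapply style as the question.
low5 <- tapply(df_small$value, df_small$pattern, mean_lowest_n)
top5 <- tapply(df_small$value, df_small$pattern, mean_top_n)

# For comparison: the non-distinct mean of the 5 smallest values in pattern 462,
# where 3.3e10 is counted twice.
x462 <- df_small$value[df_small$pattern == 462L]
low5_with_duplicates <- mean(sort(x462)[1:5])

print(low5)
print(top5)
print(low5_with_duplicates)
```

When a group has fewer than n distinct values (pattern 463 here has only one), head() and tail() return whatever is available, so the mean is taken over fewer than n values rather than failing.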

Answer 1 was accepted on 2020-06-02 01:34:58.
