简体   繁体   English

使用data.table对符合条件的分组观察进行计数

[英]Count grouped observations that meet conditions using data.table

I have hourly/daily stock return data similar to the following: 我有类似以下的每小时/每天股票退货数据:

  set.seed(1)
  dt <- data.table(stock = c(rep("a",24),rep("b",24),rep("c",24),rep("d",24)),
  hour = rep(1:24,4), day1 = sample(-5:5,96,replace = TRUE), 
  day2 = sample(-10:-1,96,replace = TRUE), day3 = sample(0:10,96,replace = TRUE),
  day4 = 0)

Which looks like: 看起来像:

  stock hour    day1    day2    day3    day4
  a      1      -3       -6      1       0
  a      2      -1       -6      10      0
  a      3       1       -2      3       0
  ...                   
  d      22      4       -5      1       0
  d      23      3       -3      3       0
  d      24      3       -7      1       0

I would like to count the following for each day: 我想每天计算以下内容:

  1. how many stocks show negative returns for all hours 多少股票在整个小时内都显示负收益
  2. how many stocks show positive returns for all hours 多少股票在所有小时内都显示正收益
  3. how many stocks show mixed positive and negative returns during 24 hours 24小时内有多少股票显示正负混合收益
  4. how many stocks show at least one zero during 24 hours 24小时内有多少股票显示至少一个零
  5. how many stocks show zero for all 24 hours 24小时内有多少股票显示为零

The output should look similar to this: 输出应类似于以下内容:

  counts                            day1    day2    day3    day4
  stocks_where_all_hours_negative   0       4       0       0
  stocks_where_all_hours_positive   0       0       4       0
  stocks_where_mixed_pos_and_neg    4       0       0       0
  stocks_where_at_least_one_zero    4       0       2       4
  stocks_where_all_zero             0       0       0       4

Thanks so much. 非常感谢。

mdt <- melt(
  dt,
  id.vars = c('stock', 'hour'),
  measure.vars = c('day1', 'day2', 'day3', 'day4'),
  variable.name = 'day',
  value.name = 'position'
)
counts <- mdt[ ,.(stocks_where_all_hours_negative = all(position < 0),
        stocks_where_all_hours_positive = all(position > 0),
        stocks_where_mixed_pos_and_neg = any(position < 0) & any(position > 0),
        stocks_where_at_least_one_zero = any(position != 0),
        stocks_where_all_zero = all(position == 0)),
     by = c('day',
            'stock')][ ,.(stocks_where_all_hours_negative = sum(stocks_where_all_hours_negative),
                          stocks_where_all_hours_positive = sum(stocks_where_all_hours_positive),
                          stocks_where_mixed_pos_and_neg = sum(stocks_where_mixed_pos_and_neg),
                          stocks_where_at_least_one_zero = sum(stocks_where_at_least_one_zero),
                          stocks_where_all_zero = sum(stocks_where_all_zero)),
                       by = 'day']
t(counts)

First reshape the data to 'long' format so that you can group by days. 首先将数据重整为“长”格式,以便您可以按天分组。 any and all functions will check the logic for each vector and return the scenarios you have set up. anyall功能将检查逻辑每个向量,并返回已经设置了场景。

Since the summary you are doing is so highly specialized, you could write your own function: 由于您正在执行的摘要非常专业,因此您可以编写自己的函数:

summarize_stocks <- function(stock, day, return_names = FALSE){
  if(!return_names) return( c(length(unique(stock[as.logical(ave(day, stock, FUN = function(x) all(x<0)))])),
                              length(unique(stock[as.logical(ave(day, stock, FUN = function(x) all(x>0)))])),
                              length(unique(stock[as.logical(ave(day, stock, FUN = function(x) any(x>0) & any(x<0)))])),
                              length(unique(stock[as.logical(ave(day, stock, FUN = function(x) any(x==0)))])),
                              length(unique(stock[as.logical(ave(day, stock, FUN = function(x) all(x==0)))]))
                              ))
  else return(c("stocks_where_all_hours_negative", "stocks_where_all_hours_positive", "stocks_where_mixed_pos_and_neg",
                  "stocks_where_at_least_one_zero", "stocks_where_all_zero"))
}

res <- dt[, lapply(.SD, summarize_stocks, stock = stock), .SDcols = c("day1","day2", "day3", "day4")]
res[, counts := summarize_stocks(return_names = TRUE)]

res
#       day1 day2 day3 day4                          counts
#1:    0    4    0    0 stocks_where_all_hours_negative
#2:    0    0    2    0 stocks_where_all_hours_positive
#3:    4    0    0    0  stocks_where_mixed_pos_and_neg
#4:    4    0    2    4  stocks_where_at_least_one_zero
#5:    0    0    0    4           stocks_where_all_zero

Or if you want to get fancy you could change your summarize_stocks to: 或者,如果你想获得看上你可以改变你的summarize_stocks到:

summarize_stocks <- function(stock, day, return_names = FALSE){
  funs <- list(function(x) all(x<0),
               function(x) all(x>0),
               function(x) any(x>0) & any(x<0),
               function(x) any(x==0),
               function(x) all(x==0))

  if(!return_names) return( vapply(funs, function(f) length(unique(stock[as.logical(ave(day, stock, FUN = function(x) f(x)))])), numeric(1L)))

  else return(c("stocks_where_all_hours_negative", "stocks_where_all_hours_positive", "stocks_where_mixed_pos_and_neg",
                  "stocks_where_at_least_one_zero", "stocks_where_all_zero"))
}

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM