Calculate a statistic on one column based on a summary result of another column

Question

I would like to use data.table to calculate a summary statistic and then, based on that result, calculate a statistic on a second column.

Here is an example using the airquality data.

Set up the data

(pretend it came this way)

library(data.table)
dt = as.data.table(airquality)
dt[ , Season:=ifelse(Month>7, 'Fall', 'Summer')]

Some months have high wind

## The range of monthly Wind values
dt[ , list(MinWind=min(Wind), MaxWind=max(Wind)), 
        by=c('Season', 'Month')]

---- R OUTPUT:
   Season Month MinWind MaxWind
1: Summer     5     5.7    20.1
2: Summer     6     1.7    20.7
3: Summer     7     4.1    14.9
4:   Fall     8     2.3    15.5
5:   Fall     9     2.8    16.6

Goal: Calculate the average seasonal Solar Radiation, grouped by whether the month had any Wind value greater than 20.

Can I do this in one step? At the moment I do it in two:

## Add a column to indicate if it was a high wind month
dt[, HighWind:=any(Wind>20), by=Month]
## Aggregate based on both HighWind and Season
dt[, list(AveSolarR=mean(Solar.R, na.rm=TRUE)), by=c("HighWind","Season")]

---- R OUTPUT:
   HighWind Season AveSolarR
1:     TRUE Summer  185.9649
2:    FALSE Summer  216.4839
3:    FALSE   Fall  169.5690

Answer 1

Why not combine both into one list?

dt[, list(HighWind=any(Wind>20), AveSolarR=mean(Solar.R, na.rm=TRUE)), by=Month]
   Month HighWind AveSolarR
1:     5     TRUE  181.2963
2:     6     TRUE  190.1667
3:     7    FALSE  216.4839
4:     8    FALSE  171.8571
5:     9    FALSE  167.4333

For the modified problem, you need to do the HighWind calculation in the by argument, but I think that makes it more convoluted.

dt[, list(AveSolarR=mean(Solar.R, na.rm=TRUE)),
   by=list(HighWind=Month %in% Month[Wind>20], Season)]
   HighWind Season AveSolarR
1:     TRUE Summer  185.9649
2:    FALSE Summer  216.4839
3:    FALSE   Fall  169.5690
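
To see why the expression in by works, it may help to pull it apart: Month[Wind>20] picks out the months that contain at least one high Wind reading, and %in% then flags every row belonging to one of those months. The sketch below does that in a separate step and compares the result with the two-step approach from the question. The names high_months, one_step and two_step are only illustrative and do not appear in the original post, and the two-step version runs on a copy so that dt does not gain a HighWind column.

```r
library(data.table)

dt <- as.data.table(airquality)
dt[, Season := ifelse(Month > 7, "Fall", "Summer")]

# Months containing at least one Wind reading above 20
# (high_months is an illustrative name, not from the original post)
high_months <- dt[Wind > 20, unique(Month)]

# One step: build the HighWind flag inside `by`
one_step <- dt[, list(AveSolarR = mean(Solar.R, na.rm = TRUE)),
               by = list(HighWind = Month %in% high_months, Season)]

# Two steps, chained on a copy so `dt` itself gets no extra column
two_step <- copy(dt)[, HighWind := any(Wind > 20), by = Month][
  , list(AveSolarR = mean(Solar.R, na.rm = TRUE)), by = c("HighWind", "Season")]

# Compare the two result tables
all.equal(one_step, two_step)
```

The chained form in two_step is also a single expression, but it still adds HighWind by reference to whatever table it is called on, which is why the sketch works on a copy.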

Answer 1: 5 votes, accepted, 2012-09-07 01:05:26
