
Split dataframe and calculate averages for data subsets in R

I have this data frame in R:

steps   day         month  
4758    Tuesday     December
9822    Wednesday   December
10773   Thursday    December

I want to iterate over the data frame and apply a function to the steps column based on the value in the month column. I'm trying to work out the average number of steps per weekday for each month.

I want the output in a new data frame like the one below, where the weekdays repeat across months but each day/month pair has only its average value:

average.steps   day         month
4500            Tuesday     December
9000            Wednesday   December
1000            Thursday    December

I can work out the averages for the data frame as a whole, but I want to use a for loop to apply the calculation only to step values from the same month.

avgsteps <- ddply(DATA, "day", summarise, msteps = mean(steps))

My basic idea for the function was:

f <- function(m in month) {ddply(DATA, "day", summarise, msteps = mean(steps))}

But R won't parse it and throws this error:

Error: unexpected 'in' in "f <- function(m in"
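
For context, the parse error arises because `in` is only valid in a `for (...)` header; a function's argument list takes plain argument names. A minimal base R sketch of the loop described above might look like the following, assuming a data frame `DATA` with `steps`, `day` and `month` columns. The sample values and the names `one.month`, `results` and `loop.avg` are made up for illustration.

```r
# Hypothetical sample data; the values are invented for illustration.
DATA <- data.frame(
  steps = c(4000, 5000, 9000, 9000, 3000, 7000),
  day   = c("Tuesday", "Tuesday", "Wednesday", "Wednesday", "Tuesday", "Tuesday"),
  month = c("December", "December", "December", "December", "January", "January"),
  stringsAsFactors = FALSE
)

# `in` belongs in the for-loop header, not in a function definition.
results <- list()
for (m in unique(DATA$month)) {
  one.month <- DATA[DATA$month == m, ]                      # rows for this month only
  avg <- aggregate(steps ~ day, data = one.month, FUN = mean)  # mean steps per weekday
  names(avg)[names(avg) == "steps"] <- "average.steps"
  avg$month <- m                                            # record which month this is
  results[[m]] <- avg
}

# Stack the per-month results into one data frame.
loop.avg <- do.call(rbind, results)
rownames(loop.avg) <- NULL
print(loop.avg)
```

This works, but it rebuilds by hand the split/apply/combine pattern that grouping functions already provide.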

Any help would be greatly appreciated!

EDIT:

So I've tried @agstudy's suggested fix (below), and it produces the right structure (a single value for each weekday in each month), but the value assigned to every day is identical. I'm not sure what could be going wrong.

steps.month.day.avg <- ddply(steps.month.day, .(fitbit.day,fitbit.month), summarise, msteps = mean(steps))

There is no need for a loop here; you just need to change the variables used to split the data frame:

 ddply(DATA, .(day,month), summarise, msteps = mean(steps))
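
To see what grouping by both columns produces, here is a small self-contained sketch. It uses base R's `aggregate()` as a stand-in so that it runs without extra packages, and repeats the same grouping with `plyr::ddply()` only if plyr happens to be installed. The sample values and the names `by.day.month` and `via.ddply` are assumptions for illustration, not part of the original question.

```r
# Hypothetical sample data; the values are invented for illustration.
DATA <- data.frame(
  steps = c(4000, 5000, 9000, 9000, 3000, 7000),
  day   = c("Tuesday", "Tuesday", "Wednesday", "Wednesday", "Tuesday", "Tuesday"),
  month = c("December", "December", "December", "December", "January", "January"),
  stringsAsFactors = FALSE
)

# Base R: one row per day/month combination, holding the mean of steps.
by.day.month <- aggregate(steps ~ day + month, data = DATA, FUN = mean)
names(by.day.month)[names(by.day.month) == "steps"] <- "average.steps"
print(by.day.month)

# The same grouping with plyr, run only when the package is available.
if (requireNamespace("plyr", quietly = TRUE)) {
  via.ddply <- plyr::ddply(DATA, c("day", "month"), plyr::summarise,
                           average.steps = mean(steps))
  print(via.ddply)
}
```

Either way, each weekday appears once per month, so the weekdays repeat across months while every row carries only that month's average.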
