R: Looping through a data frame to extract subsets of data by date

Question

I have a large data frame that looks something like this:

        date    w    x    y    z    region
1    2012 01    21   43   12    3   NORTH
2    2012 02    32   54   21   16   NORTH
3    2012 03    14   32   65   32   NORTH
4    2012 04    65   33   75   21   NORTH
:        :      :    :    :    :       :
:        :      :    :    :    :       :
12   2012 12    32   58   53   17   NORTH
13   2012 01    12   47   43   23   SOUTH
14   2012 02    87   43   21   76   SOUTH
:        :      :    :    :    :       :
25   2012 01    12   46   84   29    EAST
26   2012 02    85   29   90   12    EAST
:        :      :    :    :    :       :
:        :      :    :    :    :       :

I want to extract the sections of the data that share the same date value. For example, to do this just for 2012 01, I would create a subset of the data:

data_1 <- subset(data, date == "2012 01")

This gives me all the data for 2012 01, and I then go on to apply a function to it. I would like to apply my function to every such subset, so ideally I would loop through my large data frame, extract the data for 2012 01, 2012 02, 2012 03, 2012 04... and apply the function to each of these subsets separately.

I would also like this to keep working if the length of my data frame changes. The dates may not always run from 2012 01 to 2012 12; the range can vary, so the code might sometimes be used on data from, for example, 2011 03 to 2013 01.

Answer 1

Loop through each unique date and build the subset.

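# The column is named `date` (lower case) in the question's data frame
uniq <- unique(data$date)
# seq_along() is safe even when there are no dates to loop over
for (i in seq_along(uniq)) {
    data_1 <- subset(data, date == uniq[i])
    # apply your desired function to data_1 here
}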
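As written, the loop overwrites data_1 on every pass, so the results need to be stored somewhere. The following is a minimal sketch of one way to do that, assuming a small made-up data frame and a placeholder function called my_function (both hypothetical, not from the question):

```r
# Hypothetical sample data; column names follow the question
data <- data.frame(
  date   = rep(c("2012 01", "2012 02", "2012 03"), times = 2),
  w      = c(21, 32, 14, 12, 87, 40),
  region = rep(c("NORTH", "SOUTH"), each = 3),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for "your desired function"
my_function <- function(d) sum(d$w)

uniq <- unique(data$date)

# Pre-allocate a list with one slot per date, named by date
results <- vector("list", length(uniq))
names(results) <- uniq

for (i in seq_along(uniq)) {
  data_i <- data[data$date == uniq[i], ]
  results[[i]] <- my_function(data_i)
}
```

Because uniq is computed from the data, the same loop works whatever range of dates the data frame happens to contain.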

Answer 2

Is this what you want?

df_list <- split(data, as.factor(data$date))
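Once the data frame has been split, a function can be applied to every piece with lapply() or sapply(). A small sketch, assuming made-up data and using the mean of x purely as an illustrative function:

```r
# Hypothetical sample data covering an arbitrary date range
data <- data.frame(
  date = c("2011 03", "2011 03", "2011 04", "2013 01"),
  x    = c(43, 47, 54, 46),
  stringsAsFactors = FALSE
)

# One data frame per distinct date, named by the date value
df_list <- split(data, data$date)

# Apply a function to each subset; sapply() simplifies to a named vector
mean_x <- sapply(df_list, function(d) mean(d$x))
```

Here mean_x is a numeric vector with one element per date, named after the dates found in the data.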

Answer 3

After subsetting your dataset by date, suppose the function you would like to apply to each subset is the mean of column x. You could do it this way (df is your data frame):

 library(plyr)
 ddply(df, .(date), summarize, mean = mean(x))
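If plyr is not available, a similar per-date mean can be computed in base R with aggregate(). This swaps in a different function from the one in the answer; the data below is made up for illustration:

```r
# Hypothetical sample data; column names follow the question
df <- data.frame(
  date = c("2012 01", "2012 02", "2012 01", "2012 02"),
  x    = c(43, 54, 47, 43),
  stringsAsFactors = FALSE
)

# Mean of x within each date; the result has columns `date` and `x`
means <- aggregate(x ~ date, data = df, FUN = mean)
```

Note that aggregate() keeps the original column name x for the result rather than calling it mean.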

Answer 4

You can split the data.frame into a list of data.frames like this:

# by() requires a function argument; identity returns each subset unchanged
list.of.dfs <- by(data, data$date, identity)
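by() takes a function as its third argument and applies it to each subset, so the splitting and the function call can happen in one step. A minimal sketch, assuming made-up data and a sum of y as the illustrative function:

```r
# Hypothetical sample data; column names follow the question
data <- data.frame(
  date = c("2012 01", "2012 01", "2012 02"),
  y    = c(12, 43, 21),
  stringsAsFactors = FALSE
)

# Apply the function to the rows belonging to each date
y_totals <- by(data, data$date, function(d) sum(d$y))

# Plain numeric vector of the per-date results, in sorted date order
totals <- as.vector(y_totals)
```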

Answer 5

This is a perfect situation for the plyr package:

require(plyr)
ddply(my_df, .(date), my_function, extra_arg_1, extra_arg_2)

where my_function is the function you want to perform on the split data frames, and extra_arg_1 and extra_arg_2 are any extra arguments that need to go to that function.

ddply (data frame -> data frame) is the form you want if you want your results in a data frame; dlply returns a list.
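To illustrate the difference between getting a list back and getting a data frame back, here is a rough base R analogue of the same split-apply-combine pattern. It is only a sketch and does not use plyr; the data, the function body, and the argument names column and scale are all hypothetical:

```r
# Hypothetical sample data; column names follow the question
my_df <- data.frame(
  date = c("2012 01", "2012 01", "2012 02", "2012 02"),
  w    = c(21, 12, 32, 87),
  stringsAsFactors = FALSE
)

# Hypothetical function taking a subset plus two extra arguments
my_function <- function(d, column, scale) {
  data.frame(date = d$date[1], total = sum(d[[column]]) * scale)
}

pieces <- split(my_df, my_df$date)

# Extra arguments are passed through after the function name.
# The result is a list, one element per date.
result_list <- lapply(pieces, my_function, column = "w", scale = 2)

# Bind the pieces back together to get a single data frame
result_df <- do.call(rbind, result_list)
rownames(result_df) <- NULL
```

result_list holds one small data frame per date, while result_df stacks them into one data frame with a row per date.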

5 answers

Answer 1
15 2013-08-22 14:15:37

Answer 2
10 Accepted 2013-08-22 14:10:05

Answer 3
2 2013-08-22 14:18:09

Answer 4
0 2013-08-22 14:10:33

Answer 5
0 2013-08-22 14:13:45
