
Access a pandas group as new data frame

I am new to data analysis with pandas, coming from a Matlab background. I am trying to group data and then process the individual groups. However, I cannot figure out how to actually access the grouping result.

Here is my setup: I have a pandas DataFrame df with a regularly spaced DatetimeIndex named timestamp at a 10-minute frequency. My data spans several weeks in total. I now want to group the data by day, like so:

grouping = df.groupby([pd.Grouper(level="timestamp", freq="D",)])

Note that I do not want to aggregate the groups (contrary to most examples and tutorials, it seems). I simply want to take each group in turn and process it individually, like so (does not work):

for g in grouping:
  g_df = d.toDataFrame()
  some_processing(g_df)

How do I do that? I haven't found any way to extract daily DataFrame objects from the DataFrameGroupBy object.

Expand your groups into a dictionary of dataframes:

>>> data = dict(list(df.groupby(df.index.date.astype(str))))
>>> data.keys()
dict_keys(['2021-01-01', '2021-01-02'])

>>> data['2021-01-01']
                        value
timestamp                    
2021-01-01 00:00:00  0.405630
2021-01-01 01:00:00  0.262235
2021-01-01 02:00:00  0.913946
2021-01-01 03:00:00  0.467516
2021-01-01 04:00:00  0.367712
2021-01-01 05:00:00  0.849070
2021-01-01 06:00:00  0.572143
2021-01-01 07:00:00  0.423401
2021-01-01 08:00:00  0.931463
2021-01-01 09:00:00  0.554809
2021-01-01 10:00:00  0.561663
2021-01-01 11:00:00  0.537471
2021-01-01 12:00:00  0.461099
2021-01-01 13:00:00  0.751878
2021-01-01 14:00:00  0.266371
2021-01-01 15:00:00  0.954553
2021-01-01 16:00:00  0.895575
2021-01-01 17:00:00  0.752671
2021-01-01 18:00:00  0.230219
2021-01-01 19:00:00  0.750243
2021-01-01 20:00:00  0.812728
2021-01-01 21:00:00  0.195416
2021-01-01 22:00:00  0.178367
2021-01-01 23:00:00  0.607105
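
For anyone who wants to reproduce the output above, here is a minimal sketch. The sample frame (hourly random values over two days) and the helper name some_processing are assumptions made for illustration, not part of the original data; the groupby expression is the one shown above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: two days of hourly values (an assumption,
# chosen only to mirror the shape of the output shown above).
index = pd.date_range("2021-01-01", periods=48, freq="60min", name="timestamp")
df = pd.DataFrame({"value": np.random.rand(len(index))}, index=index)

# Same expression as in the answer: one DataFrame per calendar day,
# keyed by the date rendered as a string.
data = dict(list(df.groupby(df.index.date.astype(str))))


def some_processing(day_df):
    # Placeholder for the per-day work; here it just counts rows.
    return len(day_df)


# Each value in the dictionary is an ordinary DataFrame.
row_counts = {day: some_processing(day_df) for day, day_df in data.items()}
```

Each entry of data is a regular DataFrame, so it can be passed to any function that expects one.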

Note: I changed your group keys to make indexing easier: '2021-01-01' instead of Timestamp('2021-01-01 00:00:00', freq='D').
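
If a dictionary is not needed, iterating over the groupby object directly may be enough: each iteration yields a (key, DataFrame) pair, which is what the loop in the question was missing. The sketch below assumes a 10-minute index named timestamp, as described in the question. The sample values and the body of some_processing are placeholders. The Grouper is passed on its own rather than inside a list, so that the keys are plain Timestamp objects.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: two full days at 10-minute frequency
# (144 rows per day), with the index named "timestamp" as in the question.
index = pd.date_range("2021-01-01", periods=2 * 144, freq="10min", name="timestamp")
df = pd.DataFrame({"value": np.arange(len(index), dtype=float)}, index=index)


def some_processing(day_df):
    # Placeholder for the real per-day processing; here, the daily mean.
    return day_df["value"].mean()


results = {}
# Iterating a groupby yields (group key, sub-DataFrame) pairs.
for day, day_df in df.groupby(pd.Grouper(level="timestamp", freq="D")):
    results[day] = some_processing(day_df)
```

Here day is the Timestamp marking the start of each day and day_df holds that day's rows, so no conversion step is required before processing.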

