
Splitting a several-days-long dataframe into half-hourly dataframes using pandas and saving them as CSV files

I need to split quite a few large (several-million-record) files into half-hourly files using pandas, for use with some third-party software. Here's what I tried:

import datetime as dt
import numpy as np
import pandas as pd

# 1,728,000 rows at 0.1-second intervals, i.e. two full days of data
df = pd.DataFrame(np.random.rand(1728000, 2), index=pd.date_range('1/1/2014',
    periods=1728000, freq='0.1S'))
# group by (year, month, day, hour) so the same hour on different days
# stays separate
df_groups = df.groupby(df.index.map(lambda t: dt.datetime(t.year, t.month,
    t.day, t.hour)))
for name, group in df_groups:
    # ':' is not allowed in filenames on Windows, so replace it
    group.to_csv(str(name).replace(':', '_') + '.csv')

But this way I can only get pandas to split by hour. What should I do if I want to split the data into half-hourly files?

A couple of things to keep in mind: a) the large files can span several days, so if I use lambda t: t.hour I get data from different days grouped together under the same hour; b) the large files have gaps, so some half-hours may be incomplete and some can be missing entirely.
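To see pitfall (a) concretely, here is a small sketch with a hypothetical two-day hourly index; grouping on the hour alone collapses the days together:

import numpy as np
import pandas as pd

# hypothetical two-day index at one-hour steps
idx = pd.date_range('1/1/2014', periods=48, freq='1H')
demo = pd.DataFrame(np.arange(48), index=idx)

# grouping on the hour alone yields 24 groups, each mixing both days
print(demo.groupby(demo.index.hour).ngroups)  # prints 24, not 48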

Make your grouper like this:

df.groupby(pd.TimeGrouper('30T'))

In 0.14 this will be slightly different, e.g. df.groupby(pd.Grouper(freq='30T')).
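As a minimal end-to-end sketch, assuming a recent pandas (where TimeGrouper has since been removed in favor of pd.Grouper, and the '30T' alias is spelled '30min'); the empty-group check is an addition to handle the gaps mentioned in the question:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(1728000, 2),
                  index=pd.date_range('1/1/2014', periods=1728000, freq='0.1S'))

# fixed half-hour bins; bins that fall entirely inside a gap come back empty
for name, group in df.groupby(pd.Grouper(freq='30min')):
    if group.empty:  # skip half-hours with no data at all
        continue
    group.to_csv(str(name).replace(':', '_') + '.csv')

Because the grouper bins on the full timestamp, half-hours from different days stay in separate files, which addresses point (a) as well.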
