
Splitting a several-days-long dataframe into half-hourly dataframes using pandas and saving them as CSV files

I need to split quite a few large (several-million-record) files into half-hourly files using pandas, for use with some third-party software. Here's what I tried:

import datetime as dt
import numpy as np
import pandas as pd

# 1,728,000 rows at 0.1-second intervals, i.e. two full days of data
df = pd.DataFrame(np.random.rand(1728000, 2), index=pd.date_range('1/1/2014',
    periods=1728000, freq='0.1S'))
# group by (year, month, day, hour) so the same hour on different days
# stays separate
df_groups = df.groupby(df.index.map(lambda t: dt.datetime(t.year, t.month,
    t.day, t.hour)))
for name, group in df_groups:
    # ':' is not allowed in filenames on Windows, so replace it
    group.to_csv(str(name).replace(':', '_') + '.csv')

But this way I can only get pandas to split by hour. What should I do if I want to split the data into half-hourly files?

A couple of things to keep in mind: a) the large files can span several days, so if I use lambda t: t.hour I get data from different days grouped together under the same hour; b) the large files have gaps, so some half-hours may be incomplete and some can be missing entirely.
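To see pitfall (a) concretely, here is a small sketch with a hypothetical two-day hourly index; grouping on the hour alone collapses the days together:

import numpy as np
import pandas as pd

# hypothetical two-day index at one-hour steps
idx = pd.date_range('1/1/2014', periods=48, freq='1H')
demo = pd.DataFrame(np.arange(48), index=idx)

# grouping on the hour alone yields 24 groups, each mixing both days
print(demo.groupby(demo.index.hour).ngroups)  # prints 24, not 48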

Make your grouper like this:

df.groupby(pd.TimeGrouper('30T'))

In 0.14 this will be slightly different, e.g. df.groupby(pd.Grouper(freq='30T')).
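As a minimal end-to-end sketch, assuming a recent pandas (where TimeGrouper has since been removed in favor of pd.Grouper, and the '30T' alias is spelled '30min'); the empty-group check is an addition to handle the gaps mentioned in the question:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(1728000, 2),
                  index=pd.date_range('1/1/2014', periods=1728000, freq='0.1S'))

# fixed half-hour bins; bins that fall entirely inside a gap come back empty
for name, group in df.groupby(pd.Grouper(freq='30min')):
    if group.empty:  # skip half-hours with no data at all
        continue
    group.to_csv(str(name).replace(':', '_') + '.csv')

Because the grouper bins on the full timestamp, half-hours from different days stay in separate files, which addresses point (a) as well.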
