Calculating daily sums with Python pandas

Question

I'm trying to calculate daily sums of values using pandas. Here's the test file: http://pastebin.com/uSDfVkTS

This is the code I have come up with so far:

import numpy as np
import datetime as dt
import pandas as pd

f = np.genfromtxt('test', dtype=[('datetime', '|S16'), ('data', '<i4')], delimiter=',')
dates = [dt.datetime.strptime(i, '%Y-%m-%d %H:%M') for i in f['datetime']]
s = pd.Series(f['data'], index = dates)
d = s.resample('D', how='sum')

Using the given test file, this produces:

2012-01-02    1128
Freq: D

The first problem is that the calculated sum is labelled with the next day. I've been able to solve that by using the parameter loffset='-1d'.

The actual problem is that the data may start not at 00:30 but at any time of day. The data also has gaps filled with 'nan' values.

That said, is it possible to set a minimum number of values required to calculate a daily sum? (E.g. if there are fewer than 40 values in a single day, put NaN instead of a sum.)

I believe it is possible to define a custom function to do that and pass it via the 'how' parameter, but I have no clue how to write the function itself.

Answer 1

You can do it directly in pandas:

s = pd.read_csv('test', header=None, index_col=0, parse_dates=True)
d = s.groupby(lambda x: x.date()).aggregate(lambda x: sum(x) if len(x) >= 40 else np.nan)

             X.2
2012-01-01  1128
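
On recent pandas versions, the same threshold can probably be expressed without a custom function, through the min_count argument of sum. The sketch below is not part of the original answer. It uses synthetic half-hourly data in place of the test file (an assumption), and it counts only non-NaN values towards the threshold, whereas the lambda above counts every row.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the test file (assumption): half-hourly readings.
# Day 1 gets 48 values, day 2 only 10.
index = pd.date_range("2012-01-01 00:00", periods=58, freq="30min")
s = pd.Series(np.ones(58), index=index)

# min_count=40: a day with fewer than 40 non-NaN values yields NaN
# instead of a sum.
daily = s.resample("D").sum(min_count=40)
print(daily)
```

With this data the first day has enough values to be summed, while the second day falls below the threshold and comes out as NaN.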

Answer 2

A simpler way is to use pd.Grouper:

d = s.groupby(pd.Grouper(freq='1D')).sum()
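
Note that the one-liner above sums every day regardless of how many values it contains. One way to add the 40-value threshold from the question is to compute the sum and the count per day and mask the sparse days. This is a minimal sketch on made-up data (the series, the gap position and the variable names are assumptions, not taken from the test file):

```python
import numpy as np
import pandas as pd

# Made-up half-hourly series starting at 00:30, with one NaN gap.
index = pd.date_range("2012-01-01 00:30", periods=60, freq="30min")
values = np.arange(60, dtype=float)
values[5] = np.nan
s = pd.Series(values, index=index)

# Per-day sum and number of non-NaN values.
stats = s.groupby(pd.Grouper(freq="D")).agg(["sum", "count"])

# Keep the sum only for days with at least 40 valid values; others become NaN.
daily = stats["sum"].where(stats["count"] >= 40)
print(daily)
```

Keeping the count column around makes it easy to check which days were dropped and why.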

Answer 1
13, accepted, 2012-11-20 14:59:23

Answer 2
1, 2020-02-26 12:38:05
