
Unexpected results from Pandas.DataFrame.resample

I have a data set for January 2017, structured like the example below.

Date    ProductID   ProductType Qty
1.1.2017    1000    101 7
1.1.2017    1001    111 2
1.1.2017    1000    101 1
1.1.2017    1004    107 12
2.1.2017    1000    101 6
2.1.2017    1001    111 5
2.1.2017    1001    111 4
…..         
31.1.2017   1000    101 7
31.1.2017   1001    111 5
31.1.2017   1001    111 7

I want to calculate weekly sales for each product ID with product type 101 or 111, so that my result looks like the following:

ProductID|  WeeklyDates|    Sales
1000     | 1.1.2017    |     14
         | 1.8.2017    |     NaN
         | 1.15.2017   |     NaN
         | 1.22.2017   |     NaN
         | 1.29.2017   |      7
-----------------------------------
1001     | 1.1.2017    |     11
         | 1.8.2017    |     NaN
         | 1.15.2017   |     NaN
         | 1.22.2017   |     NaN
         | 1.29.2017   |     12

Here NaN means that the example has no data for those dates. To get these results I am using the following code:

import pandas as pd

df = pd.read_csv('data.csv', encoding = 'latin-1', sep=',')
df['Date'] = pd.to_datetime(df['Date'])
transaction_types = [101, 111]
s_df = df[df['ProductType'].isin(transaction_types)]
res_df = s_df.filter(['Date','ProductID','Qty']) # filter it because I do not want other product type column now
res_df = res_df.set_index('Date').groupby('ProductID').resample('W').sum()
res_df.to_csv('result.csv', sep=';', encoding='latin-1')

It returns some weird results. I am getting dates that are not even in my data. I am showing the results for only one ID:

ProductID|  Date        |ProductID| Qty
1000     |   01/01/2017 |  4000   |  41
1000     |   08/01/2017 |         |
1000     |   15/01/2017 |  33000  |  54
1000     |   22/01/2017 |  87000  |  313
1000     |   29/01/2017 |  79000  |  94
1000     |   05/02/2017 |  36000  |  413
1000     |   12/02/2017 |         | 
1000     |   19/02/2017 |         |
1000     |   26/02/2017 |         |
1000     |   05/03/2017 |  8000   |  78

These results come from my real data, so they will not match the example above. But ProductID appears twice, and I think the product IDs are being summed as well. The sums are also incorrect. The dates run into March, while my data set contains only January dates. Can someone point out the possible problems in my code? Thanks.

I was not passing any date format. For example:

df['Date'] = pd.to_datetime(df['Date']) # Not correct
df['Date'] = pd.to_datetime(df['Date'], format='%d.%m.%Y') # Correct way

Because of this, pandas was treating days as months and vice versa, which is why I was getting wrong results.
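The effect can be reproduced on the sample rows from the question. The following is a minimal sketch, not the asker's actual script: the inline CSV text stands in for `data.csv`, and selecting `Qty` before resampling is an assumption about how to keep the product IDs from being summed along with the quantities.

```python
import io

import pandas as pd

# Sample rows copied from the question (day.month.year dates).
# The inline text is a stand-in for the asker's data.csv.
csv_text = """Date,ProductID,ProductType,Qty
1.1.2017,1000,101,7
1.1.2017,1001,111,2
1.1.2017,1000,101,1
1.1.2017,1004,107,12
2.1.2017,1000,101,6
2.1.2017,1001,111,5
2.1.2017,1001,111,4
31.1.2017,1000,101,7
31.1.2017,1001,111,5
31.1.2017,1001,111,7
"""
df = pd.read_csv(io.StringIO(csv_text))

# Without a format, an ambiguous string is read month-first,
# so "2.1.2017" is taken as 1 February rather than 2 January.
misread = pd.to_datetime("2.1.2017")

# With an explicit day-first format every row stays in January.
df["Date"] = pd.to_datetime(df["Date"], format="%d.%m.%Y")

wanted = df[df["ProductType"].isin([101, 111])]

# Select Qty before resampling so that only quantities are summed.
weekly = (
    wanted.set_index("Date")
    .groupby("ProductID")["Qty"]
    .resample("W")
    .sum()
)
print(weekly)
```

Note that `'W'` bins end on Sunday and are labelled with that closing date, so the sales of 31 January appear under 5 February 2017, and weeks without sales show a sum of 0 rather than NaN.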

I got stuck on this same problem and came across this answer. After looking through the pandas documentation, I learnt that a more flexible way to solve this is to let pandas infer the datetime format, as follows:

df['Date'] = pd.to_datetime(df['Date'], infer_datetime_format=True)
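One caveat, offered as a hedged aside: `infer_datetime_format` was deprecated in pandas 2.0, and inference on its own may still read an ambiguous value such as `2.1.2017` month-first. If the exact format is not known in advance, the documented `dayfirst` parameter of `pd.to_datetime` is one alternative. The sketch below uses a small hypothetical Series instead of the real file.

```python
import pandas as pd

# Hypothetical sample of day.month.year strings like those in the question.
raw = pd.Series(["1.1.2017", "2.1.2017", "31.1.2017"])

# dayfirst=True tells the parser to read the first number as the day.
parsed = pd.to_datetime(raw, dayfirst=True)
print(parsed.dt.strftime("%Y-%m-%d").tolist())
```

When the format is known, as it is here, passing `format='%d.%m.%Y'` remains the stricter choice because malformed values raise an error instead of being guessed.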
