Unexpected results from Pandas.DataFrame.resample
I have a data set for the month of January 2017, structured like the example below.
Date ProductID ProductType Qty
1.1.2017 1000 101 7
1.1.2017 1001 111 2
1.1.2017 1000 101 1
1.1.2017 1004 107 12
2.1.2017 1000 101 6
2.1.2017 1001 111 5
2.1.2017 1001 111 4
…..
31.1.2017 1000 101 7
31.1.2017 1001 111 5
31.1.2017 1001 111 7
I want to calculate weekly sales for each product ID with product type 101 or 111, so that my result looks like the following:
ProductID| WeeklyDates| Sales
1000 | 1.1.2017 | 14
| 1.8.2017 | NaN
| 1.15.2017 | NaN
| 1.22.2017 | NaN
| 1.29.2017 | 7
-----------------------------------
1001 | 1.1.2017 | 11
| 1.8.2017 | NaN
| 1.15.2017 | NaN
| 1.22.2017 | NaN
| 1.29.2017 | 12
Here NaN means that I have no data for these dates in the example. To get these results, I am using the following code:
import pandas as pd
df = pd.read_csv('data.csv', encoding = 'latin-1', sep=',')
df['Date'] = pd.to_datetime(df['Date'])
transaction_types = [101, 111]
s_df = df[df['ProductType'].isin(transaction_types)]
res_df = s_df.filter(['Date','ProductID','Qty']) # filter it because I do not want other product type column now
res_df = res_df.set_index('Date').groupby('ProductID').resample('W').sum()
res_df.to_csv('result.csv', sep=';', encoding='latin-1')
It returns some weird results: I am getting dates that I don't even have in the data. I am showing the results for only one ID:
ProductID| Date |ProductID| Qty
1000 | 01/01/2017 | 4000 | 41
1000 | 08/01/2017 | |
1000 | 15/01/2017 | 33000 | 54
1000 | 22/01/2017 | 87000 | 313
1000 | 29/01/2017 | 79000 | 94
1000 | 05/02/2017 | 36000 | 413
1000 | 12/02/2017 | |
1000 | 19/02/2017 | |
1000 | 26/02/2017 | |
1000 | 05/03/2017 | 8000 | 78
These results come from my real data, so they will not match the example above. However, the ProductID column appears twice, and I think the ProductIDs are being summed as well. The sums are also incorrect. The dates run until March, although my data set only contains dates in January. Can someone point out the likely problems in my code? Thanks.
I was not specifying a date format. For example:
df['Date'] = pd.to_datetime(df['Date']) # Not correct
df['Date'] = pd.to_datetime(df['Date'], format='%d.%m.%Y') # Correct way
Because of this, pandas was treating days as months and vice versa, which is why I was getting wrong results.
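To make the day/month swap concrete, here is a minimal sketch that rebuilds the sample rows from the question in memory (the elided rows are simply left out, and the in-memory CSV stands in for data.csv), compares the two parsing calls on one ambiguous date, and then runs the weekly aggregation. The closed='left', label='left' and min_count=1 arguments are my own assumptions, chosen so that the output has the shape of the table the question asks for; they are not part of the original code.

```python
import io

import pandas as pd

# Sample rows copied from the question; the rows hidden behind "….." are omitted.
raw = io.StringIO(
    """Date,ProductID,ProductType,Qty
1.1.2017,1000,101,7
1.1.2017,1001,111,2
1.1.2017,1000,101,1
1.1.2017,1004,107,12
2.1.2017,1000,101,6
2.1.2017,1001,111,5
2.1.2017,1001,111,4
31.1.2017,1000,101,7
31.1.2017,1001,111,5
31.1.2017,1001,111,7
"""
)
df = pd.read_csv(raw)

# Without a format, an ambiguous string such as 2.1.2017 is read month-first.
naive = pd.to_datetime("2.1.2017")
explicit = pd.to_datetime("2.1.2017", format="%d.%m.%Y")
print("no format:  ", naive.date())
print("with format:", explicit.date())

# Parse the whole column day-first, as in the accepted fix.
df["Date"] = pd.to_datetime(df["Date"], format="%d.%m.%Y")

# Keep only the two product types of interest.
sales = df[df["ProductType"].isin([101, 111])]

# Selecting "Qty" after resample() means only the quantities are summed,
# so ProductID stays an index level instead of being added up.
# closed/label="left" name each week by the Sunday it starts on (assumption),
# and min_count=1 leaves weeks without any rows as NaN instead of 0.
weekly = (
    sales.set_index("Date")
    .groupby("ProductID")
    .resample("W", closed="left", label="left")["Qty"]
    .sum(min_count=1)
)
print(weekly)
```

With the default resample('W') settings, each week is labelled by the Sunday it ends on, so a label can fall after the last date in the data even when the parsing is correct.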
I got stuck on this same problem and came across this answer. After looking through the pandas documentation, I learnt that a more flexible way to solve this is to let pandas infer the datetime format, as follows:
df['Date'] = pd.to_datetime(df['Date'], infer_datetime_format=True)
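One caveat: inference alone cannot tell whether 2.1.2017 means 2 January or 1 February, and as far as I know infer_datetime_format has been deprecated since pandas 2.0, where a stricter form of format inference became the default. A small sketch of an alternative that still avoids writing out the format is the dayfirst=True hint; the three date strings below are taken from the sample data in the question.

```python
import pandas as pd

# Date strings from the question's sample data (day.month.year).
dates = pd.Series(["1.1.2017", "2.1.2017", "31.1.2017"])

# dayfirst=True tells pandas to read ambiguous strings as day-first.
parsed = pd.to_datetime(dates, dayfirst=True)
print(parsed.dt.strftime("%Y-%m-%d").tolist())
```

If the layout of the column is known and fixed, passing an explicit format such as '%d.%m.%Y' is probably still the safer choice, since it fails loudly on rows that do not match.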