
How to calculate Month to Date (MTD) and YTD using Pandas dataframe?

I want to calculate MTD and YTD using a pandas DataFrame. I wrote the code below for this, but it raises the following error.

Code:

import pandas as pd

data = {'date' : ['2017/01/01', '2017/01/02', '2017/01/03', '2017/01/04', '2017/01/15', '2017/01/20', '2017/01/23', '2017/01/30','2017/01/01', '2017/01/02', '2017/01/03', '2017/01/04', '2017/01/15', '2017/01/20', '2017/01/23', '2017/01/30', '2017/04/01', '2017/04/02', '2017/04/03', '2017/04/04', '2017/04/15', '2017/04/20', '2017/04/23', '2017/04/30','2017/04/01', '2017/04/02', '2017/04/03', '2017/04/04', '2017/04/15', '2017/04/20', '2017/04/23', '2017/04/30', '2017/05/01', '2017/05/02', '2017/05/03', '2017/05/04', '2017/05/15', '2017/05/20', '2017/05/23', '2017/05/30','2017/05/01', '2017/05/02', '2017/05/03', '2017/05/04', '2017/05/15', '2017/05/20', '2017/05/23', '2017/05/30'],
        'product': ['Apple', 'Apple', 'Apple','Apple', 'Apple', 'Apple','Apple', 'Apple', 'Orange', 'Orange', 'Orange','Orange', 'Orange', 'Orange','Orange', 'Orange', 'Apple', 'Apple', 'Apple','Apple', 'Apple', 'Apple','Apple', 'Apple', 'Orange', 'Orange', 'Orange','Orange', 'Orange', 'Orange','Orange', 'Orange', 'Apple', 'Apple', 'Apple','Apple', 'Apple', 'Apple','Apple', 'Apple', 'Orange', 'Orange', 'Orange','Orange', 'Orange', 'Orange','Orange', 'Orange'],
        'price': [10, 20, 10, 50, 10, 5, 10, 10, 20, 10, 5, 5, 10, 10, 20, 50, 10, 5, 20, 10, 10, 20, 50, 20, 5, 5, 10, 10, 20, 50, 30, 10, 20, 5, 5, 10, 20, 10, 20, 10, 40, 20, 10, 10, 20, 20, 10, 5]}


df = pd.DataFrame(data)

print("Dataframe-----------------------------------")
print(df)
print("Dataframe Ends------------------------------")

df.date = pd.to_datetime(df.date)
df = df.groupby('date', 'product').price.sum()
df = df.groupby(df.index.to_period('m')).cumsum().reset_index()

print("MTD Dataframe")
print(df)

Error:


Traceback (most recent call last):
  File "/home/ab/PycharmProjects/parry-analytics/lib/python3.9/site-packages/pandas/core/generic.py", line 550, in _get_axis_number
    return cls._AXIS_TO_AXIS_NUMBER[axis]
KeyError: 'product'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ab/parry-data_processing/parry-analytics/poc.py", line 15, in <module>
    df = df.groupby('date', 'product').price.sum()
  File "/home/ab/PycharmProjects/parry-analytics/lib/python3.9/site-packages/pandas/core/frame.py", line 7713, in groupby
    axis = self._get_axis_number(axis)
  File "/home/ab/PycharmProjects/parry-analytics/lib/python3.9/site-packages/pandas/core/generic.py", line 552, in _get_axis_number
    raise ValueError(f"No axis named {axis} for object type {cls.__name__}")
ValueError: No axis named product for object type DataFrame

Can anyone suggest a solution to this problem?

Expected MTD output:

          date product  price
0   2017/01/01   Apple     10
1   2017/01/02   Apple     30
2   2017/01/03   Apple     40
3   2017/01/04   Apple     90
4   2017/01/15   Apple     100
5   2017/01/20   Apple     105
6   2017/01/23   Apple     115
7   2017/01/30   Apple     125
8   2017/01/01  Orange     20
9   2017/01/02  Orange     30
10  2017/01/03  Orange     35
11  2017/01/04  Orange     40
12  2017/01/15  Orange     50
13  2017/01/20  Orange     60
14  2017/01/23  Orange     80
15  2017/01/30  Orange     130
16  2017/04/01   Apple     10
17  2017/04/02   Apple     15
18  2017/04/03   Apple     35
19  2017/04/04   Apple     45
20  2017/04/15   Apple     55
21  2017/04/20   Apple     75
22  2017/04/23   Apple     125
23  2017/04/30   Apple     145
24  2017/04/01  Orange      5
25  2017/04/02  Orange     10
26  2017/04/03  Orange     20
27  2017/04/04  Orange     30
28  2017/04/15  Orange     50
29  2017/04/20  Orange     100
30  2017/04/23  Orange     130
31  2017/04/30  Orange     140
32  2017/05/01   Apple     20
33  2017/05/02   Apple     25
34  2017/05/03   Apple     30
35  2017/05/04   Apple     40
36  2017/05/15   Apple     60
37  2017/05/20   Apple     70
38  2017/05/23   Apple     90
39  2017/05/30   Apple     100
40  2017/05/01  Orange     40
41  2017/05/02  Orange     60
42  2017/05/03  Orange     70
43  2017/05/04  Orange     80
44  2017/05/15  Orange     100
45  2017/05/20  Orange     120
46  2017/05/23  Orange     130
47  2017/05/30  Orange     135

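Expected YTD output:

Same as above, but the running total should start at the beginning of the fiscal year (April) and be computed per product.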

Use:

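df.date = pd.to_datetime(df.date)
# MTD: running total of price within each (calendar month, product) group
df['MTD'] = df.groupby([df.date.dt.to_period('m'),'product']).price.cumsum()

#df['test'] = df.date.dt.to_period('A-MAR')
# YTD: running total within each (fiscal year ending in March, product) group
df['YTD'] = df.groupby([df.date.dt.to_period('A-MAR'),'product']).price.cumsum()
print(df.tail(20))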
         date product  price  MTD  YTD
28 2017-04-15  Orange     20   50   50
29 2017-04-20  Orange     50  100  100
30 2017-04-23  Orange     30  130  130
31 2017-04-30  Orange     10  140  140
32 2017-05-01   Apple     20   20  165
33 2017-05-02   Apple      5   25  170
34 2017-05-03   Apple      5   30  175
35 2017-05-04   Apple     10   40  185
36 2017-05-15   Apple     20   60  205
37 2017-05-20   Apple     10   70  215
38 2017-05-23   Apple     20   90  235
39 2017-05-30   Apple     10  100  245
40 2017-05-01  Orange     40   40  180
41 2017-05-02  Orange     20   60  200
42 2017-05-03  Orange     10   70  210
43 2017-05-04  Orange     10   80  220
44 2017-05-15  Orange     20  100  240
45 2017-05-20  Orange     20  120  260
46 2017-05-23  Orange     10  130  270
47 2017-05-30  Orange      5  135  275
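
To make the grouping keys explicit, here is a minimal sketch of the same idea on a small made-up sample (not the question's data). It derives the fiscal-year label with plain arithmetic instead of a period alias; the names `FISCAL_START_MONTH` and `fiscal_year` are illustrative assumptions, not part of the answer above. It also assumes the rows of each product are already in date order, since `cumsum` accumulates in row order.

```python
import pandas as pd

# Small illustrative sample (hypothetical values, not the question's data).
df = pd.DataFrame({
    "date": ["2017/03/30", "2017/04/01", "2017/04/15", "2017/05/02",
             "2017/03/30", "2017/04/01", "2017/04/15", "2017/05/02"],
    "product": ["Apple"] * 4 + ["Orange"] * 4,
    "price": [10, 20, 5, 10, 1, 2, 3, 4],
})
df["date"] = pd.to_datetime(df["date"])

# Calendar month of each row, used as the MTD grouping key.
month = df["date"].dt.to_period("M")

# Fiscal-year label: April starts a new fiscal year, so January to March
# belong to the fiscal year that began in the previous calendar year.
FISCAL_START_MONTH = 4  # assumption taken from the question (April)
before_start = (df["date"].dt.month < FISCAL_START_MONTH).astype(int)
fiscal_year = (df["date"].dt.year - before_start).rename("fiscal_year")

# Running totals per (period, product) group, in the existing row order.
df["MTD"] = df.groupby([month, "product"])["price"].cumsum()
df["YTD"] = df.groupby([fiscal_year, "product"])["price"].cumsum()
print(df)
```

In this sample the 2017/03/30 rows fall in the previous fiscal year, so the YTD total restarts on 2017/04/01 and then keeps accumulating across April and May, while MTD restarts every month.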

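The error you are seeing comes from how the arguments are passed to groupby.

In the line df = df.groupby('date', 'product').price.sum(), groupby receives two separate positional arguments, 'date' and 'product'. As the traceback shows, the second one is interpreted as the axis argument, hence "No axis named product". The column names should instead be passed as a single list of strings, like this:

df = df.groupby(['date', 'product']).price.sum()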
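
As a quick illustration of the list form, the sketch below uses a tiny made-up frame (the values are assumptions chosen for the example) and shows what the grouped sum returns: a Series indexed by a (date, product) MultiIndex.

```python
import pandas as pd

# Tiny made-up frame; two Apple rows share the same date on purpose.
df = pd.DataFrame({
    "date": pd.to_datetime(["2017/01/01", "2017/01/01", "2017/01/01", "2017/01/02"]),
    "product": ["Apple", "Apple", "Orange", "Apple"],
    "price": [10, 5, 20, 20],
})

# Both column names go into ONE list, which is the first argument (`by`).
daily = df.groupby(["date", "product"])["price"].sum()

# The result is a Series whose index has two levels: date and product.
print(list(daily.index.names))
print(daily)
```

Knowing that the result carries a two-level index matters for the next step, because the dates now live in one level of that index rather than in a column.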

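Also change the line

df = df.groupby(df.index.to_period('m')).cumsum().reset_index()

to

df = df.groupby([df.index.get_level_values('date').to_period('M'), 'product']).cumsum().reset_index()

After the grouped sum, df is a Series indexed by (date, product), so the month has to be read from the date level of the index, and 'product' has to stay in the grouping keys to keep the running totals separate per product.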
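
Putting both changes together, a minimal end-to-end sketch of the aggregate-then-accumulate approach might look like the following. The sample values and the intermediate names `daily`, `month`, and `mtd` are assumptions made for illustration.

```python
import pandas as pd

# Shortened, made-up sample in the same shape as the question's data.
df = pd.DataFrame({
    "date": ["2017/01/01", "2017/01/02", "2017/04/01", "2017/04/02",
             "2017/01/01", "2017/01/02", "2017/04/01", "2017/04/02"],
    "product": ["Apple"] * 4 + ["Orange"] * 4,
    "price": [10, 20, 10, 5, 20, 10, 5, 5],
})
df["date"] = pd.to_datetime(df["date"])

# Step 1: one total per (date, product); the keys are passed as a list.
daily = df.groupby(["date", "product"])["price"].sum()

# Step 2: month of each row, read from the "date" level of the MultiIndex.
month = daily.index.get_level_values("date").to_period("M")

# Step 3: running total within each (month, product) pair.
# "product" here refers to the index level of the same name.
mtd = daily.groupby([month, "product"]).cumsum().reset_index()
print(mtd)
```

The rows of `mtd` come out ordered by date and then product, because that is how the grouped sum in step 1 sorts its index; sort by product and date afterwards if the layout of the expected output is needed.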
