Pandas DataFrame add new column values based on group by multiple conditions

I have a DataFrame as shown below:

       Color  Month  Quantity
index                        
0          1      1     34047
1          1      2     36654
2          2      3     37291
3          2      4     35270
4          3      5     35407
5          1     12      9300
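
For anyone who wants to reproduce the question, here is a minimal sketch that rebuilds the frame printed above. The variable name df and the index name are read off the printout; nothing else is assumed.

```python
import pandas as pd

# Rebuild the sample frame; values are copied from the printout above.
df = pd.DataFrame(
    {
        "Color": [1, 1, 2, 2, 3, 1],
        "Month": [1, 2, 3, 4, 5, 12],
        "Quantity": [34047, 36654, 37291, 35270, 35407, 9300],
    }
)
df.index.name = "index"  # matches the "index" label in the printout
print(df)
```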

I want to add an extra column, PreviousMonthQty, to this DataFrame. For each row it should hold the Quantity of the row with the same Color whose Month is the previous month; in other words, the lookup is keyed on (Color, Month), with Month replaced by the month before it.

The target DataFrame I expect looks like this:

[image: expected target DataFrame]

The logic can be explained as follows:

[image: explanation of the logic]

Any help would be much appreciated.

Thank you very much.

Here is a way using MultiIndex and map after finding the previous month:

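import pandas as pd

# Month 1..12 -> first day of that month; stepping back one minute
# (unit='m' means minutes, not months) lands in the previous month, so 1 wraps to 12.
prev_month = pd.to_datetime(df['Month'], format='%m').sub(pd.Timedelta(1, unit='m')).dt.month

# Lookup Series: Quantity indexed by (Color, Month)
m = df.set_index(['Color', 'Month'])['Quantity']

final = df.assign(Prev_Month_Value=pd.MultiIndex.from_arrays([df['Color'], prev_month])
                                                .map(m).fillna(0))

# To assign into the existing df, use the code below instead of df.assign(), which returns a copy
# df['Previous Month Value'] = (pd.MultiIndex.from_arrays([df['Color'], prev_month])
#                                 .map(m).fillna(0))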

Output:

       Color  Month  Quantity  Prev_Month_Value
index                                          
0          1      1     34047            9300.0
1          1      2     36654           34047.0
2          2      3     37291               0.0
3          2      4     35270           37291.0
4          3      5     35407               0.0
5          1     12      9300               0.0

Details:

Step 1: Find the previous month by converting the Month column to datetime (the first day of that month), subtracting a small pd.Timedelta so the date falls into the preceding month, and reading the result back with .dt.month.

Step 2: Create a MultiIndex Series with Quantity as the values and Color and Month as the index.

Step 3: Build a MultiIndex from Color and the prev_month Series, map it through the Series from step 2 to get the new column, and fill NaN with 0.
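
The three steps can also be sketched without the datetime round trip. This is not the answer's code but a variant under two assumptions: the previous month is computed with modular arithmetic, and the lookup uses Series.reindex instead of MultiIndex.map. The name lookup is introduced here for illustration only.

```python
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame(
    {
        "Color": [1, 1, 2, 2, 3, 1],
        "Month": [1, 2, 3, 4, 5, 12],
        "Quantity": [34047, 36654, 37291, 35270, 35407, 9300],
    }
)

# Step 1 (variant): previous month by modular arithmetic, so month 1 wraps to 12.
prev_month = (df["Month"] - 2) % 12 + 1

# Step 2: lookup Series keyed by (Color, Month).
lookup = df.set_index(["Color", "Month"])["Quantity"]

# Step 3: look up each (Color, previous month) pair; pairs that do not exist
# come back as NaN and are filled with 0. to_numpy() assigns by position.
keys = pd.MultiIndex.from_arrays([df["Color"], prev_month])
df["Prev_Month_Value"] = lookup.reindex(keys).fillna(0).to_numpy()

print(df)
```

Like the original, this relies on each (Color, Month) pair appearing at most once; duplicate pairs would make the lookup ambiguous.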

Here is another approach using merge: we merge on a prv_month key that is assigned inline:

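# prv_month is Month - 1, with 0 (from January) replaced by 12.
# 'Qty_y' is the right-hand frame's Qty (merge adds the _x/_y suffixes);
# this answer's frame names the quantity column Qty.
df['PreviousQty'] = (df.assign(prv_month=df['Month'].sub(1).where(lambda x: x!=0, 12))
                     .merge(df,
                            how='left',
                            left_on=['Color', 'prv_month'],
                            right_on=['Color', 'Month'])['Qty_y'].fillna(0))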

[out]

   Color  Month    Qty  PreviousQty
0      1      1  34047       9300.0
1      1      2  36654      34047.0
2      2      3  37291          0.0
3      2      4  35270      37291.0
4      3      5  35407          0.0
5      1     12   9300          0.0
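
A variant of the same idea, offered as a sketch rather than as the answer's method: instead of adding a prv_month key on the left, relabel the right-hand side with the following month, so both sides join on the same (Color, Month) keys and no _x/_y suffixes appear. The names nxt and out are hypothetical, and the quantity column is assumed to be named Qty as in the output above.

```python
import pandas as pd

# Sample frame, with the quantity column named Qty as in this answer's output.
df = pd.DataFrame(
    {
        "Color": [1, 1, 2, 2, 3, 1],
        "Month": [1, 2, 3, 4, 5, 12],
        "Qty": [34047, 36654, 37291, 35270, 35407, 9300],
    }
)

# Relabel each row with the month that follows it (12 wraps to 1), so its
# quantity becomes the "previous" quantity for that later month.
nxt = df.assign(Month=df["Month"] % 12 + 1).rename(columns={"Qty": "PreviousQty"})

# A left merge keeps every original row; rows with no previous month get NaN, then 0.
out = df.merge(nxt, on=["Color", "Month"], how="left").fillna({"PreviousQty": 0})

print(out)
```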

Use DataFrame.pivot to reshape the DataFrame and add all twelve months with DataFrame.reindex:

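# Keyword arguments are required by pivot in recent pandas versions.
# This answer's frame names the quantity column 'Oty'.
df1 = df.pivot(index='Color', columns='Month', values='Oty').reindex(columns=range(1, 13))
print (df1)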
Month        1        2        3        4        5   6   7   8   9  10  11  \
Color                                                                        
1      34047.0  36654.0      NaN      NaN      NaN NaN NaN NaN NaN NaN NaN   
2          NaN      NaN  37291.0  35270.0      NaN NaN NaN NaN NaN NaN NaN   
3          NaN      NaN      NaN      NaN  35407.0 NaN NaN NaN NaN NaN NaN   

Month      12  
Color          
1      9300.0  
2         NaN  
3         NaN  

Then use numpy.roll with DataFrame.join:

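import numpy as np

# Shift every column one month to the right (December wraps around to January),
# then stack back to a Series indexed by (Color, Month).
s = pd.DataFrame(np.roll(df1.to_numpy(), 1, axis=1), 
                 index=df1.index, 
                 columns=df1.columns).stack().rename('Previous Month')

# Join on (Color, Month); pairs with no previous-month value become 0.
df = df.join(s, on=['Color','Month']).fillna({'Previous Month':0})
print (df)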
   Index  Color  Month    Oty  Previous Month
0      0      1      1  34047          9300.0
1      1      1      2  36654         34047.0
2      2      2      3  37291             0.0
3      3      2      4  35270         37291.0
4      4      3      5  35407             0.0
5      5      1     12   9300             0.0
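
To see why numpy.roll yields the previous month, here is a small stand-alone sketch on one row of the pivoted table (the Color 1 row; the array is typed in by hand here, not taken from df1):

```python
import numpy as np

# One row of twelve monthly values for months 1..12, NaN where a month has no data.
row = np.array([[34047, 36654] + [np.nan] * 9 + [9300]], dtype=float)

# Shift every column one place to the right along axis 1; the last column
# wraps around to the front, so each position now holds the preceding month's value.
rolled = np.roll(row, 1, axis=1)

print(rolled[0, 0])  # December's value now sits under month 1
print(rolled[0, 1])  # January's value now sits under month 2
```

The wrap-around is what makes January pick up December's quantity without any special casing.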
