How to use the sum values from a column in a multi-level indexed pandas dataframe as a condition for values in new column
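I have a multi-level indexed pandas DataFrame. I want to create a new column whose values depend on a condition. The condition is based on summing another column for that index and then halving it. If this is less than the last value stored in a separate list, the new column takes the same value as another column in the DataFrame. If the condition is not met, all values in the new column should be 0.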
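Following the question on summing columns by level in a multi-index DataFrame, I tried to implement this with np.where and df.sum(level=0, axis=1), but this results in the following error: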
ValueError: operands could not be broadcast together with shapes (2,8) (21,) ()
Here is a sample of my DataFrame and the code I have used so far:
import pandas as pd
import numpy as np
balance = [1400]
data = {'EVENT_ID': [112335580,112335580,112335580,112335580,112335580,112335580,112335580,112335580, 112335582,
112335582,112335582,112335582,112335582,112335582,112335582,112335582,112335582,112335582,
112335582,112335582,112335582],
'SELECTION_ID': [6356576,2554439,2503211,6297034,4233251,2522967,5284417,7660920,8112876,7546023,8175276,8145908,
8175274,7300754,8065540,8175275,8106158,8086265,2291406,8065533,8125015],
'Pot_Bet': [3.236731,2.416966,2.278365,2.264023,2.225353,2.174407, 2.141420,2.122386,2.832997,2.411094,
2.167218,2.138972,2.132137,2.128341,2.116338,2.115239,2.115123,2.114284362,2.113420,
2.113186,2.112729],
'Liability':[3.236731, 2.416966, 12.245492, 12.795112, 15.079176, 23.336171, 50.741182, 571.003118, 2.832997, 6.691736, 15.808607, 27.935834, 35.954927, 43.275250, 147.165537, 193.017915, 199.622454, 265.809019, 405.808678, 473.926781, 706.332594]}
df = pd.DataFrame(data, columns=['EVENT_ID', 'SELECTION_ID', 'Pot_Bet','WIN_LOSE'])
df.set_index(['EVENT_ID', 'SELECTION_ID'], inplace=True) #Selecting columns for indexing
df['Bet'] = np.where(df.sum(level = 0) > 0.5*balance[-1], df['Pot_Bet'], 0)
This produces the error mentioned above.
For index 112335580, the new column should have the same values as 'Pot_Bet', whereas for index 112335582 the new column should be 0.
Cheers, Sandy
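The problem is that df.sum(level=0) behaves like df.groupby(level=0).sum(): it aggregates by the first level of the MultiIndex, so the result has one row per EVENT_ID rather than one row per original row.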
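The shape mismatch can be reproduced on a small frame. The following is a minimal sketch with hypothetical toy data (the values and the threshold 4 are made up for illustration); it only shows why an aggregated result cannot be combined row by row with an original column.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two events with 2 and 3 selections.
idx = pd.MultiIndex.from_tuples(
    [(1, 'a'), (1, 'b'), (2, 'c'), (2, 'd'), (2, 'e')],
    names=['EVENT_ID', 'SELECTION_ID'],
)
df = pd.DataFrame(
    {'Pot_Bet': [1.0, 2.0, 3.0, 4.0, 5.0],
     'Liability': [1.0, 3.0, 2.0, 2.0, 4.0]},
    index=idx,
)

# Aggregating by the first index level collapses each event to one row.
totals = df.groupby(level=0).sum()
print(totals.shape)         # (2, 2): one row per event, one column per field
print(df['Pot_Bet'].shape)  # (5,): one value per original row

# A (2, 2) condition cannot be broadcast against a (5,) array of values.
raised = False
try:
    np.where(totals.to_numpy() > 4, df['Pot_Bet'].to_numpy(), 0)
except ValueError as exc:
    raised = True
    print(exc)
```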
The solution is to use GroupBy.transform, which returns a Series the same size as the original DataFrame:
df['Bet'] = np.where(df.groupby(level = 0)['Pot_Bet'].transform('sum') > 0.5*balance[-1],
df['Pot_Bet'], 0)
Details:
print (df.groupby(level = 0)['Pot_Bet'].transform('sum'))
EVENT_ID SELECTION_ID
112335580 6356576 18.859651
2554439 18.859651
2503211 18.859651
6297034 18.859651
4233251 18.859651
2522967 18.859651
5284417 18.859651
7660920 18.859651
112335582 8112876 28.611078
7546023 28.611078
8175276 28.611078
8145908 28.611078
8175274 28.611078
7300754 28.611078
8065540 28.611078
8175275 28.611078
8106158 28.611078
8086265 28.611078
2291406 28.611078
8065533 28.611078
8125015 28.611078
Name: Pot_Bet, dtype: float64
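As an end-to-end illustration, here is a small sketch on hypothetical toy data. It assumes one reading of the question: the column being summed is 'Liability', and 'Pot_Bet' is kept when the per-event total is below half of the last balance. Both the column choice and the direction of the comparison are assumptions, so swap in whatever the real condition requires.

```python
import numpy as np
import pandas as pd

balance = [10]  # hypothetical balance list; only the last entry is used

idx = pd.MultiIndex.from_tuples(
    [(1, 'a'), (1, 'b'), (2, 'c'), (2, 'd'), (2, 'e')],
    names=['EVENT_ID', 'SELECTION_ID'],
)
df = pd.DataFrame(
    {'Pot_Bet': [1.0, 2.0, 3.0, 4.0, 5.0],
     'Liability': [1.0, 3.0, 2.0, 2.0, 4.0]},
    index=idx,
)

# Per-event total, repeated on every row that belongs to the event.
event_total = df.groupby(level='EVENT_ID')['Liability'].transform('sum')

# Assumed condition: keep Pot_Bet where the event total is below half the balance.
df['Bet'] = np.where(event_total < 0.5 * balance[-1], df['Pot_Bet'], 0)
print(df)
```

Because event_total has the same index as df, the condition, the values and the new column all line up row by row.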
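If you only need to work with one column, you can select it by column name as a Series. Note that the level argument of sum was deprecated in pandas 1.3 and removed in pandas 2.0, so on recent versions only the groupby form shown second will run: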
print (df['Pot_Bet'].sum(level=0))
EVENT_ID
112335580 18.859651
112335582 28.611078
Name: Pot_Bet, dtype: float64
print (df.groupby(level = 0)['Pot_Bet'].sum())
EVENT_ID
112335580 18.859651
112335582 28.611078
Name: Pot_Bet, dtype: float64
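If the aggregated totals are already at hand, they can also be mapped back onto the rows through the index level instead of calling transform. This is a sketch on hypothetical toy data, offered as an alternative and not as part of the original answer.

```python
import pandas as pd

# Hypothetical toy data: two events with 2 and 3 selections.
idx = pd.MultiIndex.from_tuples(
    [(1, 'a'), (1, 'b'), (2, 'c'), (2, 'd'), (2, 'e')],
    names=['EVENT_ID', 'SELECTION_ID'],
)
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=idx, name='Pot_Bet')

# One total per event.
totals = s.groupby(level='EVENT_ID').sum()

# Look up each row's event total via its EVENT_ID index value.
per_row = pd.Series(
    s.index.get_level_values('EVENT_ID').map(totals).to_numpy(),
    index=s.index,
)

# The result matches what transform('sum') produces.
same = (per_row == s.groupby(level='EVENT_ID').transform('sum')).all()
print(per_row)
```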