Is there a way to go through rows in a pandas data frame more efficiently?
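I have a large pandas data frame in which each row corresponds to one sports match. It looks like this:
**Edit:** I am changing the example code to better reflect the real data. Doing so made me realise that outcomes other than 'lost' or 'win' make this harder.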
    import pandas as pd

    d = {'date': ['21.01.96', '22.02.96', '23.02.96', '24.02.96', '25.02.96',
                  '26.02.96', '27.02.96', '28.02.96', '29.02.96', '30.02.96'],
         'challenger': [5, 5, 10, 5, 4, 5, 8, 8, 10, 8],
         'opponent': [2, 4, 5, 4, 5, 10, 5, 2, 4, 10],
         'outcome': ['win', 'lost', 'declined', 'win', 'declined', 'win', 'declined', 'declined', 'lost', 'lost']
         }
    df = pd.DataFrame(data=d)
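For each match I want to compute new variables holding the number of previous wins/losses. In the example case, the 'prev_wins' variable would be [0,0,0,1,0,0,0,0,0,0]. I did manage to write working code for this, shown below: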
    data = df.copy()  # the sample frame above is named df
    data['prev_wins_spec_challenger'] = 0
    data['prev_losses_spec_challenger'] = 0
    data['challenger'] = data['challenger'].astype(str)
    data['opponent'] = data['opponent'].astype(str)
    data['matchups'] = data['challenger'] + '-' + data['opponent']

    # create list of matchups with unique pairings
    matchups_temp = list(data['matchups'].unique())
    matchups = []
    for match in matchups_temp:
        # skip a pairing whose mirror image (e.g. '4-5' for '5-4') is already listed;
        # reversing the split parts, not the raw string, keeps multi-digit ids intact
        if '-'.join(reversed(match.split('-'))) in matchups:
            pass
        else:
            matchups.append(match)

    prev_wins = {}
    for i in matchups:
        prev_wins[i] = 0
    prev_losses = {}
    for i in matchups:
        prev_losses[i] = 0

    # go through data set for each matchup and calculate variables
    for i in range(0, len(matchups)):
        match = matchups[i].split('-')
        challenger = match[0]
        opponent = match[1]
        # every matchup triggers a full pass over all rows, which is what makes this slow
        for index, row in data.iterrows():
            if row['challenger'] == challenger and row['opponent'] == opponent:
                if row['outcome'] == 'win':  # the sample data uses 'win', not 'won'
                    data.loc[index, 'prev_wins_spec_challenger'] = prev_wins[matchups[i]]
                    prev_wins[matchups[i]] += 1
                elif row['outcome'] == 'lost':
                    data.loc[index, 'prev_losses_spec_challenger'] = prev_losses[matchups[i]]
                    prev_losses[matchups[i]] += 1
            elif row['challenger'] == opponent and row['opponent'] == challenger:
                if row['outcome'] == 'win':
                    data.loc[index, 'prev_losses_spec_challenger'] = prev_losses[matchups[i]]
                    prev_losses[matchups[i]] += 1
                elif row['outcome'] == 'lost':
                    data.loc[index, 'prev_wins_spec_challenger'] = prev_wins[matchups[i]]
                    prev_wins[matchups[i]] += 1
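The problem is that this takes extremely long, because there are about 65,000 distinct matchups in total and the data frame has ~170,000 rows, and the inner loop scans every row once per matchup. On my laptop it would take roughly 180 hours to run, which is unacceptable.
I am sure there is a better solution, but even after searching the internet for a whole day I could not find one. How can I make this code faster?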
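IIUC, you can do something like this, using shift() to look at the previous outcome within each group, and taking the cumulative sum of the boolean that says whether it equals win: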
    data['previous_wins'] = data.groupby('challenger').outcome.transform(lambda x: x.shift().eq('win').cumsum())

    >>> data
       challenger      date  opponent outcome  previous_wins
    0           5  21.01.96         6     win              0
    1           4  22.02.96         3    loss              0
    2           5  23.02.96         6     win              1
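To see what the shift/eq/cumsum chain produces on the edited sample data from the question, here is a small self-contained sketch. The frame name `df` and the column name `previous_wins` follow the snippets above; selecting the column with `['outcome']` instead of `.outcome` is just a stylistic choice.

```python
import pandas as pd

# Sample data copied from the question (after its edit).
df = pd.DataFrame({
    'date': ['21.01.96', '22.02.96', '23.02.96', '24.02.96', '25.02.96',
             '26.02.96', '27.02.96', '28.02.96', '29.02.96', '30.02.96'],
    'challenger': [5, 5, 10, 5, 4, 5, 8, 8, 10, 8],
    'opponent': [2, 4, 5, 4, 5, 10, 5, 2, 4, 10],
    'outcome': ['win', 'lost', 'declined', 'win', 'declined',
                'win', 'declined', 'declined', 'lost', 'lost'],
})

# Within each challenger's rows: shift() moves outcomes down one row,
# eq('win') flags rows whose *previous* outcome was a win (NaN counts as False),
# and cumsum() turns those flags into a running count of earlier wins.
df['previous_wins'] = (
    df.groupby('challenger')['outcome']
      .transform(lambda s: s.shift().eq('win').cumsum())
)

print(df[['challenger', 'opponent', 'outcome', 'previous_wins']])
```

Challenger 5 appears in rows 0, 1, 3 and 5 with outcomes win, lost, win, win, so its counts are 0, 1, 1, 2. Rows with 'declined' or 'lost' simply never add to the count.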
If you want to count a challenger's wins against a specific opponent, you can group by both challenger and opponent:
    data['previous_wins'] = data.groupby(['opponent','challenger']).outcome.transform(lambda x: x.shift().eq('win').cumsum())
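Grouping by `['opponent','challenger']` treats 5 vs 4 and 4 vs 5 as different groups, whereas the loop in the question appears to treat them as one matchup. The following is a rough sketch of one way to handle that without a Python-level loop. It rests on an interpretation that is an assumption, not something the question confirms: a 'win' is a win for the challenger, a 'lost' is a win for the opponent, and any other outcome counts for nobody. The helper names (`lo`, `hi`, `lo_won`, `hi_won`) are made up for illustration. Unlike the original loop, it also fills in counts on 'declined' rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'challenger': [5, 5, 10, 5, 4, 5, 8, 8, 10, 8],
    'opponent': [2, 4, 5, 4, 5, 10, 5, 2, 4, 10],
    'outcome': ['win', 'lost', 'declined', 'win', 'declined',
                'win', 'declined', 'declined', 'lost', 'lost'],
})

# Order-independent matchup key: (smaller id, larger id).
lo = df[['challenger', 'opponent']].min(axis=1)
hi = df[['challenger', 'opponent']].max(axis=1)

chall_is_lo = df['challenger'].eq(lo)
chall_won = df['outcome'].eq('win')   # assumption: 'win' means the challenger won
opp_won = df['outcome'].eq('lost')    # assumption: 'lost' means the opponent won

# Per row: did the lower-id / higher-id player of the pair win this match?
lo_won = ((chall_won & chall_is_lo) | (opp_won & ~chall_is_lo)).astype(int)
hi_won = ((chall_won & ~chall_is_lo) | (opp_won & chall_is_lo)).astype(int)

# Wins before the current row = running total within the pair minus this row.
lo_prev = lo_won.groupby([lo, hi]).cumsum() - lo_won
hi_prev = hi_won.groupby([lo, hi]).cumsum() - hi_won

# Map the per-pair counts back to the current challenger's point of view.
df['prev_wins_spec_challenger'] = np.where(chall_is_lo, lo_prev, hi_prev)
df['prev_losses_spec_challenger'] = np.where(chall_is_lo, hi_prev, lo_prev)

print(df)
```

Because this uses the built-in grouped `cumsum` rather than a `lambda` inside `transform`, it avoids calling a Python function once per group, which may matter with tens of thousands of matchups. It assumes the rows are already in chronological order.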