Python/Pandas: How do I select rows in one column where the value is equal to a different row in a different column?
Here is a sample of my data:
```
In [177]: df_data[['Date', 'TeamName', 'Opponent', 'ScoreOff']].head()
Out[177]:
                     Date              TeamName              Opponent  ScoreOff
4128  2005-09-08 00:00:00  New England Patriots       Oakland Raiders        30
4129  2005-09-08 00:00:00       Oakland Raiders  New England Patriots        20
4130  2005-09-11 00:00:00     Arizona Cardinals       New York Giants        19
4131  2005-09-11 00:00:00      Baltimore Ravens    Indianapolis Colts         7
4132  2005-09-11 00:00:00         Buffalo Bills        Houston Texans        22
```
For each row, I need to set a new column, `OpponentScoreOff`, equal to the `ScoreOff` of that team's opponent on that day.
I have done it roughly as follows, but it's slow, and I feel there should be a more pythonic/vectorized way to do it:
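```python
for date, teams in df_data.groupby('Date'):
    for teamname in teams['TeamName']:
        # This team's row on this date
        is_team = (df_data['TeamName'] == teamname) & (df_data['Date'] == date)
        # The row where this team is the Opponent: its ScoreOff is the opponent's score
        is_opp = (df_data['Opponent'] == teamname) & (df_data['Date'] == date)
        # .loc avoids chained-assignment copies; .to_numpy() skips index alignment
        df_data.loc[is_team, 'OpponentScoreOff'] = df_data.loc[is_opp, 'ScoreOff'].to_numpy()
```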
It works, but it's slow. Is there a better way to do this?
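The lookup the loop performs (for each row, find the row on the same date whose `Opponent` is this row's `TeamName`) is a join of the table with itself. Below is a minimal sketch of that join as a single `DataFrame.merge`; this is an alternative technique, not the answer given below, and it assumes each team plays at most once per date. The toy frame reuses the first sample game and adds hypothetical placeholder teams ("Team A", "Team B", "Team C") with made-up scores; "Team C" deliberately has no opponent row.

```python
import pandas as pd

# First sample game from the question plus hypothetical placeholder rows.
df_data = pd.DataFrame({
    'Date': pd.to_datetime(['2005-09-08', '2005-09-08',
                            '2005-09-11', '2005-09-11', '2005-09-11']),
    'TeamName': ['New England Patriots', 'Oakland Raiders',
                 'Team A', 'Team B', 'Team C'],
    'Opponent': ['Oakland Raiders', 'New England Patriots',
                 'Team B', 'Team A', 'Team D'],  # no row exists for Team D
    'ScoreOff': [30, 20, 17, 24, 10],
})

# Lookup table keyed by (Date, team) holding that team's own score, renamed
# so the team column can be matched against each row's Opponent column.
opp = df_data[['Date', 'TeamName', 'ScoreOff']].rename(
    columns={'TeamName': 'Opponent', 'ScoreOff': 'OpponentScoreOff'})

# A left merge keeps every row in its original order; a missing opponent
# row yields NaN instead of silently shifting other values.
df_data = df_data.merge(opp, on=['Date', 'Opponent'], how='left')
print(df_data)
```

Note that `merge` returns a new default index, so the original row labels (4128, 4129, ...) are not carried over.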
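You could sort the rows (with `DataFrame.sort_values` in current pandas; older versions spelled this `DataFrame.sort`, which was removed in pandas 0.20) to take advantage of the bijection between `TeamName` and `Opponent` on any given date: each team plays exactly one opponent per date, and each game appears once from each side. Consider the following: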
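```python
import pandas as pd
import numpy as np

# Order the rows by date, then by team name
df_data = df_data.sort_values(['Date', 'TeamName'])
# Ordered by date, then by opponent, the k-th row is the one in which the k-th
# team above is the Opponent, so its ScoreOff is that team's opponent's score
opp_score = np.array(df_data.sort_values(['Date', 'Opponent'])['ScoreOff'])
df_data['OpponentScoreOff'] = opp_score
```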
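To check the positional argument on a small frame, here is a self-contained sketch. It reuses the first sample game (Patriots vs. Raiders) and adds two hypothetical games between placeholder teams with made-up scores, with the rows deliberately shuffled. It uses `.to_numpy()`, which here does the same job as `np.array(...)`; the trick assumes every game appears as exactly two rows.

```python
import pandas as pd

# First sample game plus two hypothetical games (placeholder teams and scores).
df_data = pd.DataFrame({
    'Date': pd.to_datetime(['2005-09-08', '2005-09-11', '2005-09-11',
                            '2005-09-08', '2005-09-11', '2005-09-11']),
    'TeamName': ['New England Patriots', 'Team C', 'Team B',
                 'Oakland Raiders', 'Team A', 'Team D'],
    'Opponent': ['Oakland Raiders', 'Team D', 'Team A',
                 'New England Patriots', 'Team B', 'Team C'],
    'ScoreOff': [30, 10, 24, 20, 17, 13],
})

# Sort by (Date, TeamName); a copy sorted by (Date, Opponent) then holds, at
# each position, the row in which that position's team is the opponent.
df_data = df_data.sort_values(['Date', 'TeamName'])
opp_score = df_data.sort_values(['Date', 'Opponent'])['ScoreOff'].to_numpy()
df_data['OpponentScoreOff'] = opp_score
print(df_data[['TeamName', 'ScoreOff', 'OpponentScoreOff']])
```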
The `np.array` call is necessary to strip the DataFrame index. A Series assigned back into `df_data` would be aligned on its index labels, which would undo the second sort; a plain array is assigned by position, so the opponent ordering is kept.
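The difference between label-aligned and positional assignment can be seen on a tiny hypothetical frame (not from the thread): the same reversed values are assigned once as a Series and once as an array.

```python
import pandas as pd

# Hypothetical three-row frame; only the index labels matter here.
df = pd.DataFrame({'x': [1, 2, 3]}, index=['a', 'b', 'c'])

# Reverse the rows: same labels, opposite positions.
reversed_x = df['x'].iloc[::-1]

# Assigning a Series aligns on index labels, so the reordering is undone.
df['aligned'] = reversed_x
# Assigning a plain array is positional, so the reordering is kept.
df['positional'] = reversed_x.to_numpy()
print(df)
```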