python pandas timedelta specific rows
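I have a dataframe holding a season's worth of basketball scores, and I would like to find the number of days between games for each team over the season.
Example frame: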
import datetime
import pandas as pd

testDateFrame = pd.DataFrame({'HomeTeam': ['HOU', 'CHI', 'DAL', 'HOU'],
                              'AwayTeam': ['CHI', 'DAL', 'CHI', 'DAL'],
                              'HomeGameNum': [1, 2, 2, 2],
                              'AwayGameNum': [1, 1, 3, 3],
                              'Date': [datetime.date(2014, 3, 11), datetime.date(2014, 3, 12), datetime.date(2014, 3, 14), datetime.date(2014, 3, 15)]})
My desired output would look like this:
AwayGameNum AwayTeam Date HomeGameNum HomeTeam AwayRest HomeRest
1 CHI 2014-03-11 1 HOU nan nan
1 DAL 2014-03-12 2 CHI nan 0
3 CHI 2014-03-14 2 DAL 1 1
3 DAL 2014-03-15 2 HOU 0 3
where the AwayRest and HomeRest columns hold the number of days since the previous game of the AwayTeam and HomeTeam respectively, minus 1.
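I would tweak your data layout slightly so that it conforms to Hadley Wickham's definition of Tidy Data. This makes the calculation much simpler. Drop the AwayTeam and HomeTeam columns and combine them into a single Team column, so that each game contributes one row per team. Then create a boolean column (HomeTeam) that indicates whether that team was the home team.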
Note: I didn't change AwayGameNum and HomeGameNum, so those numbers don't match your desired output. But the method will work.
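The reshaping step described above can be sketched as follows. This is only one possible way to build the tidy frame, not necessarily how the frame in the session below was produced. The name `games` is hypothetical, and the sketch assumes `Date` is stored as datetime64 (via `pd.to_datetime`) so that later date arithmetic yields timedeltas.

```python
import pandas as pd

# Original one-row-per-game layout (`games` is a hypothetical name).
games = pd.DataFrame({
    'HomeTeam': ['HOU', 'CHI', 'DAL', 'HOU'],
    'AwayTeam': ['CHI', 'DAL', 'CHI', 'DAL'],
    'HomeGameNum': [1, 2, 2, 2],
    'AwayGameNum': [1, 1, 3, 3],
    # Assumption: dates converted to datetime64 rather than datetime.date.
    'Date': pd.to_datetime(['2014-03-11', '2014-03-12',
                            '2014-03-14', '2014-03-15']),
})

# Build one frame per side, each with a single Team column and a boolean flag.
away = (games.rename(columns={'AwayTeam': 'Team'})
             .drop(columns='HomeTeam')
             .assign(HomeTeam=False))
home = (games.rename(columns={'HomeTeam': 'Team'})
             .drop(columns='AwayTeam')
             .assign(HomeTeam=True))

# Stack both sides; a stable sort on the original game index keeps the
# away row directly ahead of the home row for every game.
df = (pd.concat([away, home])
        .sort_index(kind='mergesort')
        .reset_index(drop=True))
df = df[['AwayGameNum', 'Team', 'Date', 'HomeGameNum', 'HomeTeam']]
print(df)
```

Each game now appears twice, once per participating team, which is what lets a single `groupby('Team')` see every game a team played regardless of venue.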
In [34]: df
Out[34]:
AwayGameNum Team Date HomeGameNum HomeTeam
0 1 CHI 2014-03-11 1 False
1 1 HOU 2014-03-11 1 True
2 1 DAL 2014-03-12 2 False
3 1 CHI 2014-03-12 2 True
4 3 CHI 2014-03-14 2 False
5 3 DAL 2014-03-14 2 True
6 3 DAL 2014-03-15 2 False
7 3 HOU 2014-03-15 2 True
[8 rows x 5 columns]
In [62]: rest = df.groupby(['Team'])['Date'].diff() - datetime.timedelta(1)
In [63]: df['HomeRest'] = rest[df.HomeTeam]
In [64]: df['AwayRest'] = rest[~df.HomeTeam]
In [65]: df
Out[65]:
AwayGameNum Team Date HomeGameNum HomeTeam HomeRest AwayRest
0 1 CHI 2014-03-11 1 False NaT NaT
1 1 HOU 2014-03-11 1 True NaT NaT
2 1 DAL 2014-03-12 2 False NaT NaT
3 1 CHI 2014-03-12 2 True 0 days NaT
4 3 CHI 2014-03-14 2 False NaT 1 days
5 3 DAL 2014-03-14 2 True 1 days NaT
6 3 DAL 2014-03-15 2 False NaT 0 days
7 3 HOU 2014-03-15 2 True 3 days NaT
[8 rows x 7 columns]
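In the session above, assigning `rest[df.HomeTeam]` to a column works because pandas aligns on the index, so rows that are filtered out are filled with NaT. If the one-row-per-game layout from the question is needed afterwards, the tidy frame can be folded back. The sketch below is a hedged illustration rather than part of the original answer: the `Rest` and `Game` columns and the `wide` frame are hypothetical names, and the game id assumes rows come in consecutive away/home pairs.

```python
import pandas as pd

# Tidy frame as in the session above, with Date stored as datetime64.
df = pd.DataFrame({
    'Team': ['CHI', 'HOU', 'DAL', 'CHI', 'CHI', 'DAL', 'DAL', 'HOU'],
    'Date': pd.to_datetime(['2014-03-11', '2014-03-11',
                            '2014-03-12', '2014-03-12',
                            '2014-03-14', '2014-03-14',
                            '2014-03-15', '2014-03-15']),
    'HomeTeam': [False, True] * 4,
})

# Days since the team's previous game, minus one.
# A team's first game has no predecessor, so its value is NaN.
df['Rest'] = df.groupby('Team')['Date'].diff().dt.days - 1

# Hypothetical game id: assumes rows come in consecutive away/home pairs.
df['Game'] = df.index // 2

away = df[~df['HomeTeam']].set_index('Game')
home = df[df['HomeTeam']].set_index('Game')

# Building a DataFrame from Series aligns them on the shared Game index.
wide = pd.DataFrame({
    'Date': away['Date'],
    'AwayTeam': away['Team'],
    'HomeTeam': home['Team'],
    'AwayRest': away['Rest'],
    'HomeRest': home['Rest'],
})
print(wide)
```

Using `.dt.days` turns the timedeltas into plain numbers, which matches the numeric AwayRest and HomeRest columns in the desired output.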