Using Dplyr group_by and summarize with pandas

I am trying to create a separate pandas DataFrame in Python using pandas' .groupby function. I am working with basketball data and want to create a column that shows whether the home and away teams are on the tail end of a back-to-back.

A 0 in the yesterday_home_team or yesterday_away_team column indicates that the corresponding team did not play the previous night.
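A minimal sketch of that "did not play the previous night" check, assuming a `Date` column of datetimes plus `home_team`/`away_team` columns, and using the three games from the desired-output table below as inline data. It swaps in a plain set lookup of (date minus one day, team) pairs instead of `.groupby`, so it is an alternative illustration rather than the approach used in the answer, and it produces 1/0 flags rather than team names:

```python
import pandas as pd

# Inline stand-in data: the three games from the question's desired-output table.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2022-09-21", "2022-09-22", "2022-09-23"]),
    "home_team": ["LAL", "LAL", "LAC"],
    "away_team": ["MIN", "DET", "LAL"],
})

# Every (date, team) pair in the schedule, regardless of home/away side.
played = set(zip(df["Date"], df["home_team"])) | set(zip(df["Date"], df["away_team"]))

one_day = pd.Timedelta(days=1)
for side in ["home_team", "away_team"]:
    # 1 if this team also appears in a game dated exactly one day earlier, else 0.
    df[f"yesterday_{side}"] = [
        int((date - one_day, team) in played)
        for date, team in zip(df["Date"], df[side])
    ]

print(df.to_string(index=False))
```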

Given that there are multiple games each night, the .groupby function should be used.
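A minimal sketch of why grouping matters, assuming datetime `date` values and using the two games from the input data below (rewritten as ISO dates). Because several games share a night, shifting the raw rows would compare unrelated teams; reshaping to one row per team-game and calling `.groupby("team")["date"].shift()` compares each team only with its own previous game:

```python
import pandas as pd

# The two sample games from the question's input data.
games = pd.DataFrame({
    "date": pd.to_datetime(["2022-09-22", "2022-09-23"]),
    "home_team": ["LAL", "LAC"],
    "away_team": ["DET", "LAL"],
})

# Long format: one row per (date, team), keeping which side the team was on.
per_team = games.melt(id_vars=["date"], value_vars=["home_team", "away_team"],
                      var_name="side", value_name="team")
per_team = per_team.sort_values(["team", "date"])

# Each team's previous game date (NaT for its first game in the data).
prev_date = per_team.groupby("team")["date"].shift()

# Exactly one day since that team's previous game -> back-to-back.
per_team["back_to_back"] = (per_team["date"] - prev_date).dt.days.eq(1).astype(int)

print(per_team.sort_values(["date", "side"]).to_string(index=False))
```

Only LAL on 9/23 (the away side) gets a 1 here, because it is the only team that appears on two consecutive dates.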

Input data:

date     home_team    away_team    
9/22/22  LAL          DET          
9/23/22  LAC          LAL         

Desired output:

date     home_team    away_team    yesterday_home_team    yesterday_away_team
9/21/22  LAL          MIN          0                      MIN 
9/22/22  LAL          DET          DET                    0
9/23/22  LAC          LAL          LAL                    LAC

Appreciate your assistance.

Your output example doesn't make sense to me. Do you need the team names in the 'yesterday_home_team' and 'yesterday_away_team' columns? Is it sufficient to simply have a 1 if the home team is on the back end of a back-to-back and a 0 if it is not (with the same logic for the away team)? It's also tough when you don't provide a good sample dataset.

Anyway, here's my solution, which simply shows a 1 or 0 depending on whether the given team is on the back end of a back-to-back:

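import pandas as pd
import numpy as np

# Months of the 2021-22 NBA season, as used in basketball-reference.com schedule URLs
months = ['October', 'November', 'December', 'January', 'February', 'March', 'April', 'May', 'June']

dfs = []
for month in months:
    url = f'https://www.basketball-reference.com/leagues/NBA_2022_games-{month.lower()}.html'
    df = pd.read_html(url)[0]  # first table on the page is the schedule
    # Coerce any non-date rows (e.g. a section-separator row, if present) to NaT and drop them
    df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
    dfs.append(df.dropna(subset=['Date']))

df = pd.concat(dfs, ignore_index=True)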
df = df.rename(columns={'Visitor/Neutral': 'away_team', 'Home/Neutral': 'home_team'})

# Long format: one row per (Date, Team), regardless of home/away
df_melt = pd.melt(df, id_vars=['Date'],
                  value_vars=['away_team', 'home_team'],
                  var_name='Home_Away',
                  value_name='Team')


# Days since each team's previous game; exactly 1 means the second night of a back-to-back
df_melt = df_melt.sort_values('Date').reset_index(drop=True)
df_melt['days_between'] = df_melt.groupby('Team')['Date'].diff().dt.days
df_melt['yesterday'] = np.where(df_melt['days_between'] == 1, 1, 0)
df_melt = df_melt.drop(['days_between', 'Home_Away'], axis=1)


# Attach the per-team flag once for the home team and once for the away team
df = df.merge(df_melt.rename(columns={'Team': 'home_team', 'yesterday': 'yesterday_home_team'}), how='left', on=['Date', 'home_team'])
df = df.merge(df_melt.rename(columns={'Team': 'away_team', 'yesterday': 'yesterday_away_team'}), how='left', on=['Date', 'away_team'])


df = df[['Date', 'home_team', 'away_team', 'yesterday_home_team', 'yesterday_away_team']]

Output:

print(df.head(30).to_string())
         Date               home_team              away_team  yesterday_home_team  yesterday_away_team
0  2021-10-19         Milwaukee Bucks          Brooklyn Nets                    0                    0
1  2021-10-19      Los Angeles Lakers  Golden State Warriors                    0                    0
2  2021-10-20       Charlotte Hornets         Indiana Pacers                    0                    0
3  2021-10-20         Detroit Pistons          Chicago Bulls                    0                    0
4  2021-10-20         New York Knicks         Boston Celtics                    0                    0
5  2021-10-20         Toronto Raptors     Washington Wizards                    0                    0
6  2021-10-20       Memphis Grizzlies    Cleveland Cavaliers                    0                    0
7  2021-10-20  Minnesota Timberwolves        Houston Rockets                    0                    0
8  2021-10-20    New Orleans Pelicans     Philadelphia 76ers                    0                    0
9  2021-10-20       San Antonio Spurs          Orlando Magic                    0                    0
10 2021-10-20               Utah Jazz  Oklahoma City Thunder                    0                    0
11 2021-10-20  Portland Trail Blazers       Sacramento Kings                    0                    0
12 2021-10-20            Phoenix Suns         Denver Nuggets                    0                    0
13 2021-10-21           Atlanta Hawks       Dallas Mavericks                    0                    0
14 2021-10-21              Miami Heat        Milwaukee Bucks                    0                    0
15 2021-10-21   Golden State Warriors   Los Angeles Clippers                    0                    0
16 2021-10-22           Orlando Magic        New York Knicks                    0                    0
17 2021-10-22      Washington Wizards         Indiana Pacers                    0                    0
18 2021-10-22     Cleveland Cavaliers      Charlotte Hornets                    0                    0
19 2021-10-22          Boston Celtics        Toronto Raptors                    0                    0
20 2021-10-22      Philadelphia 76ers          Brooklyn Nets                    0                    0
21 2021-10-22         Houston Rockets  Oklahoma City Thunder                    0                    0
22 2021-10-22           Chicago Bulls   New Orleans Pelicans                    0                    0
23 2021-10-22          Denver Nuggets      San Antonio Spurs                    0                    0
24 2021-10-22      Los Angeles Lakers           Phoenix Suns                    0                    0
25 2021-10-22        Sacramento Kings              Utah Jazz                    0                    0
26 2021-10-23     Cleveland Cavaliers          Atlanta Hawks                    1                    0
27 2021-10-23          Indiana Pacers             Miami Heat                    1                    0
28 2021-10-23         Toronto Raptors       Dallas Mavericks                    1                    0
29 2021-10-23           Chicago Bulls        Detroit Pistons                    1                    0
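The scraping step above needs network access and the live basketball-reference.com pages. As a quick offline check of the remaining melt/groupby/merge steps, here is a sketch that runs the same transformations on an inline DataFrame built from the three games in the question's desired-output table (the inline data and the loop over both sides are assumptions of this sketch, not part of the original answer):

```python
import numpy as np
import pandas as pd

# Inline stand-in for the scraped schedule (games from the question's desired output).
df = pd.DataFrame({
    "Date": pd.to_datetime(["2022-09-21", "2022-09-22", "2022-09-23"]),
    "home_team": ["LAL", "LAL", "LAC"],
    "away_team": ["MIN", "DET", "LAL"],
})

# Same steps as the answer: one row per team-game, then day gaps per team.
df_melt = pd.melt(df, id_vars=["Date"], value_vars=["away_team", "home_team"],
                  var_name="Home_Away", value_name="Team")
df_melt = df_melt.sort_values("Date").reset_index(drop=True)
df_melt["days_between"] = df_melt.groupby("Team")["Date"].diff().dt.days
df_melt["yesterday"] = np.where(df_melt["days_between"] == 1, 1, 0)
df_melt = df_melt.drop(["days_between", "Home_Away"], axis=1)

# Merge the flag back once per side; a team plays at most once per date,
# so each merge is one-to-one on (Date, team).
for side in ["home_team", "away_team"]:
    df = df.merge(
        df_melt.rename(columns={"Team": side, "yesterday": f"yesterday_{side}"}),
        how="left", on=["Date", side],
    )

print(df.to_string(index=False))
```

Here LAL is flagged as the home team on 9/22 and as the away team on 9/23, since it played on each previous night.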

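Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.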

Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM