
Pandas groupby by the same value in different columns

I have a dataframe like this:

                 HOME_TEAM     AWAY_TEAM      BOOL
1153     Manchester United  Swansea City      True             
1163        Leicester City       Everton     False            
1172               Everton     Hull City      True        
1183            Stoke City       Everton      True         
1193  West Bromwich Albion    Sunderland     False 

    

I want a group for every team that appears in HOME_TEAM or AWAY_TEAM. For example, for Everton I want a result like this:

                 HOME_TEAM     AWAY_TEAM             BOOL            
1163        Leicester City       Everton            False            
1172               Everton     Hull City             True        
1183            Stoke City       Everton             True      

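After that I want to count consecutive True or False values, but that is not the problem. The problem is grouping the matches this way.

I know I can simply use

(df.HOME_TEAM == 'Everton') | (df.AWAY_TEAM == 'Everton')

but then I would need a for loop over every team in my dataframe, which is too slow for my large dataframe.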

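You can use the following (note that the team names in the outputs below are written with hyphens instead of spaces):

>>> (df.filter(like='TEAM').stack()
       .reset_index(level=1, drop=True).to_frame('teams')
       .join(df).set_index('teams', append=True)
       .swaplevel().sort_index())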
                                      HOME_TEAM     AWAY_TEAM   BOOL
teams                                                               
Everton              1163        Leicester-City       Everton  False
                     1172               Everton     Hull-City   True
                     1183            Stoke-City       Everton   True
Hull-City            1172               Everton     Hull-City   True
Leicester-City       1163        Leicester-City       Everton  False
Manchester-United    1153     Manchester-United  Swansea-City   True
Stoke-City           1183            Stoke-City       Everton   True
Sunderland           1193  West-Bromwich-Albion    Sunderland  False
Swansea-City         1153     Manchester-United  Swansea-City   True
West-Bromwich-Albion 1193  West-Bromwich-Albion    Sunderland  False

Or, for a groupby:

>>> group = (df.filter(like='TEAM').stack()
           .reset_index(level=1, drop=True).to_frame('teams')
           .join(df).groupby('teams'))
>>> group.get_group('Everton')
        teams       HOME_TEAM  AWAY_TEAM   BOOL
1163  Everton  Leicester-City    Everton  False
1172  Everton         Everton  Hull-City   True
1183  Everton      Stoke-City    Everton   True
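Once the rows are grouped per team, the consecutive True/False runs mentioned in the question can be counted inside each group. The following is only a sketch: it rebuilds the question's frame, assumes the rows are already in chronological order, and uses the made-up names `run_lengths`, `match` and `streak`.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "HOME_TEAM": ["Manchester United", "Leicester City", "Everton",
                      "Stoke City", "West Bromwich Albion"],
        "AWAY_TEAM": ["Swansea City", "Everton", "Hull City",
                      "Everton", "Sunderland"],
        "BOOL": [True, False, True, True, False],
    },
    index=[1153, 1163, 1172, 1183, 1193],
)

# One row per (match, team), using the same reshaping as the answer.
per_team = (
    df.filter(like="TEAM").stack()
    .reset_index(level=1, drop=True).to_frame("teams")
    .join(df)
    .rename_axis("match").reset_index()  # unique row labels, match id as a column
)


def run_lengths(values: pd.Series) -> pd.Series:
    """Length of the current run of equal values, counted up to each row."""
    # A new run starts wherever the value differs from the previous row.
    run_id = (values != values.shift()).cumsum()
    return values.groupby(run_id).cumcount() + 1


per_team["streak"] = per_team.groupby("teams")["BOOL"].transform(run_lengths)
everton = per_team[per_team["teams"] == "Everton"]
print(everton[["match", "BOOL", "streak"]])
```

For Everton the BOOL values are False, True, True, so the running streak lengths are 1, 1, 2.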

How it works

>>> df.filter(like='TEAM')
 
                 HOME_TEAM     AWAY_TEAM
1153     Manchester-United  Swansea-City
1163        Leicester-City       Everton
1172               Everton     Hull-City
1183            Stoke-City       Everton
1193  West-Bromwich-Albion    Sunderland
>>> _.stack()
1153  HOME_TEAM       Manchester-United
      AWAY_TEAM            Swansea-City
1163  HOME_TEAM          Leicester-City
      AWAY_TEAM                 Everton
1172  HOME_TEAM                 Everton
      AWAY_TEAM               Hull-City
1183  HOME_TEAM              Stoke-City
      AWAY_TEAM                 Everton
1193  HOME_TEAM    West-Bromwich-Albion
      AWAY_TEAM              Sunderland
>>> _.reset_index(level=1, drop=True)
 
1153       Manchester-United
1153            Swansea-City
1163          Leicester-City
1163                 Everton
1172                 Everton
1172               Hull-City
1183              Stoke-City
1183                 Everton
1193    West-Bromwich-Albion
1193              Sunderland

>>> _.to_frame('teams') 
                     teams
1153     Manchester-United
1153          Swansea-City
1163        Leicester-City
1163               Everton
1172               Everton
1172             Hull-City
1183            Stoke-City
1183               Everton
1193  West-Bromwich-Albion
1193            Sunderland

>>> _.join(df)
                     teams             HOME_TEAM     AWAY_TEAM   BOOL
1153     Manchester-United     Manchester-United  Swansea-City   True
1153          Swansea-City     Manchester-United  Swansea-City   True
1163        Leicester-City        Leicester-City       Everton  False
1163               Everton        Leicester-City       Everton  False
1172               Everton               Everton     Hull-City   True
1172             Hull-City               Everton     Hull-City   True
1183            Stoke-City            Stoke-City       Everton   True
1183               Everton            Stoke-City       Everton   True
1193  West-Bromwich-Albion  West-Bromwich-Albion    Sunderland  False
1193            Sunderland  West-Bromwich-Albion    Sunderland  False

>>> _.set_index('teams', append=True)
                                      HOME_TEAM     AWAY_TEAM   BOOL
     teams                                                          
1153 Manchester-United        Manchester-United  Swansea-City   True
     Swansea-City             Manchester-United  Swansea-City   True
1163 Leicester-City              Leicester-City       Everton  False
     Everton                     Leicester-City       Everton  False
1172 Everton                            Everton     Hull-City   True
     Hull-City                          Everton     Hull-City   True
1183 Stoke-City                      Stoke-City       Everton   True
     Everton                         Stoke-City       Everton   True
1193 West-Bromwich-Albion  West-Bromwich-Albion    Sunderland  False
     Sunderland            West-Bromwich-Albion    Sunderland  False
>>> _.swaplevel()

                                      HOME_TEAM     AWAY_TEAM   BOOL
teams                                                               
Manchester-United    1153     Manchester-United  Swansea-City   True
Swansea-City         1153     Manchester-United  Swansea-City   True
Leicester-City       1163        Leicester-City       Everton  False
Everton              1163        Leicester-City       Everton  False
                     1172               Everton     Hull-City   True
Hull-City            1172               Everton     Hull-City   True
Stoke-City           1183            Stoke-City       Everton   True
Everton              1183            Stoke-City       Everton   True
West-Bromwich-Albion 1193  West-Bromwich-Albion    Sunderland  False
Sunderland           1193  West-Bromwich-Albion    Sunderland  False
>>> _.sort_index()
                                      HOME_TEAM     AWAY_TEAM   BOOL
teams                                                               
Everton              1163        Leicester-City       Everton  False
                     1172               Everton     Hull-City   True
                     1183            Stoke-City       Everton   True
Hull-City            1172               Everton     Hull-City   True
Leicester-City       1163        Leicester-City       Everton  False
Manchester-United    1153     Manchester-United  Swansea-City   True
Stoke-City           1183            Stoke-City       Everton   True
Sunderland           1193  West-Bromwich-Albion    Sunderland  False
Swansea-City         1153     Manchester-United  Swansea-City   True
West-Bromwich-Albion 1193  West-Bromwich-Albion    Sunderland  False
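With the team as the outer index level, the result of the chain can be queried directly. A small usage sketch, assuming the frame from the question (with spaces in the team names); the variable names `by_team`, `everton` and `matches_per_team` are chosen here for illustration:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "HOME_TEAM": ["Manchester United", "Leicester City", "Everton",
                      "Stoke City", "West Bromwich Albion"],
        "AWAY_TEAM": ["Swansea City", "Everton", "Hull City",
                      "Everton", "Sunderland"],
        "BOOL": [True, False, True, True, False],
    },
    index=[1153, 1163, 1172, 1183, 1193],
)

# The full chain from the walk-through above.
by_team = (
    df.filter(like="TEAM").stack()
    .reset_index(level=1, drop=True).to_frame("teams")
    .join(df).set_index("teams", append=True)
    .swaplevel().sort_index()
)

# All matches of one team, indexed by the original match label.
everton = by_team.loc["Everton"]
print(everton)

# Number of matches per team, computed on the outer index level.
matches_per_team = by_team.groupby(level="teams").size()
print(matches_per_team)
```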

Or (written with pd.concat, because DataFrame.append was removed in pandas 2.0):

>>> pd.concat([df, df]).set_index(
        df.filter(like='TEAM').melt().value,
        drop=False, append=True).swaplevel().sort_index()

                                      HOME_TEAM     AWAY_TEAM   BOOL
value                                                               
Everton              1163        Leicester-City       Everton  False
                     1172               Everton     Hull-City   True
                     1183            Stoke-City       Everton   True
Hull-City            1172               Everton     Hull-City   True
Leicester-City       1163        Leicester-City       Everton  False
Manchester-United    1153     Manchester-United  Swansea-City   True
Stoke-City           1183            Stoke-City       Everton   True
Sunderland           1193  West-Bromwich-Albion    Sunderland  False
Swansea-City         1153     Manchester-United  Swansea-City   True
West-Bromwich-Albion 1193  West-Bromwich-Albion    Sunderland  False

>>> pd.concat([df, df]).set_index(df.filter(like='TEAM').melt().value,
       drop=False, append=True).swaplevel().groupby(level=0).get_group('Everton')
                   HOME_TEAM  AWAY_TEAM   BOOL
value                                         
Everton 1163  Leicester-City    Everton  False
        1172         Everton  Hull-City   True
        1183      Stoke-City    Everton   True
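The snippet above depends on the melted values lining up positionally with the rows of the duplicated frame. A sketch of a variant that keeps the original row labels while melting, so the team names are joined back by label instead; it assumes pandas 1.1 or later (for the `ignore_index` parameter of `melt`) and the frame from the question:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "HOME_TEAM": ["Manchester United", "Leicester City", "Everton",
                      "Stoke City", "West Bromwich Albion"],
        "AWAY_TEAM": ["Swansea City", "Everton", "Hull City",
                      "Everton", "Sunderland"],
        "BOOL": [True, False, True, True, False],
    },
    index=[1153, 1163, 1172, 1183, 1193],
)

# Melt the two team columns into one while keeping the match labels as index.
teams = df.melt(
    value_vars=["HOME_TEAM", "AWAY_TEAM"],
    value_name="teams",
    ignore_index=False,
)[["teams"]]

# Join the full rows back by label, then index by (team, match).
long = (
    teams.join(df)
    .set_index("teams", append=True)
    .swaplevel()
    .sort_index()
)
print(long.loc["Everton"])
```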

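For example, you can do this:

import pandas as pd

# Collect every distinct team name from both columns, then concatenate
# the rows in which each team plays at home or away. Exact comparison is
# used so that one team name cannot match as a substring of another.
teams = pd.unique(df[['HOME_TEAM', 'AWAY_TEAM']].values.ravel())
pd.concat(
    [df[(df['HOME_TEAM'] == team) | (df['AWAY_TEAM'] == team)] for team in teams]
)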

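If I understand the question correctly, I assume the user may be prompted to choose a team (I infer this because you mentioned looping over every team). In any case, if so, you can try this: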

select_Team = input("Which team do you like to see: ")
df2 = df[(df["HOME_TEAM"] == select_Team) | (df["AWAY_TEAM"] == select_Team)]
print(df2)
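The same selection can be written without naming each column by comparing every `*_TEAM` column at once. A minimal sketch, assuming the frame from the question; `matches_for` is a hypothetical helper name, not part of pandas:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "HOME_TEAM": ["Manchester United", "Leicester City", "Everton",
                      "Stoke City", "West Bromwich Albion"],
        "AWAY_TEAM": ["Swansea City", "Everton", "Hull City",
                      "Everton", "Sunderland"],
        "BOOL": [True, False, True, True, False],
    },
    index=[1153, 1163, 1172, 1183, 1193],
)


def matches_for(frame: pd.DataFrame, team: str) -> pd.DataFrame:
    """Rows of `frame` in which `team` appears in any column containing 'TEAM'."""
    team_columns = frame.filter(like="TEAM")
    # eq() compares every cell with the team name; any(axis=1) keeps a row
    # when at least one of its team columns matches.
    return frame[team_columns.eq(team).any(axis=1)]


print(matches_for(df, "Everton"))
```

This still filters one team per call, so it suits interactive lookups more than the all-teams grouping asked for in the question.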

Overview: merge the unique values of the two columns into one list of unique teams, then find the rows where each team occurs in the HOME_TEAM or AWAY_TEAM column, and set the TEAM value there.

import pandas as pd

Index = [1153, 1163, 1172, 1183, 1193]
HOME_TEAM = ['Manchester United', 'Leicester City', 'Everton', 'Stoke City', 'West Bromwich Albion']
AWAY_TEAM = ['Swansea City', 'Everton', 'Hull City', 'Everton', 'Sunderland']
BOOL = [True, False, True, True, False]

df = pd.DataFrame({'Index': Index, 'HOME_TEAM': HOME_TEAM, 'AWAY_TEAM': AWAY_TEAM, 'BOOL': BOOL})
df = df.astype({'Index': int, 'HOME_TEAM': str, 'AWAY_TEAM': str, 'BOOL': bool})

print(df.dtypes)

# Home teams first, followed by the away teams that never play at home.
list1 = list(df['HOME_TEAM'].unique())
list2 = list(df['AWAY_TEAM'].unique())
notInList1 = [team for team in list2 if team not in list1]
combined = list1 + notInList1

# Each row holds a single TEAM value, so for a match whose two teams are
# both in `combined`, the team processed later overwrites the earlier one.
df['TEAM'] = ""
for item in combined:
    mask = (df['HOME_TEAM'] == item) | (df['AWAY_TEAM'] == item)
    df.loc[mask, 'TEAM'] = item

print(df.head())

