
Pandas groupby by the same value in different columns

I have a dataframe like this:

                 HOME_TEAM     AWAY_TEAM      BOOL
1153     Manchester United  Swansea City      True             
1163        Leicester City       Everton     False            
1172               Everton     Hull City      True        
1183            Stoke City       Everton      True         
1193  West Bromwich Albion    Sunderland     False 

    

I want a group for each team that appears in HOME_TEAM or AWAY_TEAM. For example, for Everton, I would like a result similar to this:

                 HOME_TEAM     AWAY_TEAM             BOOL            
1163        Leicester City       Everton            False            
1172               Everton     Hull City             True        
1183            Stoke City       Everton             True      

Then I have to count the consecutive True or False values, but that is not a problem. The problem is grouping the matches in this way.

I know I can simply use

(df.HOME_TEAM == 'Everton') | (df.AWAY_TEAM == 'Everton')

but this way I would need a for loop over every team in my dataframe, which is too slow for my big dataframe.

You can use the following:

>>> (df.filter(like='TEAM').stack()
       .reset_index(level=1, drop=True).to_frame('teams')
       .join(df).set_index('teams', append=True
     ).swaplevel().sort_index()
                                      HOME_TEAM     AWAY_TEAM   BOOL
teams                                                               
Everton              1163        Leicester-City       Everton  False
                     1172               Everton     Hull-City   True
                     1183            Stoke-City       Everton   True
Hull-City            1172               Everton     Hull-City   True
Leicester-City       1163        Leicester-City       Everton  False
Manchester-United    1153     Manchester-United  Swansea-City   True
Stoke-City           1183            Stoke-City       Everton   True
Sunderland           1193  West-Bromwich-Albion    Sunderland  False
Swansea-City         1153     Manchester-United  Swansea-City   True
West-Bromwich-Albion 1193  West-Bromwich-Albion    Sunderland  False

Or, for groupby:

>>> group = (df.filter(like='TEAM').stack()
           .reset_index(level=1, drop=True).to_frame('teams')
           .join(df).groupby('teams'))
>>> group.get_group('Everton')
        teams       HOME_TEAM  AWAY_TEAM   BOOL
1163  Everton  Leicester-City    Everton  False
1172  Everton         Everton  Hull-City   True
1183  Everton      Stoke-City    Everton   True
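For reference, a minimal self-contained sketch of the groupby variant above. It assumes the sample data from the question (team names with spaces rather than hyphens); the variable names `teams` and `everton` are my own.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame(
    {
        "HOME_TEAM": ["Manchester United", "Leicester City", "Everton",
                      "Stoke City", "West Bromwich Albion"],
        "AWAY_TEAM": ["Swansea City", "Everton", "Hull City",
                      "Everton", "Sunderland"],
        "BOOL": [True, False, True, True, False],
    },
    index=[1153, 1163, 1172, 1183, 1193],
)

# One row per (match, team): stack the two team columns, drop the level that
# holds the column names, and name the resulting column "teams".
teams = (
    df.filter(like="TEAM")
      .stack()
      .reset_index(level=1, drop=True)
      .to_frame("teams")
)

# Join the full match row back on the original index, then group by team.
group = teams.join(df).groupby("teams")

everton = group.get_group("Everton")
print(everton)
```

Every match shows up in two groups, once for the home team and once for the away team, so no per-team loop is needed to build the groups.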

HOW IT WORKS

>>> df.filter(like='TEAM')
 
                 HOME_TEAM     AWAY_TEAM
1153     Manchester-United  Swansea-City
1163        Leicester-City       Everton
1172               Everton     Hull-City
1183            Stoke-City       Everton
1193  West-Bromwich-Albion    Sunderland
>>> _.stack()
1153  HOME_TEAM       Manchester-United
      AWAY_TEAM            Swansea-City
1163  HOME_TEAM          Leicester-City
      AWAY_TEAM                 Everton
1172  HOME_TEAM                 Everton
      AWAY_TEAM               Hull-City
1183  HOME_TEAM              Stoke-City
      AWAY_TEAM                 Everton
1193  HOME_TEAM    West-Bromwich-Albion
      AWAY_TEAM              Sunderland
>>> _.reset_index(level=1, drop=True)
 
1153       Manchester-United
1153            Swansea-City
1163          Leicester-City
1163                 Everton
1172                 Everton
1172               Hull-City
1183              Stoke-City
1183                 Everton
1193    West-Bromwich-Albion
1193              Sunderland

>>> _.to_frame('teams') 
                     teams
1153     Manchester-United
1153          Swansea-City
1163        Leicester-City
1163               Everton
1172               Everton
1172             Hull-City
1183            Stoke-City
1183               Everton
1193  West-Bromwich-Albion
1193            Sunderland

>>> _.join(df)
                     teams             HOME_TEAM     AWAY_TEAM   BOOL
1153     Manchester-United     Manchester-United  Swansea-City   True
1153          Swansea-City     Manchester-United  Swansea-City   True
1163        Leicester-City        Leicester-City       Everton  False
1163               Everton        Leicester-City       Everton  False
1172               Everton               Everton     Hull-City   True
1172             Hull-City               Everton     Hull-City   True
1183            Stoke-City            Stoke-City       Everton   True
1183               Everton            Stoke-City       Everton   True
1193  West-Bromwich-Albion  West-Bromwich-Albion    Sunderland  False
1193            Sunderland  West-Bromwich-Albion    Sunderland  False

>>> _.set_index('teams', append=True)
                                      HOME_TEAM     AWAY_TEAM   BOOL
     teams                                                          
1153 Manchester-United        Manchester-United  Swansea-City   True
     Swansea-City             Manchester-United  Swansea-City   True
1163 Leicester-City              Leicester-City       Everton  False
     Everton                     Leicester-City       Everton  False
1172 Everton                            Everton     Hull-City   True
     Hull-City                          Everton     Hull-City   True
1183 Stoke-City                      Stoke-City       Everton   True
     Everton                         Stoke-City       Everton   True
1193 West-Bromwich-Albion  West-Bromwich-Albion    Sunderland  False
     Sunderland            West-Bromwich-Albion    Sunderland  False
>>> _.swaplevel()

                                      HOME_TEAM     AWAY_TEAM   BOOL
teams                                                               
Manchester-United    1153     Manchester-United  Swansea-City   True
Swansea-City         1153     Manchester-United  Swansea-City   True
Leicester-City       1163        Leicester-City       Everton  False
Everton              1163        Leicester-City       Everton  False
                     1172               Everton     Hull-City   True
Hull-City            1172               Everton     Hull-City   True
Stoke-City           1183            Stoke-City       Everton   True
Everton              1183            Stoke-City       Everton   True
West-Bromwich-Albion 1193  West-Bromwich-Albion    Sunderland  False
Sunderland           1193  West-Bromwich-Albion    Sunderland  False
>>> _.sort_index()
                                      HOME_TEAM     AWAY_TEAM   BOOL
teams                                                               
Everton              1163        Leicester-City       Everton  False
                     1172               Everton     Hull-City   True
                     1183            Stoke-City       Everton   True
Hull-City            1172               Everton     Hull-City   True
Leicester-City       1163        Leicester-City       Everton  False
Manchester-United    1153     Manchester-United  Swansea-City   True
Stoke-City           1183            Stoke-City       Everton   True
Sunderland           1193  West-Bromwich-Albion    Sunderland  False
Swansea-City         1153     Manchester-United  Swansea-City   True
West-Bromwich-Albion 1193  West-Bromwich-Albion    Sunderland  False

ALTERNATIVELY

>>> pd.concat([df, df]).set_index(
        df.filter(like='TEAM').melt().value, 
        drop=False, append=True).swaplevel().sort_index()

                                      HOME_TEAM     AWAY_TEAM   BOOL
value                                                               
Everton              1163        Leicester-City       Everton  False
                     1172               Everton     Hull-City   True
                     1183            Stoke-City       Everton   True
Hull-City            1172               Everton     Hull-City   True
Leicester-City       1163        Leicester-City       Everton  False
Manchester-United    1153     Manchester-United  Swansea-City   True
Stoke-City           1183            Stoke-City       Everton   True
Sunderland           1193  West-Bromwich-Albion    Sunderland  False
Swansea-City         1153     Manchester-United  Swansea-City   True
West-Bromwich-Albion 1193  West-Bromwich-Albion    Sunderland  False

>>> pd.concat([df, df]).set_index(df.filter(like='TEAM').melt().value, 
       drop=False, append=True).swaplevel().groupby(level=0).get_group('Everton') 
                   HOME_TEAM  AWAY_TEAM   BOOL
value                                         
Everton 1163  Leicester-City    Everton  False
        1172         Everton  Hull-City   True
        1183      Stoke-City    Everton   True
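A self-contained sketch of the same idea that should run on current pandas. It assumes the question's sample data and uses `pd.concat` to duplicate the frame, since `DataFrame.append` was removed in pandas 2.0. The level name "team" and the variable names are my own choices.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame(
    {
        "HOME_TEAM": ["Manchester United", "Leicester City", "Everton",
                      "Stoke City", "West Bromwich Albion"],
        "AWAY_TEAM": ["Swansea City", "Everton", "Hull City",
                      "Everton", "Sunderland"],
        "BOOL": [True, False, True, True, False],
    },
    index=[1153, 1163, 1172, 1183, 1193],
)

# Two copies of the frame: the first copy is labelled with the home teams,
# the second with the away teams.
doubled = pd.concat([df, df])

# melt() lists every HOME_TEAM value first, then every AWAY_TEAM value,
# which matches the row order of `doubled` position by position.
team_labels = pd.Index(df.filter(like="TEAM").melt()["value"], name="team")

result = (
    doubled.set_index(team_labels, append=True)
           .swaplevel()
           .sort_index()
)

print(result.loc["Everton"])
```

Wrapping the labels in a `pd.Index` makes it explicit that they are attached by position, not aligned on the original match index.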

You can do, for example:

pd.concat(
pd.DataFrame(df[['HOME_TEAM','AWAY_TEAM']].\
             values.\
             ravel()
             ).drop_duplicates()[0].\
             apply(lambda team:
                   df[df['HOME_TEAM'].str.contains(team) | df['AWAY_TEAM'].str.contains(team)]
                   ).to_list())
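Note that `str.contains` does substring (and by default regular-expression) matching, so a team whose name is contained in another team's name would pick up extra rows. A hedged sketch of the same per-team selection with exact equality, assuming the question's sample data; `matches_by_team` and the level names are hypothetical names of mine. It still loops once per team, so it does not address the speed concern in the question.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame(
    {
        "HOME_TEAM": ["Manchester United", "Leicester City", "Everton",
                      "Stoke City", "West Bromwich Albion"],
        "AWAY_TEAM": ["Swansea City", "Everton", "Hull City",
                      "Everton", "Sunderland"],
        "BOOL": [True, False, True, True, False],
    },
    index=[1153, 1163, 1172, 1183, 1193],
)

# Every distinct team name, in order of first appearance
teams = pd.unique(df[["HOME_TEAM", "AWAY_TEAM"]].to_numpy().ravel())

# Exact comparison instead of str.contains: one sub-frame per team
matches_by_team = {
    team: df[(df["HOME_TEAM"] == team) | (df["AWAY_TEAM"] == team)]
    for team in teams
}

# Passing a dict to concat uses the keys as the outer index level
combined = pd.concat(matches_by_team, names=["team", "match"])
print(combined.loc["Everton"])
```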

If I understand the question correctly, the user may be prompted to choose a team (I inferred this because you mentioned looping over each team). If that is the case, you may try this:

select_Team = input("Which team do you like to see: ")
df2 = df[(df["HOME_TEAM"] == select_Team) | (df["AWAY_TEAM"] == select_Team)]
print(df2)

Overview: combine the unique values of the two columns into one list of unique teams, then find the rows where each team appears in either HOME_TEAM or AWAY_TEAM and set the TEAM value for those rows.

import pandas as pd

Index = [1153, 1163, 1172, 1183, 1193]
HOME_TEAM = ['Manchester United', 'Leicester City', 'Everton', 'Stoke City', 'West Bromwich Albion']
AWAY_TEAM = ['Swansea City', 'Everton', 'Hull City', 'Everton', 'Sunderland']
BOOL = [True, False, True, True, False]

df = pd.DataFrame({'Index': Index, 'HOME_TEAM': HOME_TEAM, 'AWAY_TEAM': AWAY_TEAM, 'BOOL': BOOL})
df = df.astype({'Index': int, 'HOME_TEAM': str, 'AWAY_TEAM': str, 'BOOL': bool})

print(df.dtypes)

# Unique home teams, followed by the away teams that never play at home
list1 = list(df['HOME_TEAM'].unique())
list2 = list(df['AWAY_TEAM'].unique())
notInList1 = set(list2) - set(list1)
combined = list1 + list(notInList1)

df['TEAM'] = ""
for item in combined:
    # Rows where the team plays either at home or away
    mask = (df['HOME_TEAM'] == item) | (df['AWAY_TEAM'] == item)
    key = df[mask].index
    # Each row holds a single TEAM value, so a later team overwrites an earlier one
    df.loc[key, 'TEAM'] = item

print(df.head())
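Because every match involves two teams, a single TEAM value per row can record only one of them. A minimal sketch of one way around this, which is not part of the answer above: put both teams into one list-valued column and use `DataFrame.explode` (available since pandas 0.25) so that each match appears once per participating team. It assumes the question's sample data; the name `long` is my own.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame(
    {
        "HOME_TEAM": ["Manchester United", "Leicester City", "Everton",
                      "Stoke City", "West Bromwich Albion"],
        "AWAY_TEAM": ["Swansea City", "Everton", "Hull City",
                      "Everton", "Sunderland"],
        "BOOL": [True, False, True, True, False],
    },
    index=[1153, 1163, 1172, 1183, 1193],
)

# One list [home, away] per match, aligned on the original index
both_teams = pd.Series(
    df[["HOME_TEAM", "AWAY_TEAM"]].to_numpy().tolist(),
    index=df.index,
)

# explode() repeats each match row once per element of its TEAM list
long = df.assign(TEAM=both_teams).explode("TEAM")

for team, matches in long.groupby("TEAM"):
    print(team, list(matches.index))
```

Here `long` has ten rows, two per match, and grouping on TEAM yields the per-team groups the question asks for without a per-team boolean filter.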


 