Pandas groupby by the same value in different columns
I have a dataframe like this:
      HOME_TEAM             AWAY_TEAM     BOOL
1153  Manchester United     Swansea City  True
1163  Leicester City        Everton       False
1172  Everton               Hull City     True
1183  Stoke City            Everton       True
1193  West Bromwich Albion  Sunderland    False
I want a group for each team that appears in HOME_TEAM or AWAY_TEAM. For example, for Everton, I would like a result similar to this:
      HOME_TEAM       AWAY_TEAM  BOOL
1163  Leicester City  Everton    False
1172  Everton         Hull City  True
1183  Stoke City      Everton    True
Then I have to count the consecutive True or False values, but that is not a problem. The problem is grouping the matches in this way.
I know I can simply use
(df.HOME_TEAM == 'Everton') | (df.AWAY_TEAM == 'Everton')
but this way I would need a for loop over every team in my dataframe, and that is too slow for my big dataframe.
You can use the following:
>>> (df.filter(like='TEAM').stack()
       .reset_index(level=1, drop=True).to_frame('teams')
       .join(df).set_index('teams', append=True)
       .swaplevel().sort_index())
                           HOME_TEAM             AWAY_TEAM     BOOL
teams
Everton              1163  Leicester-City        Everton       False
                     1172  Everton               Hull-City     True
                     1183  Stoke-City            Everton       True
Hull-City            1172  Everton               Hull-City     True
Leicester-City       1163  Leicester-City        Everton       False
Manchester-United    1153  Manchester-United     Swansea-City  True
Stoke-City           1183  Stoke-City            Everton       True
Sunderland           1193  West-Bromwich-Albion  Sunderland    False
Swansea-City         1153  Manchester-United     Swansea-City  True
West-Bromwich-Albion 1193  West-Bromwich-Albion  Sunderland    False
Or, to get a groupby object:
>>> group = (df.filter(like='TEAM').stack()
.reset_index(level=1, drop=True).to_frame('teams')
.join(df).groupby('teams'))
>>> group.get_group('Everton')
      teams    HOME_TEAM       AWAY_TEAM  BOOL
1163  Everton  Leicester-City  Everton    False
1172  Everton  Everton         Hull-City  True
1183  Everton  Stoke-City      Everton    True
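For anyone who wants to try the groupby version end to end, here is a minimal self-contained sketch. It assumes the sample data from the question; the names `long_df` and `matches_per_team` are only illustrative and do not appear in the answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "HOME_TEAM": ["Manchester United", "Leicester City", "Everton",
                      "Stoke City", "West Bromwich Albion"],
        "AWAY_TEAM": ["Swansea City", "Everton", "Hull City",
                      "Everton", "Sunderland"],
        "BOOL": [True, False, True, True, False],
    },
    index=[1153, 1163, 1172, 1183, 1193],
)

# One row per (match, team): stack both TEAM columns, drop the column-name
# level, and join the original match rows back on the match index.
long_df = (
    df.filter(like="TEAM")
    .stack()
    .reset_index(level=1, drop=True)
    .to_frame("teams")
    .join(df)
)

group = long_df.groupby("teams")
everton = group.get_group("Everton")
print(everton)

# Illustrative follow-up (not part of the answer): matches played per team.
matches_per_team = group.size()
print(matches_per_team)
```

Every match shows up twice in `long_df`, once for each side, which is what lets a single groupby cover home and away games without a Python loop.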
HOW IT WORKS
>>> df.filter(like='TEAM')
HOME_TEAM AWAY_TEAM
1153 Manchester-United Swansea-City
1163 Leicester-City Everton
1172 Everton Hull-City
1183 Stoke-City Everton
1193 West-Bromwich-Albion Sunderland
>>> _.stack()
1153 HOME_TEAM Manchester-United
AWAY_TEAM Swansea-City
1163 HOME_TEAM Leicester-City
AWAY_TEAM Everton
1172 HOME_TEAM Everton
AWAY_TEAM Hull-City
1183 HOME_TEAM Stoke-City
AWAY_TEAM Everton
1193 HOME_TEAM West-Bromwich-Albion
AWAY_TEAM Sunderland
>>> _.reset_index(level=1, drop=True)
1153 Manchester-United
1153 Swansea-City
1163 Leicester-City
1163 Everton
1172 Everton
1172 Hull-City
1183 Stoke-City
1183 Everton
1193 West-Bromwich-Albion
1193 Sunderland
>>> _.to_frame('teams')
teams
1153 Manchester-United
1153 Swansea-City
1163 Leicester-City
1163 Everton
1172 Everton
1172 Hull-City
1183 Stoke-City
1183 Everton
1193 West-Bromwich-Albion
1193 Sunderland
>>> _.join(df)
teams HOME_TEAM AWAY_TEAM BOOL
1153 Manchester-United Manchester-United Swansea-City True
1153 Swansea-City Manchester-United Swansea-City True
1163 Leicester-City Leicester-City Everton False
1163 Everton Leicester-City Everton False
1172 Everton Everton Hull-City True
1172 Hull-City Everton Hull-City True
1183 Stoke-City Stoke-City Everton True
1183 Everton Stoke-City Everton True
1193 West-Bromwich-Albion West-Bromwich-Albion Sunderland False
1193 Sunderland West-Bromwich-Albion Sunderland False
>>> _.set_index('teams', append=True)
HOME_TEAM AWAY_TEAM BOOL
teams
1153 Manchester-United Manchester-United Swansea-City True
Swansea-City Manchester-United Swansea-City True
1163 Leicester-City Leicester-City Everton False
Everton Leicester-City Everton False
1172 Everton Everton Hull-City True
Hull-City Everton Hull-City True
1183 Stoke-City Stoke-City Everton True
Everton Stoke-City Everton True
1193 West-Bromwich-Albion West-Bromwich-Albion Sunderland False
Sunderland West-Bromwich-Albion Sunderland False
>>> _.swaplevel()
HOME_TEAM AWAY_TEAM BOOL
teams
Manchester-United 1153 Manchester-United Swansea-City True
Swansea-City 1153 Manchester-United Swansea-City True
Leicester-City 1163 Leicester-City Everton False
Everton 1163 Leicester-City Everton False
1172 Everton Hull-City True
Hull-City 1172 Everton Hull-City True
Stoke-City 1183 Stoke-City Everton True
Everton 1183 Stoke-City Everton True
West-Bromwich-Albion 1193 West-Bromwich-Albion Sunderland False
Sunderland 1193 West-Bromwich-Albion Sunderland False
>>> _.sort_index()
HOME_TEAM AWAY_TEAM BOOL
teams
Everton 1163 Leicester-City Everton False
1172 Everton Hull-City True
1183 Stoke-City Everton True
Hull-City 1172 Everton Hull-City True
Leicester-City 1163 Leicester-City Everton False
Manchester-United 1153 Manchester-United Swansea-City True
Stoke-City 1183 Stoke-City Everton True
Sunderland 1193 West-Bromwich-Albion Sunderland False
Swansea-City 1153 Manchester-United Swansea-City True
West-Bromwich-Albion 1193 West-Bromwich-Albion Sunderland False
ALTERNATIVELY
>>> pd.concat([df, df]).set_index(
        df.filter(like='TEAM').melt().value,
        drop=False, append=True).swaplevel().sort_index()
HOME_TEAM AWAY_TEAM BOOL
value
Everton 1163 Leicester-City Everton False
1172 Everton Hull-City True
1183 Stoke-City Everton True
Hull-City 1172 Everton Hull-City True
Leicester-City 1163 Leicester-City Everton False
Manchester-United 1153 Manchester-United Swansea-City True
Stoke-City 1183 Stoke-City Everton True
Sunderland 1193 West-Bromwich-Albion Sunderland False
Swansea-City 1153 Manchester-United Swansea-City True
West-Bromwich-Albion 1193 West-Bromwich-Albion Sunderland False
>>> pd.concat([df, df]).set_index(df.filter(like='TEAM').melt().value,
        drop=False, append=True).swaplevel().groupby(level=0).get_group('Everton')
HOME_TEAM AWAY_TEAM BOOL
value
Everton 1163 Leicester-City Everton False
1172 Everton Hull-City True
1183 Stoke-City Everton True
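Once the matches are laid out with one row per team, the follow-up step mentioned in the question (counting consecutive values of BOOL) could be sketched roughly as below. This is only one reading of that requirement: it assumes the index order reflects match order and that the longest run of True per team is what is wanted. The helper `longest_true_run` and the column name `team` are hypothetical.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "HOME_TEAM": ["Manchester United", "Leicester City", "Everton",
                      "Stoke City", "West Bromwich Albion"],
        "AWAY_TEAM": ["Swansea City", "Everton", "Hull City",
                      "Everton", "Sunderland"],
        "BOOL": [True, False, True, True, False],
    },
    index=[1153, 1163, 1172, 1183, 1193],
)

# Long layout: each match once for its home side and once for its away side.
home = df.assign(team=df["HOME_TEAM"])
away = df.assign(team=df["AWAY_TEAM"])
long_df = pd.concat([home, away]).sort_index()


def longest_true_run(flags):
    """Length of the longest run of consecutive True values (hypothetical helper)."""
    # A new run starts wherever the value differs from the previous one.
    run_id = (flags != flags.shift()).cumsum()
    # Summing booleans inside each run gives its length for True runs
    # and 0 for False runs.
    return int(flags.groupby(run_id.to_numpy()).sum().max())


streaks = long_df.groupby("team")["BOOL"].apply(longest_true_run)
print(streaks)
```

For Everton the BOOL values in index order are False, True, True, so the longest run of True has length 2.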
You can do, for example:
pd.concat(
    pd.DataFrame(df[['HOME_TEAM', 'AWAY_TEAM']].values.ravel())
    .drop_duplicates()[0]
    .apply(lambda team: df[df['HOME_TEAM'].str.contains(team)
                           | df['AWAY_TEAM'].str.contains(team)])
    .to_list()
)
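Note that `str.contains` does a substring (and by default regex) match, so a team whose name is contained in another team's name would also be picked up. A variant of the same idea with exact equality, labelling each block by team, might look like the sketch below. It assumes the sample data from the question; the names `per_team` and `result`, and the index level names `team` and `match`, are my own.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "HOME_TEAM": ["Manchester United", "Leicester City", "Everton",
                      "Stoke City", "West Bromwich Albion"],
        "AWAY_TEAM": ["Swansea City", "Everton", "Hull City",
                      "Everton", "Sunderland"],
        "BOOL": [True, False, True, True, False],
    },
    index=[1153, 1163, 1172, 1183, 1193],
)

# Every distinct team name from either column.
teams = pd.unique(df[["HOME_TEAM", "AWAY_TEAM"]].to_numpy().ravel())

# Exact equality avoids the partial matches that str.contains would allow.
per_team = {
    team: df[(df["HOME_TEAM"] == team) | (df["AWAY_TEAM"] == team)]
    for team in teams
}

# Passing a dict to concat uses its keys as the outer index level.
result = pd.concat(per_team, names=["team", "match"])
print(result.loc["Everton"])
```

This still loops over the teams in Python, so it is closer to the approach the question wanted to avoid than the stack-based answer.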
If I understand the question correctly, the user may be prompted to choose a team (I inferred this because you mentioned looping over each team). If that is the case, you could try this:
select_Team = input("Which team do you like to see: ")
df2 = df[(df["HOME_TEAM"] == select_Team) | (df["AWAY_TEAM"] == select_Team)]
print(df2)
Overview: combine the unique values of the two columns into one list of teams, then for each team find the rows where it appears in either HOME_TEAM or AWAY_TEAM and set the TEAM value on those rows.
import pandas as pd

Index = [1153, 1163, 1172, 1183, 1193]
HOME_TEAM = ['Manchester United', 'Leicester City', 'Everton', 'Stoke City', 'West Bromwich Albion']
AWAY_TEAM = ['Swansea City', 'Everton', 'Hull City', 'Everton', 'Sunderland']
BOOL = [True, False, True, True, False]
df = pd.DataFrame({'Index': Index, 'HOME_TEAM': HOME_TEAM, 'AWAY_TEAM': AWAY_TEAM, 'BOOL': BOOL})
df = df.astype({'Index': int, 'HOME_TEAM': str, 'AWAY_TEAM': str, 'BOOL': bool})
print(df.dtypes)
list1 = list(df['HOME_TEAM'].unique())
list2 = list(df['AWAY_TEAM'].unique())
# teams that only ever appear as the away side
notInList1 = set(list2) - set(list1)
combined = list1 + list(notInList1)
df['TEAM'] = ""
for item in combined:
    # rows where this team played, home or away
    mask = (df['HOME_TEAM'] == item) | (df['AWAY_TEAM'] == item)
    # each row holds a single TEAM value, so a team processed later overwrites an earlier one
    df.loc[mask, 'TEAM'] = item
print(df.head())
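Since a single TEAM column on the original rows can hold only one team per match, a match cannot be listed under both of its teams this way. One possible workaround, sketched here under the assumption that duplicating each match once per team is acceptable, is to put both names in a list-valued column and use `DataFrame.explode`. The name `long_df` is illustrative.

```python
import pandas as pd

# Sample data copied from the question, with the match id as the index.
df = pd.DataFrame(
    {
        "HOME_TEAM": ["Manchester United", "Leicester City", "Everton",
                      "Stoke City", "West Bromwich Albion"],
        "AWAY_TEAM": ["Swansea City", "Everton", "Hull City",
                      "Everton", "Sunderland"],
        "BOOL": [True, False, True, True, False],
    },
    index=[1153, 1163, 1172, 1183, 1193],
)

# Both team names of a match as one list per row.
both_teams = pd.Series(
    df[["HOME_TEAM", "AWAY_TEAM"]].to_numpy().tolist(), index=df.index
)

# explode repeats each match once per team in its list.
long_df = df.assign(TEAM=both_teams).explode("TEAM")

print(long_df[long_df["TEAM"] == "Everton"])
```

After this, `long_df.groupby("TEAM")` gives one group per team without any explicit loop.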