Pandas groupby by the same value in different columns
I have a dataframe like this:
HOME_TEAM AWAY_TEAM BOOL
1153 Manchester United Swansea City True
1163 Leicester City Everton False
1172 Everton Hull City True
1183 Stoke City Everton True
1193 West Bromwich Albion Sunderland False
I want a group for each team that appears in either HOME_TEAM or AWAY_TEAM. For example, for Everton I would like a result like this:
HOME_TEAM AWAY_TEAM BOOL
1163 Leicester City Everton False
1172 Everton Hull City True
1183 Stoke City Everton True
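After that I want to count consecutive True or False values, but that part is not the problem. The problem is grouping the matches this way.
I know I could simply use
(df.HOME_TEAM == 'Everton') | (df.AWAY_TEAM == 'Everton')
but then I would need a for loop over every team in my dataframe, which is too slow for my large dataframe.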
You can use the following:
>>> (df.filter(like='TEAM').stack()
.reset_index(level=1, drop=True).to_frame('teams')
.join(df).set_index('teams', append=True
).swaplevel().sort_index()
HOME_TEAM AWAY_TEAM BOOL
teams
Everton 1163 Leicester-City Everton False
1172 Everton Hull-City True
1183 Stoke-City Everton True
Hull-City 1172 Everton Hull-City True
Leicester-City 1163 Leicester-City Everton False
Manchester-United 1153 Manchester-United Swansea-City True
Stoke-City 1183 Stoke-City Everton True
Sunderland 1193 West-Bromwich-Albion Sunderland False
Swansea-City 1153 Manchester-United Swansea-City True
West-Bromwich-Albion 1193 West-Bromwich-Albion Sunderland False
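For reference, the chain above can be packaged as a small function and run as a script instead of in the REPL. This is a sketch: it rebuilds the sample frame from the question (with the original, space-separated team names), and `long_by_team` is an illustrative name, not a pandas API. On recent pandas versions `stack()` may emit a FutureWarning about its new implementation.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame(
    {
        "HOME_TEAM": ["Manchester United", "Leicester City", "Everton",
                      "Stoke City", "West Bromwich Albion"],
        "AWAY_TEAM": ["Swansea City", "Everton", "Hull City",
                      "Everton", "Sunderland"],
        "BOOL": [True, False, True, True, False],
    },
    index=[1153, 1163, 1172, 1183, 1193],
)


def long_by_team(frame: pd.DataFrame) -> pd.DataFrame:
    """Return one row per (team, match), indexed by (teams, original index)."""
    teams = (
        frame.filter(like="TEAM")          # HOME_TEAM and AWAY_TEAM only
        .stack()                           # one row per (match, column)
        .reset_index(level=1, drop=True)   # keep only the match index
        .to_frame("teams")
    )
    return (
        teams.join(frame)                  # bring back the full match row
        .set_index("teams", append=True)   # index: (match, teams)
        .swaplevel()                       # index: (teams, match)
        .sort_index()
    )


result = long_by_team(df)
print(result.loc["Everton"])
```

Each match appears twice in `result`, once under each of its two teams.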
Or, for a groupby:
>>> group = (df.filter(like='TEAM').stack()
.reset_index(level=1, drop=True).to_frame('teams')
.join(df).groupby('teams'))
>>> group.get_group('Everton')
teams HOME_TEAM AWAY_TEAM BOOL
1163 Everton Leicester-City Everton False
1172 Everton Everton Hull-City True
1183 Everton Stoke-City Everton True
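With the matches grouped per team, the follow-up step from the question (counting consecutive True/False values) can be done for all teams at once instead of in a loop. A minimal sketch, assuming the index order is the chronological order of the matches and that a streak is a run of identical BOOL values within one team's own matches; the `run` and `streak` names are made up for this example.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "HOME_TEAM": ["Manchester United", "Leicester City", "Everton",
                      "Stoke City", "West Bromwich Albion"],
        "AWAY_TEAM": ["Swansea City", "Everton", "Hull City",
                      "Everton", "Sunderland"],
        "BOOL": [True, False, True, True, False],
    },
    index=[1153, 1163, 1172, 1183, 1193],
)

# Long format as in the answer: one row per (match, team), match order preserved
long_df = (
    df.filter(like="TEAM").stack()
    .reset_index(level=1, drop=True).to_frame("teams")
    .join(df)
    .reset_index()   # unique row labels; the match id moves to column "index"
)

b = long_df["BOOL"].astype(int)
# Previous result of the same team; -1 marks the team's first match
prev = b.groupby(long_df["teams"]).shift(fill_value=-1)
# A new run starts whenever a team's result differs from its previous one
run = b.ne(prev).astype(int).groupby(long_df["teams"]).cumsum().rename("run")
# 1-based position inside the current run, i.e. the streak length so far
long_df["streak"] = long_df.groupby(["teams", run]).cumcount() + 1

print(long_df[long_df["teams"] == "Everton"])
```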
How it works
>>> df.filter(like='TEAM')
HOME_TEAM AWAY_TEAM
1153 Manchester-United Swansea-City
1163 Leicester-City Everton
1172 Everton Hull-City
1183 Stoke-City Everton
1193 West-Bromwich-Albion Sunderland
>>> _.stack()
1153 HOME_TEAM Manchester-United
AWAY_TEAM Swansea-City
1163 HOME_TEAM Leicester-City
AWAY_TEAM Everton
1172 HOME_TEAM Everton
AWAY_TEAM Hull-City
1183 HOME_TEAM Stoke-City
AWAY_TEAM Everton
1193 HOME_TEAM West-Bromwich-Albion
AWAY_TEAM Sunderland
>>> _.reset_index(level=1, drop=True)
1153 Manchester-United
1153 Swansea-City
1163 Leicester-City
1163 Everton
1172 Everton
1172 Hull-City
1183 Stoke-City
1183 Everton
1193 West-Bromwich-Albion
1193 Sunderland
>>> _.to_frame('teams')
teams
1153 Manchester-United
1153 Swansea-City
1163 Leicester-City
1163 Everton
1172 Everton
1172 Hull-City
1183 Stoke-City
1183 Everton
1193 West-Bromwich-Albion
1193 Sunderland
>>> _.join(df)
teams HOME_TEAM AWAY_TEAM BOOL
1153 Manchester-United Manchester-United Swansea-City True
1153 Swansea-City Manchester-United Swansea-City True
1163 Leicester-City Leicester-City Everton False
1163 Everton Leicester-City Everton False
1172 Everton Everton Hull-City True
1172 Hull-City Everton Hull-City True
1183 Stoke-City Stoke-City Everton True
1183 Everton Stoke-City Everton True
1193 West-Bromwich-Albion West-Bromwich-Albion Sunderland False
1193 Sunderland West-Bromwich-Albion Sunderland False
>>> _.set_index('teams', append=True)
HOME_TEAM AWAY_TEAM BOOL
teams
1153 Manchester-United Manchester-United Swansea-City True
Swansea-City Manchester-United Swansea-City True
1163 Leicester-City Leicester-City Everton False
Everton Leicester-City Everton False
1172 Everton Everton Hull-City True
Hull-City Everton Hull-City True
1183 Stoke-City Stoke-City Everton True
Everton Stoke-City Everton True
1193 West-Bromwich-Albion West-Bromwich-Albion Sunderland False
Sunderland West-Bromwich-Albion Sunderland False
>>> _.swaplevel()
HOME_TEAM AWAY_TEAM BOOL
teams
Manchester-United 1153 Manchester-United Swansea-City True
Swansea-City 1153 Manchester-United Swansea-City True
Leicester-City 1163 Leicester-City Everton False
Everton 1163 Leicester-City Everton False
1172 Everton Hull-City True
Hull-City 1172 Everton Hull-City True
Stoke-City 1183 Stoke-City Everton True
Everton 1183 Stoke-City Everton True
West-Bromwich-Albion 1193 West-Bromwich-Albion Sunderland False
Sunderland 1193 West-Bromwich-Albion Sunderland False
>>> _.sort_index()
HOME_TEAM AWAY_TEAM BOOL
teams
Everton 1163 Leicester-City Everton False
1172 Everton Hull-City True
1183 Stoke-City Everton True
Hull-City 1172 Everton Hull-City True
Leicester-City 1163 Leicester-City Everton False
Manchester-United 1153 Manchester-United Swansea-City True
Stoke-City 1183 Stoke-City Everton True
Sunderland 1193 West-Bromwich-Albion Sunderland False
Swansea-City 1153 Manchester-United Swansea-City True
West-Bromwich-Albion 1193 West-Bromwich-Albion Sunderland False
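Because the final frame is sorted on a (teams, match) MultiIndex, a single team can be selected with `.loc`, and every team can be summarised in one pass with `groupby(level=...)`, without looping over team names. A short sketch on the question's sample data; counting True values per team is only an illustration of a per-team aggregate, since the question does not say what BOOL means.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "HOME_TEAM": ["Manchester United", "Leicester City", "Everton",
                      "Stoke City", "West Bromwich Albion"],
        "AWAY_TEAM": ["Swansea City", "Everton", "Hull City",
                      "Everton", "Sunderland"],
        "BOOL": [True, False, True, True, False],
    },
    index=[1153, 1163, 1172, 1183, 1193],
)

by_team = (
    df.filter(like="TEAM").stack()
    .reset_index(level=1, drop=True).to_frame("teams")
    .join(df).set_index("teams", append=True)
    .swaplevel().sort_index()
)

everton = by_team.loc["Everton"]                            # one team's matches
matches = by_team.groupby(level="teams").size()             # matches per team
true_count = by_team.groupby(level="teams")["BOOL"].sum()   # True values per team
print(matches)
print(true_count)
```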
Or:
>>> pd.concat([df, df]).set_index(
df.filter(like='TEAM').melt().value,
drop=False, append=True).swaplevel().sort_index()
HOME_TEAM AWAY_TEAM BOOL
value
Everton 1163 Leicester-City Everton False
1172 Everton Hull-City True
1183 Stoke-City Everton True
Hull-City 1172 Everton Hull-City True
Leicester-City 1163 Leicester-City Everton False
Manchester-United 1153 Manchester-United Swansea-City True
Stoke-City 1183 Stoke-City Everton True
Sunderland 1193 West-Bromwich-Albion Sunderland False
Swansea-City 1153 Manchester-United Swansea-City True
West-Bromwich-Albion 1193 West-Bromwich-Albion Sunderland False
>>> pd.concat([df, df]).set_index(df.filter(like='TEAM').melt().value,
drop=False, append=True).swaplevel().groupby(level=0).get_group('Everton')
HOME_TEAM AWAY_TEAM BOOL
value
Everton 1163 Leicester-City Everton False
1172 Everton Hull-City True
1183 Stoke-City Everton True
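The version above relies on `melt()` returning all HOME_TEAM values before all AWAY_TEAM values, so that they line up positionally with the two stacked copies of `df`. A sketch that avoids this positional coupling keeps the match id as an explicit column through `melt()` and joins the rest of the row back on it; the level names `team` and `match` are my own choice.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "HOME_TEAM": ["Manchester United", "Leicester City", "Everton",
                      "Stoke City", "West Bromwich Albion"],
        "AWAY_TEAM": ["Swansea City", "Everton", "Hull City",
                      "Everton", "Sunderland"],
        "BOOL": [True, False, True, True, False],
    },
    index=[1153, 1163, 1172, 1183, 1193],
)

long_df = (
    df.rename_axis("match")            # name the index so it survives reset_index
    .reset_index()
    .melt(id_vars="match", value_vars=["HOME_TEAM", "AWAY_TEAM"],
          value_name="team")           # one row per (match, team)
    [["match", "team"]]                # drop the HOME_TEAM/AWAY_TEAM label column
    .join(df, on="match")              # look the full row up by its match id
    .set_index(["team", "match"])
    .sort_index()
)

print(long_df.loc["Everton"])
```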
For example, you could do this:
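teams = pd.unique(df[['HOME_TEAM', 'AWAY_TEAM']].values.ravel())
pd.concat(
    # exact comparison instead of str.contains, which would also match
    # substrings and interpret the team name as a regex
    [df[(df['HOME_TEAM'] == team) | (df['AWAY_TEAM'] == team)] for team in teams],
    keys=teams,  # label each block with its team
)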
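If I understand the question correctly, I assume the user might be prompted to choose a team (I infer this because you mentioned making a loop for each team). If that is the case, you could try this: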
select_Team = input("Which team do you like to see: ")
df2 = df[(df["HOME_TEAM"] == select_Team) | (df["AWAY_TEAM"] == select_Team)]
print(df2)
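For repeated use, or for testing without `input()`, the same filter can be wrapped in a function; `matches_for` is a hypothetical helper name. Each call scans the whole frame once, so this suits the pick-one-team case; calling it for every team is exactly the per-team loop the question is trying to avoid.

```python
import pandas as pd


def matches_for(frame: pd.DataFrame, team: str) -> pd.DataFrame:
    """Return the rows where `team` is either the home or the away side."""
    return frame[frame["HOME_TEAM"].eq(team) | frame["AWAY_TEAM"].eq(team)]


df = pd.DataFrame(
    {
        "HOME_TEAM": ["Manchester United", "Leicester City", "Everton",
                      "Stoke City", "West Bromwich Albion"],
        "AWAY_TEAM": ["Swansea City", "Everton", "Hull City",
                      "Everton", "Sunderland"],
        "BOOL": [True, False, True, True, False],
    },
    index=[1153, 1163, 1172, 1183, 1193],
)

print(matches_for(df, "Everton"))
# Interactive use, as in the answer:
# print(matches_for(df, input("Which team do you want to see: ").strip()))
```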
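Overview: merge the unique values of the two columns into one list of unique teams, then find the rows where each team appears in column A or column B, and set the team value for those rows.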
import pandas as pd

Index = [1153, 1163, 1172, 1183, 1193]
HOME_TEAM = ['Manchester United', 'Leicester City', 'Everton', 'Stoke City', 'West Bromwich Albion']
AWAY_TEAM = ['Swansea City', 'Everton', 'Hull City', 'Everton', 'Sunderland']
BOOL = [True, False, True, True, False]

# keep the match ids as the index
df = pd.DataFrame({'HOME_TEAM': HOME_TEAM, 'AWAY_TEAM': AWAY_TEAM, 'BOOL': BOOL},
                  index=pd.Index(Index, name='Index'))
df = df.astype({'HOME_TEAM': str, 'AWAY_TEAM': str, 'BOOL': bool})
print(df.dtypes)

# all distinct teams: the home teams plus the away teams that never play at home
list1 = list(df['HOME_TEAM'].unique())
list2 = list(df['AWAY_TEAM'].unique())
home = set(list1)
notInList1 = [team for team in list2 if team not in home]
combined = list1 + notInList1

# a match involves two teams, so instead of overwriting a single TEAM cell
# (which would keep only the last team processed), keep one tagged copy of
# the matching rows per team
frames = []
for item in combined:
    mask = (df['HOME_TEAM'] == item) | (df['AWAY_TEAM'] == item)
    frames.append(df[mask].assign(TEAM=item))
result = pd.concat(frames)
print(result.head())
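The per-team loop still filters the whole frame once per team. A sketch of a loop-free alternative that produces the same kind of output (each match once under its home team and once under its away team, with a TEAM column): tag one copy of the frame with HOME_TEAM and one with AWAY_TEAM, then concatenate. It assumes the match ids are the frame's index, named `Index`.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "HOME_TEAM": ["Manchester United", "Leicester City", "Everton",
                      "Stoke City", "West Bromwich Albion"],
        "AWAY_TEAM": ["Swansea City", "Everton", "Hull City",
                      "Everton", "Sunderland"],
        "BOOL": [True, False, True, True, False],
    },
    index=pd.Index([1153, 1163, 1172, 1183, 1193], name="Index"),
)

# Each match appears once under its home team and once under its away team
tagged = pd.concat(
    [df.assign(TEAM=df["HOME_TEAM"]), df.assign(TEAM=df["AWAY_TEAM"])]
).sort_values(["TEAM", "Index"])   # "Index" refers to the index level name

for team, games in tagged.groupby("TEAM"):
    print(team, games.index.tolist())
```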
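Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.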