I have a DataFrame with soccer results in it:
home_team away_team home_team_goal_timings away_team_goal_timings
0 Tottenham Hotspur Manchester City 24,56 77,88
1 Sunderland Birmingham City 15,40,66 16,38,43,75
2 Aston Villa West Ham United 14 6,44,55,63,68,90
3 Chelsea Everton 37,39 12,32,39,49,58,83
4 Arsenal Stoke City 6,44,55,63,68,90 57,71
To create the DataFrame:
import pandas as pd

data = {'home_team': ['Tottenham Hotspur', 'Sunderland', 'Aston Villa', 'Chelsea', 'Arsenal'],
        'away_team': ['Manchester City', 'Birmingham City', 'West Ham United', 'Everton', 'Stoke City'],
        'home_team_goal_timings': ['24,56', '15,40,66', '14', '37,39', '6,44,55,63,68,90'],
        'away_team_goal_timings': ['77,88', '16,38,43,75', '6,44,55,63,68,90', '12,32,39,49,58,83',
                                   '57,71']}
test = pd.DataFrame(data)
I would like to select from the original DataFrame all games in which the home team scored before the 20th minute. Is it possible to filter on the column in its current format?
You could do so using .loc and .apply. The lambda splits the string on ',', takes the first element and converts it to an integer. If that is lower than 20 it returns True, else False.
print(test.loc[test.home_team_goal_timings.apply(lambda x: int(x.split(',')[0]) < 20 if x else False)])
home_team away_team home_team_goal_timings away_team_goal_timings
1 Sunderland Birmingham City 15,40,66 16,38,43,75
2 Aston Villa West Ham United 14 6,44,55,63,68,90
4 Arsenal Stoke City 6,44,55,63,68,90 57,71
Note: this assumes the values in home_team_goal_timings are in ascending order. The if x check in the lambda handles matches in which the home team did not score (an empty string).
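If the timings might not be in ascending order, one option is to compare the earliest goal rather than the first listed one. The sketch below assumes the same sample data; the helper name scored_before and its minute parameter are hypothetical, introduced only for illustration.

```python
import pandas as pd

# Same sample data as in the question.
test = pd.DataFrame({
    'home_team': ['Tottenham Hotspur', 'Sunderland', 'Aston Villa', 'Chelsea', 'Arsenal'],
    'away_team': ['Manchester City', 'Birmingham City', 'West Ham United', 'Everton', 'Stoke City'],
    'home_team_goal_timings': ['24,56', '15,40,66', '14', '37,39', '6,44,55,63,68,90'],
    'away_team_goal_timings': ['77,88', '16,38,43,75', '6,44,55,63,68,90',
                               '12,32,39,49,58,83', '57,71'],
})


def scored_before(timings, minute=20):
    """Hypothetical helper: True if any goal in a comma-separated string is before `minute`.

    An empty string (no goals) returns False. The order of the timings does not
    matter, because the earliest goal is found with min().
    """
    if not timings:
        return False
    return min(int(t) for t in timings.split(',')) < minute


mask = test['home_team_goal_timings'].apply(scored_before)
print(test.loc[mask])
```

Taking min() costs a full pass over each string instead of reading only the first element, which is the price for not relying on the ordering.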
We can use Series.str.split to split on the commas, grab the first element with Series.str[0], and then check whether that value, as an integer, is < 20:
m = test['home_team_goal_timings'].str.split(',').str[0].astype(int) < 20
test[m]
home_team away_team home_team_goal_timings away_team_goal_timings
1 Sunderland Birmingham City 15,40,66 16,38,43,75
2 Aston Villa West Ham United 14 6,44,55,63,68,90
4 Arsenal Stoke City 6,44,55,63,68,90 57,71
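The astype(int) step raises a ValueError if any row is an empty string, i.e. a match where the home team did not score. A minimal sketch of a more tolerant variant, assuming pd.to_numeric with errors='coerce' is acceptable: the empty string becomes NaN, and NaN < 20 evaluates to False, so goalless rows are simply excluded. The extra row 'Example FC' is hypothetical, added only to show the empty-string case, and the away columns are left out for brevity.

```python
import pandas as pd

# Sample data plus a hypothetical goalless row ('') to exercise the edge case.
test = pd.DataFrame({
    'home_team': ['Tottenham Hotspur', 'Sunderland', 'Aston Villa', 'Chelsea', 'Arsenal', 'Example FC'],
    'home_team_goal_timings': ['24,56', '15,40,66', '14', '37,39', '6,44,55,63,68,90', ''],
})

# First listed goal per row; '' is coerced to NaN instead of raising.
first_goal = pd.to_numeric(test['home_team_goal_timings'].str.split(',').str[0], errors='coerce')

# Comparisons with NaN are False, so goalless matches drop out of the selection.
m = first_goal < 20
print(test[m])
```

Like the original answer, this still reads only the first listed goal, so it keeps the ascending-order assumption.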
Here is one more variation, using np.vectorize:
import numpy as np

test.loc[np.vectorize(lambda r: int(r.split(',')[0]) < 20)(test.home_team_goal_timings.values)]
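np.vectorize is a convenience wrapper and, per the NumPy documentation, is essentially a for loop, so it is not faster than .apply. If you want to stay within pandas methods and also avoid the ascending-order assumption, one possible approach (a swapped-in technique, not one of the answers above) is to explode the goals into one row each, compare, and collapse back per match with groupby(level=0).any(). This sketch assumes the row labels of test are unique, as they are in the sample data.

```python
import pandas as pd

test = pd.DataFrame({
    'home_team': ['Tottenham Hotspur', 'Sunderland', 'Aston Villa', 'Chelsea', 'Arsenal'],
    'home_team_goal_timings': ['24,56', '15,40,66', '14', '37,39', '6,44,55,63,68,90'],
})

# One row per goal; each goal keeps the row label of the match it came from.
goals = test['home_team_goal_timings'].str.split(',').explode()

# Convert to numbers ('' for a goalless match would become NaN) and compare to 20.
early = pd.to_numeric(goals, errors='coerce').lt(20)

# Collapse back to one boolean per match: True if any goal was before minute 20.
mask = early.groupby(level=0).any()
print(test[mask])
```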