Most efficient way to process a string column in a pandas DataFrame

Question

I have a DataFrame with soccer results in it:

   home_team             away_team         home_team_goal_timings   away_team_goal_timings
0  Tottenham Hotspur     Manchester City   24,56                    77,88
1  Sunderland            Birmingham City   15,40,66                 16,38,43,75
2  Aston Villa           West Ham United   14                       6,44,55,63,68,90
3  Chelsea               Everton           37,39                    12,32,39,49,58,83  
4  Arsenal               Stoke City        6,44,55,63,68,90         57,71

To create the DataFrame:

import pandas as pd

data = {'home_team': ['Tottenham Hotspur', 'Sunderland', 'Aston Villa', 'Chelsea', 'Arsenal'],
        'away_team': ['Manchester City', 'Birmingham City', 'West Ham United', 'Everton', 'Stoke City'],
        'home_team_goal_timings': ['24,56', '15,40,66', '14', '37,39', '6,44,55,63,68,90'],
        'away_team_goal_timings': ['77,88', '16,38,43,75', '6,44,55,63,68,90', '12,32,39,49,58,83',
                                   '57,71']}

test = pd.DataFrame(data)

I would like to select from the original DataFrame all games in which the home team scored before the 20th minute. Is it possible to filter on the column in its current format?

Answer 1

You could do so using .loc and .apply. The lambda splits the string on ',' and takes the first element. If that is lower than 20 it returns True, else False.

print(test.loc[test.home_team_goal_timings.apply(lambda x: int(x.split(',')[0]) < 20 if x else False)])


     home_team        away_team home_team_goal_timings away_team_goal_timings
1   Sunderland  Birmingham City               15,40,66            16,38,43,75
2  Aston Villa  West Ham United                     14       6,44,55,63,68,90
4      Arsenal       Stoke City       6,44,55,63,68,90                  57,71

Note: this assumes the home_team_goal_timings are in ascending order. The if x check in the lambda handles the case of no goals (an empty string).
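If the ascending-order assumption is not safe for your data, one option is to check every timing rather than only the first. The following is a minimal sketch, not part of the original answer; the helper name scored_before and its minute parameter are made up for illustration, and the DataFrame is trimmed to the two columns needed.

```python
import pandas as pd

test = pd.DataFrame({
    'home_team': ['Tottenham Hotspur', 'Sunderland', 'Aston Villa', 'Chelsea', 'Arsenal'],
    'home_team_goal_timings': ['24,56', '15,40,66', '14', '37,39', '6,44,55,63,68,90'],
})

def scored_before(timings, minute=20):
    """Hypothetical helper: True if any goal in the comma-separated string came before `minute`."""
    # Missing values (NaN) and empty strings both mean "no goals".
    if not isinstance(timings, str) or not timings:
        return False
    # Check every timing, so the order of the values does not matter.
    return any(int(t) < minute for t in timings.split(','))

early = test.loc[test['home_team_goal_timings'].apply(scored_before)]
print(early['home_team'].tolist())
```

On the sample data this selects the same three rows as above, since the timings there happen to be sorted already.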

Answer 2

We can use Series.str.split to split on the commas and grab the first element with Series.str[0], then check whether the integer is < 20:

m = test['home_team_goal_timings'].str.split(',').str[0].astype(int) < 20
test[m]

     home_team        away_team home_team_goal_timings away_team_goal_timings
1   Sunderland  Birmingham City               15,40,66            16,38,43,75
2  Aston Villa  West Ham United                     14       6,44,55,63,68,90
4      Arsenal       Stoke City       6,44,55,63,68,90                  57,71
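One caveat: astype(int) raises if a row has no goals (an empty string or NaN). A possible workaround, sketched here under the assumption that goalless games are stored as '' in the column, is to convert with pd.to_numeric and errors='coerce' so those rows become NaN and compare as False. The 'Goalless FC' row is invented purely for illustration.

```python
import pandas as pd

test = pd.DataFrame({
    'home_team': ['Sunderland', 'Chelsea', 'Goalless FC'],  # last row is a made-up goalless game
    'home_team_goal_timings': ['15,40,66', '37,39', ''],
})

# First timing as a number; values that cannot be parsed become NaN.
first_goal = pd.to_numeric(
    test['home_team_goal_timings'].str.split(',').str[0],
    errors='coerce',
)

# NaN < 20 evaluates to False, so goalless games are excluded from the mask.
m = first_goal < 20
print(test[m])
```

Like the original, this still relies on the timings being in ascending order.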

Answer 3

Here is one more variation:

import numpy as np
test.loc[np.vectorize(lambda r: int(r.split(',')[0]) < 20)(test.home_team_goal_timings.values)]
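Since the question asks about efficiency, it may be worth timing the approaches on your own data rather than guessing. Below is a rough sketch using timeit; the function names with_apply, with_str and with_vectorize are invented here, and the sample column is simply repeated 1000 times to get measurable timings. Keep in mind that np.vectorize is documented as a convenience wrapper rather than a performance tool, so it is essentially still a Python-level loop.

```python
import timeit

import numpy as np
import pandas as pd

timings = ['24,56', '15,40,66', '14', '37,39', '6,44,55,63,68,90']
# Repeat the sample rows so the timings are large enough to compare.
test = pd.DataFrame({'home_team_goal_timings': timings * 1000})
col = test['home_team_goal_timings']

def with_apply():
    return col.apply(lambda x: int(x.split(',')[0]) < 20).to_numpy()

def with_str():
    return (col.str.split(',').str[0].astype(int) < 20).to_numpy()

def with_vectorize():
    return np.vectorize(lambda r: int(r.split(',')[0]) < 20)(col.values)

# Sanity check: all three approaches should produce the same boolean mask.
assert (with_apply() == with_str()).all()
assert (with_apply() == with_vectorize()).all()

# Print the total time for 20 runs of each approach.
for fn in (with_apply, with_str, with_vectorize):
    print(fn.__name__, timeit.timeit(fn, number=20))
```

The numbers depend on your pandas and NumPy versions and on the data, so treat them as a local measurement only.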
