
Most efficient way to work with a string column in a pandas DataFrame

I have a DataFrame with football results:

   home_team             away_team         home_team_goal_timings   away_team_goal_timings
0  Tottenham Hotspur     Manchester City   24,56                    77,88
1  Sunderland            Birmingham City   15,40,66                 16,38,43,75
2  Aston Villa           West Ham United   14                       6,44,55,63,68,90
3  Chelsea               Everton           37,39                    12,32,39,49,58,83  
4  Arsenal               Stoke City        6,44,55,63,68,90         57,71

To create the DataFrame:

import pandas as pd

data = {'home_team': ['Tottenham Hotspur', 'Sunderland', 'Aston Villa', 'Chelsea', 'Arsenal'],
        'away_team': ['Manchester City', 'Birmingham City', 'West Ham United', 'Everton', 'Stoke City'],
        'home_team_goal_timings': ['24,56', '15,40,66', '14', '37,39', '6,44,55,63,68,90'],
        'away_team_goal_timings': ['77,88', '16,38,43,75', '6,44,55,63,68,90', '12,32,39,49,58,83',
                                   '57,71']}

test = pd.DataFrame(data)

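I want to slice out of the original DataFrame all matches in which the home team scored before the 20th minute. Is it possible to filter on the column in its current format?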

You can do this with .loc combined with apply. The lambda splits the string on ',' and takes the first element. It returns True if that value is below 20, and False otherwise.

print(test.loc[test.home_team_goal_timings.apply(lambda x: int(x.split(',')[0]) < 20 if x else False)])


     home_team        away_team home_team_goal_timings away_team_goal_timings
1   Sunderland  Birmingham City               15,40,66            16,38,43,75
2  Aston Villa  West Ham United                     14       6,44,55,63,68,90
4      Arsenal       Stoke City       6,44,55,63,68,90                  57,71

Note: this assumes that home_team_goal_timings is in ascending order. The `if x` check in the lambda covers the case where no goals were scored (an empty string).
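If the timings cannot be assumed to be sorted, one option is to compare the earliest parsed minute instead of the first element. The following is only a sketch and not part of the original answer: the helper name `scored_before` and the sample rows are hypothetical, and it assumes that empty strings and missing values both mean "no goals".

```python
import pandas as pd

def scored_before(timings, minute=20):
    """Return True if any goal in a comma-separated string came before `minute`."""
    # Assumption: missing values (NaN/None) and empty strings mean "no goals".
    if not isinstance(timings, str) or not timings.strip():
        return False
    # Use the earliest goal, so the order of the minutes does not matter.
    return min(int(t) for t in timings.split(',')) < minute

# Hypothetical sample: unsorted timings, an empty string, and a missing value.
sample = pd.DataFrame({
    'home_team': ['A', 'B', 'C', 'D'],
    'home_team_goal_timings': ['56,24', '40,15', '', None],
})

mask = sample['home_team_goal_timings'].apply(scored_before)
early = sample.loc[mask]
```

Only team B is selected here, since 15 is the sole timing below 20.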

We can use Series.str.split to split on the commas and Series.str[0] to take the first element, then check whether that integer is < 20:

m = test['home_team_goal_timings'].str.split(',').str[0].astype(int) < 20
test[m]

     home_team        away_team home_team_goal_timings away_team_goal_timings
1   Sunderland  Birmingham City               15,40,66            16,38,43,75
2  Aston Villa  West Ham United                     14       6,44,55,63,68,90
4      Arsenal       Stoke City       6,44,55,63,68,90                  57,71
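The `.astype(int)` step fails when a row is empty or missing. A possible variation, sketched here under the assumption that such rows should simply be left out, extracts the leading digits with `Series.str.extract` and converts them with `pd.to_numeric`. The sample Series is made up for illustration.

```python
import pandas as pd

# Hypothetical column with an empty string and a missing value (object dtype).
sample = pd.Series(['24,56', '15,40,66', '', None, '6,44'], dtype=object)

# Capture the leading run of digits; rows without a match become NaN.
first_goal = pd.to_numeric(sample.str.extract(r'^(\d+)', expand=False), errors='coerce')

# A comparison against NaN evaluates to False, so rows with no goals drop out.
mask = first_goal < 20
```

Like the original one-liner, this still looks only at the first timing, so it relies on the minutes being in ascending order.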

Here is another variation, using np.vectorize:

import numpy as np
test.loc[np.vectorize(lambda r: int(r.split(',')[0]) < 20)(test.home_team_goal_timings.values)]
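Since the question is about efficiency, the three approaches can be checked against each other and timed with `timeit`. The harness below is only a sketch: the enlarged frame `big` and the function names are invented for illustration, and no timings are quoted because they depend on the machine, the data size, and the library versions.

```python
import timeit

import numpy as np
import pandas as pd

timings = ['24,56', '15,40,66', '14', '37,39', '6,44,55,63,68,90']
# Repeat the question's column to get a frame large enough to time.
big = pd.DataFrame({'home_team_goal_timings': timings * 2000})
col = big['home_team_goal_timings']

def with_apply():
    return col.apply(lambda x: int(x.split(',')[0]) < 20 if x else False)

def with_str_accessor():
    return col.str.split(',').str[0].astype(int) < 20

def with_vectorize():
    return np.vectorize(lambda r: int(r.split(',')[0]) < 20)(col.to_numpy())

# All three should select exactly the same rows.
funcs = (with_apply, with_str_accessor, with_vectorize)
masks = [np.asarray(f(), dtype=bool) for f in funcs]
same = all(np.array_equal(masks[0], m) for m in masks[1:])

# Time each approach over a few runs and print the total seconds.
for f in funcs:
    print(f.__name__, timeit.timeit(f, number=5))
```

Checking `same` before comparing speeds guards against timing a variant that silently returns a different selection.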

