
Most efficient way to work with a string column in a pandas DataFrame

I have a DataFrame with soccer results in it:

   home_team             away_team         home_team_goal_timings   away_team_goal_timings
0  Tottenham Hotspur     Manchester City   24,56                    77,88
1  Sunderland            Birmingham City   15,40,66                 16,38,43,75
2  Aston Villa           West Ham United   14                       6,44,55,63,68,90
3  Chelsea               Everton           37,39                    12,32,39,49,58,83  
4  Arsenal               Stoke City        6,44,55,63,68,90         57,71

To create the DataFrame:

import pandas as pd

data = {'home_team': ['Tottenham Hotspur', 'Sunderland', 'Aston Villa', 'Chelsea', 'Arsenal'],
        'away_team': ['Manchester City', 'Birmingham City', 'West Ham United', 'Everton', 'Stoke City'],
        'home_team_goal_timings': ['24,56', '15,40,66', '14', '37,39', '6,44,55,63,68,90'],
        'away_team_goal_timings': ['77,88', '16,38,43,75', '6,44,55,63,68,90',
                                   '12,32,39,49,58,83', '57,71']}

test = pd.DataFrame(data)

I would like to select from the original DataFrame all games in which the home team scored before the 20th minute. Is it possible to filter on the column in its current format (comma-separated strings)?

You could do so using .loc and .apply. The lambda splits the string on ',' and takes the first element; if that value is lower than 20 it returns True, otherwise False.

print(test.loc[test.home_team_goal_timings.apply(lambda x: int(x.split(',')[0]) < 20 if x else False)])


     home_team        away_team home_team_goal_timings away_team_goal_timings
1   Sunderland  Birmingham City               15,40,66            16,38,43,75
2  Aston Villa  West Ham United                     14       6,44,55,63,68,90
4      Arsenal       Stoke City       6,44,55,63,68,90                  57,71

Note: this assumes the minutes in home_team_goal_timings are listed in ascending order. The if x check in the lambda handles matches in which the home team scored no goals (an empty string).
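If the minutes are not guaranteed to be in ascending order, one option is to parse every minute and compare the earliest one instead of the first listed value. This is a minimal sketch assuming the test DataFrame from the question; the helper name scored_before and its minute parameter are hypothetical, introduced only for illustration.

```python
import pandas as pd

data = {'home_team': ['Tottenham Hotspur', 'Sunderland', 'Aston Villa', 'Chelsea', 'Arsenal'],
        'away_team': ['Manchester City', 'Birmingham City', 'West Ham United', 'Everton', 'Stoke City'],
        'home_team_goal_timings': ['24,56', '15,40,66', '14', '37,39', '6,44,55,63,68,90'],
        'away_team_goal_timings': ['77,88', '16,38,43,75', '6,44,55,63,68,90',
                                   '12,32,39,49,58,83', '57,71']}
test = pd.DataFrame(data)


def scored_before(timings, minute=20):
    """Hypothetical helper: True if any minute in a comma-separated string is < minute.

    An empty string (no goals) or a non-string missing value returns False.
    """
    if not isinstance(timings, str) or not timings:
        return False
    # Compare the earliest minute, so the order in the string does not matter
    return min(int(t) for t in timings.split(',')) < minute


early = test[test['home_team_goal_timings'].apply(scored_before)]
print(early)
```

The trade-off is that every minute of every match is converted to int, which is slightly more work than reading only the first element, but the result no longer depends on how the strings were ordered.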

We can use Series.str.split to split on the commas, grab the first element with Series.str[0], and then check whether that integer is < 20:

# Boolean mask: True where the first listed home goal came before minute 20
m = test['home_team_goal_timings'].str.split(',').str[0].astype(int) < 20
test[m]

     home_team        away_team home_team_goal_timings away_team_goal_timings
1   Sunderland  Birmingham City               15,40,66            16,38,43,75
2  Aston Villa  West Ham United                     14       6,44,55,63,68,90
4      Arsenal       Stoke City       6,44,55,63,68,90                  57,71
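Because astype(int) raises a ValueError on an empty string, a variant that tolerates matches without goals (or missing values) can swap in pd.to_numeric with errors='coerce', which turns unparseable entries into NaN; comparisons with NaN evaluate to False, so those rows simply drop out of the selection. The small Series below, including its '' and None entries, is hypothetical sample data, and like the answer above this still assumes the minutes are in ascending order.

```python
import pandas as pd

# Hypothetical goal-timing strings: '' means no goals, None means missing data
timings = pd.Series(['24,56', '15,40,66', '', None, '6,44,55,63,68,90'])

# First listed minute; '' and None become NaN instead of raising an error
first_goal = pd.to_numeric(timings.str.split(',').str[0], errors='coerce')

# NaN < 20 is False, so rows without a parseable first goal are excluded
mask = first_goal < 20
print(mask.tolist())
```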

Here is one more variation, using np.vectorize:

import numpy as np
test.loc[np.vectorize(lambda r: int(r.split(',')[0]) < 20)(test.home_team_goal_timings.values)]
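If the column will be queried repeatedly, another option is to parse it once into long format (one row per goal) and reuse that result. This sketch uses Series.explode (available since pandas 0.25) and groupby(level=0).min() to find each match's earliest home goal regardless of the order of the minutes; it assumes the test DataFrame from the question and that its index is unique.

```python
import pandas as pd

data = {'home_team': ['Tottenham Hotspur', 'Sunderland', 'Aston Villa', 'Chelsea', 'Arsenal'],
        'away_team': ['Manchester City', 'Birmingham City', 'West Ham United', 'Everton', 'Stoke City'],
        'home_team_goal_timings': ['24,56', '15,40,66', '14', '37,39', '6,44,55,63,68,90'],
        'away_team_goal_timings': ['77,88', '16,38,43,75', '6,44,55,63,68,90',
                                   '12,32,39,49,58,83', '57,71']}
test = pd.DataFrame(data)

# One row per home goal; each goal keeps its match's index label
home_goals = test['home_team_goal_timings'].str.split(',').explode()

# Parse the minutes once; an empty string (no goals) becomes NaN instead of raising
home_goals = pd.to_numeric(home_goals, errors='coerce')

# Earliest home goal per match (NaN for a match with no parseable minutes)
earliest = home_goals.groupby(level=0).min()

# The boolean Series aligns on the index and selects the matching rows
print(test[earliest < 20])
```

Once home_goals exists, further questions about the same column (a different minute threshold, for example) can reuse it without splitting the strings again.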


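Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.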

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM