How do I use the .apply function on a pandas DataFrame that has already been filtered by a regular expression?

Question

I have a pandas DataFrame with data scraped from a couple of Wiki tables. The DataFrame has a column for names, and some of these names are followed by "\\r\\n(head coach)". I would like to remove that suffix, so I tried this:

df['name'][df.name.str.contains(r'coach')] =\
df['name'][df.name.str.contains(r'coach')].apply(lambda x: x[0:-14])

When this runs, I get a SettingWithCopyWarning. I tried using .loc as suggested in this SO Q&A:

 mask = df.loc[:,'name'] == df['name'].str.contains(r'coach')

But every value comes back as False, so I get an empty Series when I use this mask with my DataFrame.
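A small sketch of why that mask is all False, assuming a made-up two-row DataFrame (the names are invented for illustration): `str.contains` already returns a boolean Series, so comparing the string column against it with `==` checks each name against True or False, which never matches.

```python
import pandas as pd

# Hypothetical sample data; the real DataFrame comes from the scraped Wiki tables.
df = pd.DataFrame({"name": ["Ann Lee\r\n(head coach)", "Bob Ray"]})

# str.contains already yields the boolean mask that is needed.
contains = df["name"].str.contains(r"coach")

# The attempted mask compares each name (a string) with a boolean,
# so every element is False.
wrong_mask = df.loc[:, "name"] == contains

print(contains.tolist())    # [True, False]
print(wrong_mask.tolist())  # [False, False]
```

So the result of `str.contains` can be used as the mask directly, without the extra comparison.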

I'm not sure where I am going wrong with this. Any pointers?

Answer 1

You can try this:

mask = df.name.str.contains(r'coach')
df.loc[mask, 'name'] = df.loc[mask, 'name'].str[:-14]

Or, as @piRSquared commented, this simpler line should also work:

df.loc[mask, 'name'] = df.name.str[:-14]
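Here is a minimal worked example of both variants, assuming an invented three-row DataFrame in which the suffix is a real carriage return and line feed followed by "(head coach)" (14 characters). The second variant appears to work because `.loc` assignment aligns the right-hand Series on the index, so only the masked rows are written. The last part is not from the answer: it is an alternative using `str.replace` with a regex, which avoids hard-coding the length 14.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame(
    {"name": ["Ann Lee\r\n(head coach)", "Bob Ray", "Cy Dunn\r\n(head coach)"]}
)
mask = df["name"].str.contains(r"coach")

# Variant 1: slice only the masked rows, then assign them back.
a = df.copy()
a.loc[mask, "name"] = a.loc[mask, "name"].str[:-14]

# Variant 2: slice every row; index alignment in .loc keeps only masked rows.
b = df.copy()
b.loc[mask, "name"] = b["name"].str[:-14]

# Alternative (an assumption, not from the answer): strip the suffix by pattern.
c = df.copy()
c["name"] = c["name"].str.replace(r"\r\n\(head coach\)$", "", regex=True)

print(a["name"].tolist())  # ['Ann Lee', 'Bob Ray', 'Cy Dunn']
```

All three copies end up with the same cleaned names in this sample; the regex version needs no mask because rows without the suffix are left unchanged.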

Answer 1: 3, accepted, 2017-03-27 15:51:06
