How to apply the string split method on a Pandas DataFrame based on a condition?

Question

I would like to replace some values in my DataFrame that were entered in the wrong format. For example, 850/07-498745 should be 07-498745. I used string split to do this successfully. However, it turns all previously correctly formatted strings into NaN. I tried to apply it based on a condition, but I still have the same problem. How can I fix it?

Example Input:

import pandas as pd

mylist = ['850/07-498745', '850/07-148465', '07-499015']
df = pd.DataFrame(mylist)
df.rename(columns={df.columns[0]: "mycolumn"}, inplace=True)

My Attempt:

df['mycolumn'] = df[df.mycolumn.str.contains('/') == True].mycolumn.str.split('/', n=1).str[1]
df

Output:

    mycolumn
0  07-498745
1  07-148465
2        NaN

What I wanted:

    mycolumn
0  07-498745
1  07-148465
2  07-499015

Answer 1

You can split on / and take the last string from the resulting list:

df['mycolumn'].str.split('/').str[-1]

0    07-498745
1    07-148465
2    07-499015
Name: mycolumn, dtype: object
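A minimal self-contained sketch of this approach, assuming the example DataFrame from the question and assigning the result back to the column. Values without a / come back from split as a one-element list, so str[-1] returns them unchanged, which is why no condition is needed here.

```python
import pandas as pd

df = pd.DataFrame({"mycolumn": ['850/07-498745', '850/07-148465', '07-499015']})

# split('/') always returns a list; for values without '/', that list holds
# only the original string, so taking the last element leaves them as-is.
df["mycolumn"] = df["mycolumn"].str.split("/").str[-1]

print(df["mycolumn"].tolist())
# ['07-498745', '07-148465', '07-499015']
```

One design difference worth knowing: if a value ever contained more than one /, str[-1] keeps only the part after the last one, while split('/', n=1).str[1] keeps everything after the first.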

Answer 2

This also works, and may help you understand why your original attempt did not:

mask = df.mycolumn.str.contains('/')
# Assign through .loc on the DataFrame itself so only the masked rows change
df.loc[mask, 'mycolumn'] = df.loc[mask, 'mycolumn'].str.split('/', n=1).str[1]

You were doing df['mycolumn'] = ..., which replaces the entire column with the new Series you formed. That Series only contains the rows that matched the condition, so when pandas aligns it to the DataFrame's index, every row that was not selected is filled with NaN.
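A short sketch, using the question's data, that makes the index alignment visible: the right-hand side of the original attempt only carries the index labels of the matching rows. It ends with a conditional alternative using Series.where, which is a different technique from the answer's .loc assignment but gives the same result here.

```python
import pandas as pd

df = pd.DataFrame({"mycolumn": ['850/07-498745', '850/07-148465', '07-499015']})
mask = df["mycolumn"].str.contains("/")

# Right-hand side of the original attempt: only rows 0 and 1 survive the filter.
fixed = df.loc[mask, "mycolumn"].str.split("/", n=1).str[1]
print(fixed.index.tolist())  # [0, 1]

# Assigning it to the whole column aligns on the index, so row 2 becomes NaN.
broken = df.copy()
broken["mycolumn"] = fixed
print(broken["mycolumn"].isna().tolist())  # [False, False, True]

# Series.where keeps the original value where the condition (~mask) is True
# and takes the split result elsewhere, so rows without '/' are untouched.
split_part = df["mycolumn"].str.split("/", n=1).str[1]
df["mycolumn"] = df["mycolumn"].where(~mask, split_part)
print(df["mycolumn"].tolist())  # ['07-498745', '07-148465', '07-499015']
```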

Answer 3

For a regex solution:

df.mycolumn.str.extract('(?:.*/)?(.*)$')[0]

Output:

0    07-498745
1    07-148465
2    07-499015
Name: 0, dtype: object
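A related regex sketch that swaps extract for Series.str.replace: it strips everything up to and including the last /, and values with no / simply do not match and pass through unchanged. In pandas 2.0 and later, str.replace treats the pattern literally by default, so regex=True has to be passed explicitly.

```python
import pandas as pd

s = pd.Series(['850/07-498745', '850/07-148465', '07-499015'], name="mycolumn")

# '^.*/' greedily matches everything up to the last '/'; rows without '/' don't match.
cleaned = s.str.replace(r"^.*/", "", regex=True)

# The extract version from the answer above, for comparison.
extracted = s.str.extract(r"(?:.*/)?(.*)$")[0]

print(cleaned.tolist())                        # ['07-498745', '07-148465', '07-499015']
print(cleaned.tolist() == extracted.tolist())  # True
```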

3 answers

Answer 1
Score 3, 2020-01-14 16:37:08

Answer 2
Score 2, accepted, 2020-01-14 17:11:41

Answer 3
Score 1, 2020-01-14 16:41:13