How to remove the part of a string before a special character in a pandas column?

Question

I have this simple dataframe:

In [101]: df = pd.DataFrame({'a':[1,2,3],'b':['ciao','hotel',"l'hotel"]})

In [102]: df
Out[102]: 
   a           b
0  1        ciao
1  2       hotel
2  3     l'hotel

The goal is to remove the part of each string before the apostrophe ('), apostrophe included, so that df becomes:

   a           b
0  1        ciao
1  2       hotel
2  3       hotel

So far I have tried splitting each string with sep="'" and taking only the second element, but that fails because the strings (and therefore the resulting lists) have different lengths: a string without an apostrophe yields a one-element list, so index 1 does not exist.

df['c'] = df['b'].apply(lambda x: x.split("'")[1])

Answer 1

You can use -1 to always get the last part rather than the second part.

df['c'] = df['b'].apply(lambda x: x.split("'")[-1])

print(df)

#    a        b      c
# 0  1     ciao   ciao
# 1  2    hotel  hotel
# 2  3  l'hotel  hotel

However, keep in mind that this will break if you have strings with two or more apostrophes, since only the text after the last apostrophe is kept (your requirements don't specify what to do in such cases anyway).
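To make the multi-apostrophe caveat concrete, here is a minimal sketch. The value `d'l'hotel` and the helper names `after_last` and `after_first` are made up for illustration and do not come from the question. It contrasts keeping the text after the last apostrophe with keeping the text after the first one.

```python
import pandas as pd

# Sample data; the last value is a made-up string with two apostrophes.
df = pd.DataFrame({'b': ['ciao', "l'hotel", "d'l'hotel"]})

def after_last(s):
    # Keep only the text after the last apostrophe (same as split("'")[-1]).
    return s.split("'")[-1]

def after_first(s):
    # Split at most once, so everything after the first apostrophe is kept.
    return s.split("'", 1)[-1]

df['after_last'] = df['b'].apply(after_last)
df['after_first'] = df['b'].apply(after_first)
print(df)
```

Which of the two behaviours is right depends on the data, so it is worth deciding explicitly rather than relying on the default.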

Answer 2

Use str.split and select the last element of each list with str[-1]:

df['c'] = df['b'].str.split("'").str[-1]
print (df)
   a        b      c
0  1     ciao   ciao
1  2    hotel  hotel
2  3  l'hotel  hotel

Or use str.replace with a regular expression:

df['c'] = df['b'].str.replace("(.*)'", '', regex=True)  # regex=True is required in pandas >= 2.0
print (df)
   a        b      c
0  1     ciao   ciao
1  2    hotel  hotel
2  3  l'hotel  hotel
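In pandas 2.0 and later, `str.replace` treats the pattern as a literal string unless `regex=True` is passed, so it is safest to pass the flag explicitly. The sketch below is self-contained. The fourth sample value is made up to show how a greedy pattern, a non-greedy pattern, and a single split differ when a string contains more than one apostrophe; the column names are arbitrary.

```python
import pandas as pd

df = pd.DataFrame({'b': ['ciao', 'hotel', "l'hotel", "d'l'hotel"]})

# Greedy: drop everything up to and including the LAST apostrophe.
df['greedy'] = df['b'].str.replace(r".*'", '', regex=True)

# Non-greedy, anchored: drop everything up to and including the FIRST apostrophe.
df['lazy'] = df['b'].str.replace(r"^.*?'", '', regex=True)

# Same result as 'lazy' without a regex: split at most once, keep the last piece.
df['split_once'] = df['b'].str.split("'", n=1).str[-1]
print(df)
```

Strings without an apostrophe pass through unchanged in all three variants.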

Answer 1
Score 2, accepted, 2017-08-28 12:49:33

Answer 2
Score 2, 2017-08-28 12:50:00
