How to remove the part of a string before a special character in a Pandas column?
I have this simple dataframe:
In [101]: df = pd.DataFrame({'a':[1,2,3],'b':['ciao','hotel',"l'hotel"]})
In [102]: df
Out[102]:
a b
0 1 ciao
1 2 hotel
2 3 l'hotel
The goal here is to remove the part of each string before the ' apostrophe, so that df becomes:
a b
0 1 ciao
1 2 hotel
2 3 hotel
So far I tried to split the string with sep="'" and take only the second element, but it failed because my strings (and therefore the resulting lists) have different lengths:
df['c'] = df['b'].apply(lambda x: x.split("'")[1])
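The failure comes from strings that contain no apostrophe: split returns a one-element list for them, so index [1] raises an IndexError. A minimal sketch reproducing this on the same data:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': ['ciao', 'hotel', "l'hotel"]})

# 'ciao' has no apostrophe, so split returns a single-element list
parts = 'ciao'.split("'")
print(parts)

failed = False
try:
    df['c'] = df['b'].apply(lambda x: x.split("'")[1])
except IndexError as exc:
    # raised on the first row ('ciao'), so column 'c' is never assigned
    failed = True
    print('IndexError:', exc)
```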
You can use -1 to always get the last part rather than the second part.
df['c'] = df['b'].apply(lambda x: x.split("'")[-1])
print(df)
# a b c
# 0 1 ciao ciao
# 1 2 hotel hotel
# 2 3 l'hotel hotel
However, keep in mind that this may not do what you want if a string contains two or more apostrophes: it keeps only the text after the last one (but your requirement doesn't specify what to do in these cases anyway).
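To make the two-apostrophe caveat concrete, here is a minimal sketch. The sample string "l'hotel dell'isola" is hypothetical; it compares split(...)[-1] with split using maxsplit=1, which cuts only at the first apostrophe:

```python
import pandas as pd

# hypothetical sample: the second value contains two apostrophes
s = pd.Series(["l'hotel", "l'hotel dell'isola", 'ciao'])

# split(...)[-1] keeps only the text after the LAST apostrophe
after_last = s.apply(lambda x: x.split("'")[-1])

# maxsplit=1 splits at the FIRST apostrophe only and keeps the rest intact
after_first = s.apply(lambda x: x.split("'", 1)[-1])

print(after_last.tolist())
print(after_first.tolist())
```

Which variant is right depends on whether you want to strip everything up to the first or the last apostrophe.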
Use str.split and select the last element of each list with -1:
df['c'] = df['b'].str.split("'").str[-1]
print (df)
a b c
0 1 ciao ciao
1 2 hotel hotel
2 3 l'hotel hotel
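Another vectorized option, sketched here as an alternative not taken from the answer above, is str.rpartition. It splits each string at the last apostrophe into three columns (before, separator, after). When no apostrophe is present, the whole string ends up in the last column:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': ['ciao', 'hotel', "l'hotel"]})

# columns 0, 1, 2 = text before the last "'", the "'" itself, text after it;
# strings without "'" give ('', '', original), so column 2 is the full string
parts = df['b'].str.rpartition("'")
df['c'] = parts[2]
print(df)
```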
Or use str.replace:
# regex=True is required in pandas 2.0+, where the default changed to regex=False
df['c'] = df['b'].str.replace("(.*)'", '', regex=True)
print (df)
a b c
0 1 ciao ciao
1 2 hotel hotel
2 3 l'hotel hotel
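The pattern in that answer is greedy: .* matches as much as it can, so everything up to the last apostrophe is removed, mirroring split("'").str[-1]. The minimal sketch below compares it with a pattern that stops at the first apostrophe. The third sample value is hypothetical, and regex=True is passed explicitly:

```python
import pandas as pd

s = pd.Series(['ciao', "l'hotel", "l'hotel dell'isola"])  # last value is hypothetical

# greedy: '.*' runs up to the LAST apostrophe, like split("'").str[-1]
greedy = s.str.replace(r"^.*'", '', regex=True)

# "[^']*" cannot cross an apostrophe, so only the text up to the FIRST one is removed
first_only = s.str.replace(r"^[^']*'", '', regex=True)

print(greedy.tolist())
print(first_only.tolist())
```

Strings without an apostrophe don't match either pattern, so they are left unchanged.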