I have this simple dataframe:
In [101]: df = pd.DataFrame({'a':[1,2,3],'b':['ciao','hotel',"l'hotel"]})
In [102]: df
Out[102]:
a b
0 1 ciao
1 2 hotel
2 3 l'hotel
The goal is to remove the part of each string up to and including the apostrophe ('), so that df becomes:
a b
0 1 ciao
1 2 hotel
2 3 hotel
So far I have tried splitting each string on the apostrophe (sep="'") and taking only the second element, but this fails with an IndexError because the strings without an apostrophe produce one-element lists:
df['c'] = df['b'].apply(lambda x: x.split("'")[1])
You can use index -1 to always get the last part rather than the second part:
df['c'] = df['b'].apply(lambda x: x.split("'")[-1])
print(df)
# a b c
# 0 1 ciao ciao
# 1 2 hotel hotel
# 2 3 l'hotel hotel
However, keep in mind that this may not give what you want for strings with two or more apostrophes, because only the text after the last apostrophe is kept (your requirement doesn't specify what to do in these cases anyway).
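To illustrate the caveat, here is a small sketch comparing the two possible readings. The sample string "l'hotel d'or" and the column names after_last and after_first are made up for the illustration and are not part of the question; which behaviour is wanted is an assumption the asker would have to settle.

```python
import pandas as pd

# Hypothetical data: the last row contains two apostrophes
df = pd.DataFrame({'b': ['ciao', "l'hotel", "l'hotel d'or"]})

# Index -1 after a full split keeps only the text after the LAST apostrophe
df['after_last'] = df['b'].apply(lambda x: x.split("'")[-1])

# Limiting the split to one cut (maxsplit=1) keeps everything after the FIRST apostrophe
df['after_first'] = df['b'].apply(lambda x: x.split("'", 1)[-1])

print(df)
```

Strings without an apostrophe come back unchanged in both columns, since split then returns a one-element list.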
Use str.split and select the last element of each list with .str[-1]:
df['c'] = df['b'].str.split("'").str[-1]
print (df)
a b c
0 1 ciao ciao
1 2 hotel hotel
2 3 l'hotel hotel
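One practical difference from the apply approach is worth a quick sketch: the .str accessor passes missing values through, whereas calling x.split on a missing value inside a lambda would raise an AttributeError. The Series s below, with its missing entry, is a made-up example and not data from the question.

```python
import pandas as pd

# Hypothetical data with a missing value in the last position
s = pd.Series(['ciao', "l'hotel", None])

# The .str accessor skips missing values instead of raising
out = s.str.split("'").str[-1]

print(out)
```

The first two entries become 'ciao' and 'hotel', and the missing entry stays missing.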
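Or use str.replace with a regular expression (pass regex=True explicitly, since from pandas 2.0 the pattern is treated as a literal string by default):
df['c'] = df['b'].str.replace(r"(.*)'", '', regex=True)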
print (df)
a b c
0 1 ciao ciao
1 2 hotel hotel
2 3 l'hotel hotel
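Because .* is greedy, the pattern above removes everything up to the last apostrophe. As a rough sketch, assuming you might instead want to cut only at the first apostrophe, a non-greedy pattern anchored at the start of the string does that. The sample string "l'hotel d'or" and the variable names are invented for the illustration.

```python
import pandas as pd

# Hypothetical data: the last entry contains two apostrophes
s = pd.Series(['ciao', "l'hotel", "l'hotel d'or"])

# Greedy: strips everything up to and including the LAST apostrophe
greedy = s.str.replace(r".*'", '', regex=True)

# Non-greedy and anchored: strips only up to and including the FIRST apostrophe
lazy = s.str.replace(r"^.*?'", '', regex=True)

print(pd.DataFrame({'b': s, 'greedy': greedy, 'lazy': lazy}))
```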