
pandas remove all words before a specific word and get the first n words after that specific word

I have a dataframe like this:

import pandas as pd

df = pd.DataFrame({'caption': ['hello this pack is for you: Jake Peralta. Thanks']})
df

caption
hello this pack is for you: Jake Peralta. Thanks
...
...
...

I'm trying to get the recipient's first and last name here. The format of the caption column is always the same, so I want to delete everything before "for you:" and keep the first 2 words after it (this number may change).

This splits on the period and the colon, then strips the leading space from the name:

>>> df.caption.str.split(".").str[0].str.split(":").str[1].str.strip()

1    Jake Peralta
Name: caption, dtype: object

Here is one way:

df.caption.apply(lambda st: st[st.find(":")+2:st.find(".")])

output:

0     Jake Peralta
Name: caption, dtype: object
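The slice above relies on there being exactly one space after the colon and on the first colon and first period being the right ones. A minimal sketch of a more tolerant variant, assuming the name always sits between "for you:" and the next period, uses a regular expression with str.extract (the pattern is my own assumption, not part of the original answer):

```python
import pandas as pd

df = pd.DataFrame({"caption": ["hello this pack is for you: Jake Peralta. Thanks"]})

# Capture everything between "for you:" (plus any spaces) and the next period.
# expand=False returns a Series because the pattern has a single capture group.
names = df["caption"].str.extract(r"for you:\s*([^.]+)", expand=False).str.strip()
print(names)
```

Captions that do not contain "for you:" come back as NaN instead of a wrongly sliced string.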

Maybe you can try it like this:

df['caption'].str.split("for you: ").str[1].str.split('.').str[0]

output:

0    Jake Peralta
1      first last
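None of the snippets above exposes the "first n words" part of the question as a parameter. A rough sketch of how that could look is below. The helper name words_after, its marker and n arguments, and the second sample caption are all hypothetical and only there for illustration:

```python
import pandas as pd

df = pd.DataFrame({"caption": [
    "hello this pack is for you: Jake Peralta. Thanks",
    "hello this pack is for you: first last. Thanks",  # made-up second row
]})

def words_after(captions, marker="for you:", n=2):
    """Return the first n whitespace-separated words that follow marker."""
    # Keep only the text after the first occurrence of the marker.
    tail = captions.str.split(marker, n=1).str[1]
    # Split on whitespace, keep the first n words, and join them back together.
    words = tail.str.split().str[:n].str.join(" ")
    # Drop trailing punctuation such as the period after the last name.
    return words.str.rstrip(".,;!?")

names = words_after(df["caption"], n=2)
print(names)
```

Changing n changes how many words are kept, for example words_after(df["caption"], n=1) keeps only the first name. Note that pandas may treat a multi-character marker as a regular expression, so a marker containing special characters would need escaping.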


 