pandas remove all words before a specific word and get the first n words after that specific word

I have a dataframe like this:

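import pandas as pd

# The value must be wrapped in a list; an all-scalar dict raises ValueError without an index
df = pd.DataFrame({'caption': ['hello this pack is for you: Jake Peralta. Thanks']})
df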

caption
hello this pack is for you: Jake Peralta. Thanks
...
...
...

I'm trying to get the recipient's first and last name here. The format of the caption column is always the same. So I want to delete everything before "for you:" and get the first 2 words (this number may change) after "for you:".

This takes care of leading spaces in the name:

>>> df.caption.str.split(".").str[0].str.split(":").str[1].str.strip()

0    Jake Peralta
Name: caption, dtype: object

Here is one way:

df.caption.apply(lambda st: st[st.find(":")+2:st.find(".")])

Output:

0     Jake Peralta
Name: caption, dtype: object
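The slice above relies on str.find, which returns -1 when the marker is missing, so a caption without ":" or "." would silently yield the wrong substring. A minimal sketch of a guarded version follows; the helper name name_between and its default markers are hypothetical and chosen only for illustration.

```python
def name_between(st, start_marker=": ", end_marker="."):
    """Return the text between start_marker and the next end_marker, or None."""
    start = st.find(start_marker)
    if start == -1:
        return None  # start marker missing
    start += len(start_marker)  # skip past the marker itself
    end = st.find(end_marker, start)  # search only after the start marker
    if end == -1:
        return None  # end marker missing
    return st[start:end]


caption = 'hello this pack is for you: Jake Peralta. Thanks'
result = name_between(caption)
missing = name_between('no markers here')
```

It can be applied to the column in the same way, for example df.caption.apply(name_between).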

Maybe you can try it like this:

df['caption'].str.split("for you: ").str[1].str.split('.').str[0]

Output:

0    Jake Peralta
1      first last
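The question mentions that the number of words may change, and the one-liners above all stop at the first period instead. One possible sketch that takes the word count as a parameter is shown here. The function name first_n_words_after is hypothetical, and treating a "word" as a run of letters, digits, apostrophes, or hyphens is an assumption that may not suit every caption.

```python
import pandas as pd


def first_n_words_after(series, marker, n):
    """Return the first n words that follow marker in each string of series."""
    # partition splits on the literal marker; column 2 holds the text after it
    tail = series.str.partition(marker)[2]
    # collect word-like tokens, which drops punctuation such as the trailing "."
    words = tail.str.findall(r"[\w'-]+")
    # keep the first n tokens and join them back into one string
    return words.str[:n].str.join(" ")


df = pd.DataFrame({'caption': ['hello this pack is for you: Jake Peralta. Thanks']})
names = first_n_words_after(df['caption'], 'for you:', 2)
```

With n=2 this gives "Jake Peralta" for the sample row. Rows that do not contain the marker come back as empty strings rather than raising an error.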
