
Remove part of a sentence after specific character in Pandas

I have a pandas column full of sentences. In each sentence, I'm trying to remove the part of the sentence from the word "in" onward. Example:

Current form: "Mary has a lot of furniture inside her house, where she lives with her parents and her boyfriend in Maryland"

Desired form: "Mary has a lot of furniture inside her house, where she lives with her parents and her boyfriend"

I have tried several solutions, but in each case the sentence is cut at the first occurrence of the string "in", even when it is inside a word. So currently my output is: "Mary has a lot of furniture". That's because the word "inside" contains the string "in".

This is what I currently have, and it's not producing the desired output:

 df['split'] = df.sentences.apply(lambda x: "in".join(x.split("in", 1)[:1]))

Any help would be greatly appreciated!

Use str.split and split on the word "in" only when it has whitespace before and after it. Write the pattern as a raw string so that \s reaches the regex engine unchanged:

df['split'] = df['sentences'].str.split(r'\sin\s').str[0]

Output

0    Mary has a lot of furniture inside her house, where she lives with her parents and her boyfriend
Name: sentences, dtype: object

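Or use word boundaries, as Zachary suggests in the comments. This version also matches "in" when it is followed by punctuation, but it keeps the space that preceded "in" at the end of the result, so you may want to chain .str.rstrip():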

df['split'] = df['sentences'].str.split(r'\bin\b').str[0]
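To make the difference between the two patterns concrete, here is a minimal sketch. The second sample sentence and the column names ws and wb are made up for illustration, and the \s* prefix on the word-boundary pattern is my own addition to absorb the space before "in"; it is not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "sentences": [
        "Mary has a lot of furniture inside her house, where she lives "
        "with her parents and her boyfriend in Maryland",
        "He walked in, then sat down",  # hypothetical sentence: "in" followed by a comma
    ]
})

# Whitespace on both sides: skips "inside", but also misses "in," in the second row.
df["ws"] = df["sentences"].str.split(r"\sin\s", n=1).str[0]

# Word boundaries: matches "in" as a whole word regardless of what follows.
# The leading \s* consumes the space before "in" so nothing trails the result.
df["wb"] = df["sentences"].str.split(r"\s*\bin\b", n=1).str[0]

print(df[["ws", "wb"]])
```

Passing n=1 stops after the first match, which is enough here because only the first piece is kept.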

You are almost there; you just need to add a space before and after the word "in", like this: ' in '

df['split'] = df.sentences.apply(lambda x: " in ".join(x.split(" in ", 1)[:1]))

Output:

Mary has a lot of furniture inside her house, where she lives with her parents and her boyfriend
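If a literal " in " separator is all that is needed, the same result can probably be had with a little less machinery. The sketch below shows two equivalent variants; the column names split_apply and split_partition are placeholders chosen for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "sentences": [
        "Mary has a lot of furniture inside her house, where she lives "
        "with her parents and her boyfriend in Maryland",
    ]
})

# The join in the lambda above only ever receives a one-element list,
# so taking the first piece directly gives the same string.
df["split_apply"] = df["sentences"].apply(lambda x: x.split(" in ", 1)[0])

# str.partition splits on the first occurrence of a literal separator and
# returns three columns (before, separator, after); column 0 is the part before " in ".
df["split_partition"] = df["sentences"].str.partition(" in ")[0]

print(df["split_partition"].iloc[0])
```

Both variants leave a sentence unchanged when it does not contain " in ".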
