Remove part of a sentence after specific character in Pandas

Question

I have a pandas column full of sentences. In each of those sentences, I'm trying to remove the part of the sentence after the word "in". Example:

Current form: "Mary has a lot of furniture inside her house, where she lives with her parents and her boyfriend in Maryland"

Desired form: "Mary has a lot of furniture inside her house, where she lives with her parents and her boyfriend"

I have tried multiple solutions, but in each case the sentence is cut at the first instance of the string "in", even when it's inside a word. So currently, my output is this: "Mary has a lot of furniture". That's because the word "inside" contains the string "in".

This is what I currently have, and it's not producing the desired output:

 df['split'] = df.sentences.apply(lambda x: "in".join(x.split("in", 1)[:1]))

Any help would be greatly appreciated!

Answer 1

Use str.split and split on the word "in" only where it has whitespace before and after it. A raw string keeps Python from treating \s as a string escape.

df['split'] = df['sentences'].str.split(r'\sin\s').str[0]

Output

0    Mary has a lot of furniture inside her house, where she lives with her parents and her boyfriend
Name: sentences, dtype: object

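Or use word boundaries, as Zachary suggests in the comments. Note that this keeps the space that preceded "in", so append .str.rstrip() if the trailing whitespace matters: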

df['split'] = df['sentences'].str.split(r'\bin\b').str[0]
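
For anyone who wants to try both patterns side by side, here is a minimal, self-contained sketch. The DataFrame is built from the question's example sentence; the variable names ws and wb are only illustrative, and the .str.rstrip() call is an addition to trim the space that the word-boundary split leaves behind.

```python
import pandas as pd

# Sample data taken from the question's example sentence.
df = pd.DataFrame({
    "sentences": [
        "Mary has a lot of furniture inside her house, where she lives "
        "with her parents and her boyfriend in Maryland",
    ]
})

# Split where "in" is surrounded by whitespace, then keep the first piece.
ws = df["sentences"].str.split(r"\sin\s").str[0]

# Split where "in" is a whole word; the space before "in" stays in the
# first piece, so strip trailing whitespace afterwards.
wb = df["sentences"].str.split(r"\bin\b").str[0].str.rstrip()

print(ws[0])
print(wb[0])
```

Neither pattern matches the "in" inside "inside", because there it is followed by a word character rather than whitespace or a word boundary.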

Answer 2

You are almost there; you just need to add a space before and after the word "in", like this: ' in '

df['split'] = df.sentences.apply(lambda x: " in ".join(x.split(" in ", 1)[:1]))

Output:

Mary has a lot of furniture inside her house, where she lives with her parents and her boyfriend
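
As a hedged alternative to the join/split combination, Python's built-in str.partition returns the text before the first occurrence of a separator directly. The sketch below assumes the same column name as the question; the second sample sentence is made up here to show that rows without a standalone " in " are left unchanged.

```python
import pandas as pd

df = pd.DataFrame({
    "sentences": [
        "Mary has a lot of furniture inside her house, where she lives "
        "with her parents and her boyfriend in Maryland",
        # Hypothetical row: contains "in" inside words but no standalone " in ".
        "Training is inside the main building",
    ]
})

# partition(" in ") returns (before, separator, after); index 0 is the part
# before the first " in ", or the whole string if the separator is absent.
df["split"] = df["sentences"].apply(lambda x: x.partition(" in ")[0])

print(df["split"].tolist())
```

Like the original lambda, this only recognises "in" with a plain space on each side, so an "in" followed by punctuation or at the very start of a sentence would not be treated as a split point.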
