I have a pandas column full of sentences. In each sentence, I'm trying to remove everything from the standalone word "in" onward. Example:
Current form: "Mary has a lot of furniture inside her house, where she lives with her parents and her boyfriend in Maryland"
Desired form: "Mary has a lot of furniture inside her house, where she lives with her parents and her boyfriend"
I have tried multiple solutions; however, in each case the sentence gets split at the first instance of the string "in", even when it's inside a word. So currently, my output is this: "Mary has a lot of furniture". That's because the word "inside" contains the string "in".
This is what I currently have, and it's not producing the desired output:
df['split'] = df.sentences.apply(lambda x: "in".join(x.split("in", 1)[:1]))
Any help would be greatly appreciated!
Use str.split and split on the word in if it has whitespace before and after it.
df['split'] = df['sentences'].str.split(r'\sin\s').str[0]
Output
0 Mary has a lot of furniture inside her house, where she lives with her parents and her boyfriend
Name: sentences, dtype: object
Or using word boundaries as Zachary suggests in the comments:
df['split'] = df['sentences'].str.split(r'\bin\b').str[0]
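One detail worth checking with the word-boundary version: \b matches a position rather than a character, so the space before "in" stays attached to the first piece. Below is a minimal, self-contained sketch of that approach. The sample rows and the variable names are made up for illustration, and n=1 plus the trailing .str.strip() are additions that are not part of the answer above.

```python
import pandas as pd

# Hypothetical sample data; the column name "sentences" follows the question.
expected = ("Mary has a lot of furniture inside her house, where she lives "
            "with her parents and her boyfriend")
df = pd.DataFrame({
    "sentences": [
        expected + " in Maryland",
        "The inn is inside the building",  # "in" only appears inside words
    ]
})

# \b only matches at word boundaries, so "inside", "inn" and "building"
# are left alone. n=1 stops splitting after the first standalone "in".
first_part = df["sentences"].str.split(r"\bin\b", n=1).str[0]

# The space that preceded "in" is still at the end of the first piece,
# so strip it off.
df["split"] = first_part.str.strip()

print(df["split"].tolist())
```

Rows without a standalone "in" come back unchanged, because the split yields a single piece.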
You are almost there, you just need to add an extra space before and after the word in, like this ' in ':
df['split'] = df.sentences.apply(lambda x: " in ".join(x.split(" in ", 1)[:1]))
Output:
Mary has a lot of furniture inside her house, where she lives with her parents and her boyfriend
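Since only the first piece is kept, the join in the line above has nothing to join, so the same result can be had more directly. Here is a small sketch using plain str.partition; the helper name cut_at_in is hypothetical and the sample sentence is the one from the question.

```python
def cut_at_in(sentence: str) -> str:
    """Return the text before the first ' in ' (hypothetical helper)."""
    # partition returns (before, separator, after). If ' in ' is absent,
    # 'before' is the whole sentence, so the input comes back unchanged.
    before, _sep, _after = sentence.partition(" in ")
    return before


sentence = ("Mary has a lot of furniture inside her house, where she lives "
            "with her parents and her boyfriend in Maryland")

# Same result as the join/split one-liner from the answer.
original_way = " in ".join(sentence.split(" in ", 1)[:1])
print(cut_at_in(sentence) == original_way)
print(cut_at_in(sentence))
```

With pandas this would be applied as df['sentences'].apply(cut_at_in). Note that matching on ' in ' misses a sentence that ends with "in" or has punctuation right after it; the word-boundary regex handles those cases.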