Remove part of a sentence after a specific character in pandas

Question

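I have many sentences in a pandas column. In each sentence, I am trying to remove the part of the sentence that follows the word "in". Example:

Current form: "Mary has a lot of furniture inside her house, where she lives with her parents and her boyfriend in Maryland"

Desired form: "Mary has a lot of furniture inside her house, where she lives with her parents and her boyfriend"

I have tried several solutions, but in every case the sentence is split at the first "in", even when it occurs inside another word. So currently my output is: "Mary has a lot of furniture". That is because the word "inside" contains the string "in".

This is what I currently have, and it does not produce the desired output: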

 df['split'] = df.sentences.apply(lambda x: "in".join(x.split("in", 1)[:1]))

Any help would be greatly appreciated!

Answer 1

If the word "in" has whitespace before and after it, use str.split with a regular expression and keep the first piece with str[0].

df['split'] = df['sentences'].str.split(r'\sin\s').str[0]

Output

0    Mary has a lot of furniture inside her house, where she lives with her parents and her boyfriend
Name: sentences, dtype: object
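
For anyone who wants to reproduce this, here is a minimal self-contained sketch. The one-row DataFrame is an assumption built from the example sentence in the question, and n=1 is added so that only the first standalone "in" is used as the split point.

```python
import pandas as pd

# Hypothetical one-row frame built from the question's example sentence.
df = pd.DataFrame({
    "sentences": [
        "Mary has a lot of furniture inside her house, where she lives "
        "with her parents and her boyfriend in Maryland"
    ]
})

# Split once on "in" surrounded by whitespace and keep the text before it.
# "inside" is not matched because "in" is followed by "s", not whitespace.
df["split"] = df["sentences"].str.split(r"\sin\s", n=1).str[0]
print(df["split"].iloc[0])
```

Because the surrounding whitespace is part of the pattern, the result has no trailing space.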

Or use word boundaries, as Zachary suggested in the comments:

df['split'] = df['sentences'].str.split(r'\bin\b').str[0]
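
One thing to be aware of with the word-boundary version: \b matches a position and does not consume the surrounding spaces, so the piece before "in" keeps its trailing space. A small sketch, using two made-up sentences, that strips the space and also shows that "in" next to punctuation is matched:

```python
import pandas as pd

# Hypothetical sample sentences, not taken from the original data.
df = pd.DataFrame({
    "sentences": [
        "She lives with her parents in Maryland",
        "Come in, sit inside",
    ]
})

# \b matches a position, so the space before "in" stays in the first piece.
parts = df["sentences"].str.split(r"\bin\b", n=1).str[0]

# Strip the leftover whitespace.
df["split"] = parts.str.strip()
print(df["split"].tolist())
```

The second sentence is cut at "in," but not at "inside", which the whitespace-based pattern would not have split at all.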

Answer 2

You were almost there; you just need to add a space before and after the word, as in ' in ':

df['split'] = df.sentences.apply(lambda x: " in ".join(x.split(" in ", 1)[:1]))

Output:

Mary has a lot of furniture inside her house, where she lives with her parents and her boyfriend
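
As a side note, joining a one-element list returns that element unchanged, so the lambda above can be shortened to indexing the first piece. A sketch under that observation; the helper name before_in and the second sample sentence are made up for illustration:

```python
import pandas as pd

def before_in(sentence: str) -> str:
    """Return the text before the first ' in ', or the whole sentence if there is none."""
    return sentence.split(" in ", 1)[0]

# Hypothetical frame: the question's example plus a sentence with no standalone "in".
df = pd.DataFrame({
    "sentences": [
        "Mary has a lot of furniture inside her house, where she lives "
        "with her parents and her boyfriend in Maryland",
        "No match inside here",
    ]
})

df["split"] = df["sentences"].apply(before_in)
print(df["split"].tolist())
```

Sentences without a standalone " in " come back unchanged, which is the same behaviour as the join-based version.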