Remove part of a sentence after a specific character in pandas

Question

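I have a lot of sentences in a pandas column. In each sentence, I am trying to remove the part of the sentence that follows the word "in". Example: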

Current form: "Mary has a lot of furniture inside her house, where she lives with her parents and her boyfriend in Maryland"

Desired form: "Mary has a lot of furniture inside her house, where she lives with her parents and her boyfriend"

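I have tried several solutions, but in every case the sentence is cut at the first "in", even when those letters occur inside another word. So at the moment my output is: "Mary has a lot of furniture ". That is because the word "inside" contains the string "in".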

This is what I currently have, and it does not produce the desired output:

 df['split'] = df.sentences.apply(lambda x: "in".join(x.split("in", 1)[:1]))
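
To make the failure concrete, here is a minimal sketch in plain Python (no pandas needed) that applies the same split to the example sentence. The variable names are only illustrative.

```python
sentence = ("Mary has a lot of furniture inside her house, where she lives "
            "with her parents and her boyfriend in Maryland")

# str.split looks for the raw substring "in", not the word "in",
# so the first match is the "in" at the start of "inside".
first_piece = "in".join(sentence.split("in", 1)[:1])
print(repr(first_piece))
```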

Any help would be greatly appreciated!

Answer 1

If the word in has whitespace before and after it, use str.split with a regular expression and keep the first piece with str[0]:

df['split'] = df['sentences'].str.split(r'\sin\s').str[0]

Output:

0    Mary has a lot of furniture inside her house, where she lives with her parents and her boyfriend
Name: sentences, dtype: object

Or use word boundaries, as Zachary suggested in the comments:

df['split'] = df['sentences'].str.split(r'\bin\b').str[0]
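
As a self-contained sketch of the word-boundary approach, the snippet below builds a small DataFrame and splits once on the standalone word. The sample data and the extra `\s*` around the boundary are assumptions added here: `\bin\b` on its own leaves the space before "in" at the end of the result, so the pattern also consumes the surrounding whitespace.

```python
import pandas as pd

# Hypothetical sample data; the second row has "in" only inside another word.
df = pd.DataFrame({"sentences": [
    "Mary has a lot of furniture inside her house, where she lives "
    "with her parents and her boyfriend in Maryland",
    "No matching word here",
]})

# Split at most once (n=1) on the whole word "in" plus any whitespace
# around it, then keep the part to the left of the match.
df["split"] = df["sentences"].str.split(r"\s*\bin\b\s*", n=1).str[0]
print(df["split"].tolist())
```

Rows that do not contain the standalone word are returned unchanged, because the split produces a single piece.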

Answer 2

You were almost there. You just need to add a space before and after the word, as in ' in ':

df['split'] = df.sentences.apply(lambda x: " in ".join(x.split(" in ", 1)[:1]))

Output:

Mary has a lot of furniture inside her house, where she lives with her parents and her boyfriend
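
As a side note, joining a one-element list returns that element unchanged, so the join in the expression above does no extra work. The sketch below, in plain Python with illustrative variable names, compares it with two shorter spellings that should give the same result for this input.

```python
sentence = ("Mary has a lot of furniture inside her house, where she lives "
            "with her parents and her boyfriend in Maryland")

# The expression from the answer: split once, keep the first piece, re-join.
via_join = " in ".join(sentence.split(" in ", 1)[:1])

# Equivalent: index the first piece directly.
via_index = sentence.split(" in ", 1)[0]

# Equivalent: str.partition returns (before, separator, after).
via_partition = sentence.partition(" in ")[0]

print(via_join == via_index == via_partition)
print(via_partition)
```

Any of these can be passed to apply in place of the lambda body. Note that the literal ' in ' only matches when the word has a plain space on both sides.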