Extracting the 2 words before, the actual word, and the 2 words after a specific string in Python?
I have a pandas Series:
Explanation
a "how are you doing today where is she going"
b "do you like blueberry ice cream does not make sure "
c "this works but you know that the translation is on"
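I want to extract the 2 words before and the 2 words after the string "you".
For example, I would like the result to look like this: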
Explanation Explanation Extracted
a "how are you doing today where is she going" "how are you doing today"
b "do you like blueberry ice cream does not make sure " "do you like blueberry"
c "this works but you know that the translation is on" "works but you know that"
This regex gives me the two words before and after "you", but does not include "you" itself:
(?P<before>(?:\w+\W+){,2})you\W+(?P<after>(?:\w+\W+){,2})
How can I change it so that "you" is included?
You can use:
df['Explanation Extracted'] = df['Explanation'].str.extract(r'\b((?:\w+\W+){0,2}you\b(?:\W+\w+){0,2})', expand=False)
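Details:
\b - a word boundary
(?:\w+\W+){0,2} - zero, one or two occurrences of one or more word characters followed by one or more non-word characters
you - the literal string you
\b - a word boundary
(?:\W+\w+){0,2} - zero, one or two occurrences of one or more non-word characters followed by one or more word characters.
Pandas test: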
>>> import pandas as pd
>>> df = pd.DataFrame({'Explanation':["how are you doing today where is she going", "do you like blueberry ice cream does not make sure ", "this works but you know that the translation is on"]})
>>> df['Explanation Extracted'] = df['Explanation'].str.extract(r'\b((?:\w+\W+){0,2}you\b(?:\W+\w+){0,2})', expand=False)
>>> df
Explanation Explanation Extracted
0 how are you doing today where is she going how are you doing today
1 do you like blueberry ice cream does not make ... do you like blueberry
2 this works but you know that the translation i... works but you know that
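The same pattern can be tried outside pandas and generalized to other keywords and window sizes. The following is only a sketch: the helper names build_context_pattern and extract_context and the parameter n are made up for illustration and are not part of the answer above. It assumes the keyword starts and ends with a word character, so that the \b anchors behave as described in the details.

```python
import re

def build_context_pattern(word, n=2):
    """Compile a regex matching `word` plus up to `n` words on each side.

    Hypothetical helper; `word` is escaped so regex metacharacters are literal.
    """
    escaped = re.escape(word)
    # Doubled braces produce literal { } in the f-string, e.g. {0,2} for n=2.
    return re.compile(rf'\b((?:\w+\W+){{0,{n}}}{escaped}\b(?:\W+\w+){{0,{n}}})')

def extract_context(text, word="you", n=2):
    """Return the first match of the context pattern, or None if absent."""
    match = build_context_pattern(word, n).search(text)
    return match.group(1) if match else None

print(extract_context("how are you doing today where is she going"))
```

If the pandas route is preferred, the pattern string (build_context_pattern('you').pattern) has a single capture group and so should be usable with str.extract in the same way as the literal regex above.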
Here is a way to do it without regex or pandas; I don't think either is needed for this case.
text1 = "how are you doing today where is she going"
text2 = "do you like blueberry ice cream does not make sure "
text3 = "this works but you know that the translation is on"
def show_trunc_sentence(text, word='you'):  # 'you' is the default; pass another word to search for it
    words = text.split()
    word_loc = words.index(word)  # position of the first occurrence of the word
    start = max(word_loc - 2, 0)  # clamp so the slice never starts before index 0
    before = words[start:word_loc + 1]  # up to 2 words before, plus the word itself
    after = words[word_loc + 1:word_loc + 3]  # up to 2 words after
    print(" ".join(before + after))

show_trunc_sentence(text2)
Output: text1 - how are you doing today; text2 - do you like blueberry; text3 - works but you know that
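One caveat with the split-based approach is that list.index raises ValueError when the word is missing, and a token such as "you," (with punctuation attached) will not compare equal to "you". A minimal sketch of a variant that returns the result instead of printing it and yields None when the word is absent; the name context_window and the parameter n are assumptions for illustration only:

```python
def context_window(text, word="you", n=2):
    """Return `word` with up to `n` whitespace-separated words on each side.

    Hypothetical helper: returns None when `word` is not an exact token.
    """
    words = text.split()
    try:
        i = words.index(word)  # first exact occurrence only
    except ValueError:
        return None
    # max() keeps the slice from wrapping around to the end of the list.
    return " ".join(words[max(i - n, 0):i + n + 1])

for sample in (
    "how are you doing today where is she going",
    "do you like blueberry ice cream does not make sure ",
    "this works but you know that the translation is on",
):
    print(context_window(sample))
```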