Extract the 2 words before a particular string, the word itself, and the 2 words after it in Python?

Question

I have a pandas Series

       Explanation 

a      "how are you doing today where is she going" 
b      "do you like blueberry ice cream does not make sure " 
c      "this works but you know that the translation is on"

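I want to extract the string "you" together with the 2 words before it and the 2 words after it.

For example, I want the result to look like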

        Explanation                                                    Explanation Extracted

a      "how are you doing today where is she going"                  "how are you doing today"
b      "do you like blueberry ice cream does not make sure "         "do you like blueberry"
c      "this works but you know that the translation is on"           "works but you know that"

This regex gives me the two words before and after "you", but not "you" itself

(?P<before>(?:\w+\W+){,2})you\W+(?P<after>(?:\w+\W+){,2})

How can I change it so that "you" is included?

Answer 1

You can use

df['Explanation Extracted'] = df['Explanation'].str.extract(r'\b((?:\w+\W+){0,2}you\b(?:\W+\w+){0,2})', expand=False)

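See the regex demo.

Details:

\b - a word boundary
(?:\w+\W+){0,2} - zero, one or two occurrences of one or more word characters followed by one or more non-word characters
you - the literal string you
\b - a word boundary
(?:\W+\w+){0,2} - zero, one or two occurrences of one or more non-word characters followed by one or more word characters.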
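To see the pattern at work without pandas, here is a minimal sketch that applies the same regex with the standard `re` module. The helper name `extract_context` is hypothetical and is used only for illustration.

```python
import re

# Same pattern as in the answer: up to two words before "you", "you" itself,
# and up to two words after it, all inside a single capturing group.
PATTERN = re.compile(r'\b((?:\w+\W+){0,2}you\b(?:\W+\w+){0,2})')


def extract_context(text):
    """Return the matched window around "you", or None when there is no match."""
    match = PATTERN.search(text)
    return match.group(1) if match else None


samples = [
    "how are you doing today where is she going",
    "do you like blueberry ice cream does not make sure ",
    "this works but you know that the translation is on",
]
for sample in samples:
    print(extract_context(sample))
```

Because `you` sits inside the capturing group, it is part of the extracted text, and the `\b` after it keeps words such as "young" from matching.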

Pandas test:

>>> import pandas as pd
>>> df = pd.DataFrame({'Explanation':["how are you doing today where is she going", "do you like blueberry ice cream does not make sure ", "this works but you know that the translation is on"]})
>>> df['Explanation Extracted'] = df['Explanation'].str.extract(r'\b((?:\w+\W+){0,2}you\b(?:\W+\w+){0,2})', expand=False)
>>> df
                                         Explanation    Explanation Extracted
0         how are you doing today where is she going  how are you doing today
1  do you like blueberry ice cream does not make ...    do you like blueberry
2  this works but you know that the translation i...  works but you know that
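If the keyword or the number of surrounding words needs to vary, the pattern can be built from parameters. The following is only a sketch: `build_window_pattern` is a hypothetical helper, and it assumes the keyword consists of word characters so that the trailing `\b` behaves as expected.

```python
import re


def build_window_pattern(word, n=2):
    """Compile a pattern capturing `word` plus up to `n` words on each side."""
    # re.escape guards against regex metacharacters in the keyword;
    # doubled braces produce literal { and } inside the f-string.
    return re.compile(
        rf'\b((?:\w+\W+){{0,{n}}}{re.escape(word)}\b(?:\W+\w+){{0,{n}}})'
    )


pattern = build_window_pattern("like", n=1)
match = pattern.search("do you like blueberry ice cream")
print(match.group(1) if match else None)
```

The resulting `pattern.pattern` string can be passed to `Series.str.extract` in the same way as the hard-coded regex above.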

Answer 2

I'll show an approach that uses neither regex nor pandas; for this case I don't think they are needed.

text1 = "how are you doing today where is she going"
text2 = "do you like blueberry ice cream does not make sure "
text3 = "this works but you know that the translation is on"


def show_trunc_sentence(text, word='you'):  # 'you' is the default; pass another word to search for it instead
    words = text.split()
    word_loc = words.index(word)  # raises ValueError if the word is absent
    start = max(word_loc - 2, 0)  # clamp so the slice never starts below index 0
    # the target word plus up to two words on each side
    print(" ".join(words[start:word_loc + 3]))


show_trunc_sentence(text2)

Output: text1 - how are you doing today; text2 - do you like blueberry; text3 - works but you know that
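A variant of the same idea that returns the result instead of printing it can be easier to reuse, for example with `Series.apply`. This is a sketch under assumptions: the name `trunc_sentence` and the `window` parameter are hypothetical, and returning `None` for a missing word is one possible choice, not part of the original answer.

```python
def trunc_sentence(text, word="you", window=2):
    """Return `word` with up to `window` words on each side, or None if absent."""
    words = text.split()
    if word not in words:
        return None  # avoid the ValueError that list.index would raise
    loc = words.index(word)       # position of the first occurrence
    start = max(loc - window, 0)  # do not slice from a negative index
    return " ".join(words[start:loc + window + 1])


for text in (
    "how are you doing today where is she going",
    "do you like blueberry ice cream does not make sure ",
    "this works but you know that the translation is on",
):
    print(trunc_sentence(text))
```

Note that `split()` compares whole whitespace-separated tokens, so a token such as "you," with attached punctuation would not count as a match, whereas the regex approach handles that case.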

Problem description

2 solutions

Solution 1
0 2022-06-29 23:22:25

Solution 2
0 2022-06-29 23:57:51
