
How can I split at word boundaries with regexes?

I am trying to do this:

import re
sentence = "How are you?"
print(re.split(r'\b', sentence))

The result is:

[u'How are you?']

I want something like [u'How', u'are', u'you', u'?']. How can I achieve this?

Unfortunately, Python versions before 3.7 cannot split on a pattern that matches the empty string.

To work around this, use findall instead of split.
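As a side note, re.split has accepted patterns that can match the empty string since Python 3.7, so the original attempt behaves differently on newer interpreters. The following is only a minimal sketch under that version assumption; the names parts and tokens are illustrative and not from the original answer.

```python
import re

sentence = "How are you?"

# Assumes Python 3.7 or later, where re.split can split on empty matches.
parts = re.split(r"\b", sentence)
print(parts)   # ['', 'How', ' ', 'are', ' ', 'you', '?']

# Strip surrounding whitespace and drop pieces that end up empty.
tokens = [p.strip() for p in parts if p.strip()]
print(tokens)  # ['How', 'are', 'you', '?']
```

On older interpreters the same call returns the whole string unsplit, which is why the findall approach remains the portable choice.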

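In fact, \b simply means a word boundary.

It is roughly equivalent to (?<=\w)(?=\W)|(?<=\W)(?=\w), except that \b also matches at the start or end of the string when the adjacent character is a word character.

This means the following code will work: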

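import re
sentence = "How are you?"
print(re.findall(r'\w+|\W+', sentence))
# Result: ['How', ' ', 'are', ' ', 'you', '?']

Another option is to match words and punctuation marks explicitly:

import re
split = re.findall(r"[\w']+|[.,!?;]", "How are you?")
print(split)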

Output:

['How', 'are', 'you', '?']

Regex explanation:

"[\w']+|[.,!?;]"

    1st Alternative: [\w']+
        [\w']+ match a single character present in the list below
            Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
            \w match any word character [a-zA-Z0-9_]
            ' the literal character '
    2nd Alternative: [.,!?;]
        [.,!?;] match a single character present in the list below
            .,!?; a single character in the list .,!?; literally
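To see how the two alternatives explained above interact, here is a small sketch that wraps the same pattern in a helper. The function name tokenize and the sample sentence are made up for illustration.

```python
import re

# Same pattern as above: a run of word characters or apostrophes,
# or a single punctuation mark from the listed set.
TOKEN_RE = re.compile(r"[\w']+|[.,!?;]")

def tokenize(text):
    """Return words (apostrophes kept) and listed punctuation as separate tokens."""
    return TOKEN_RE.findall(text)

print(tokenize("Don't panic, it's fine!"))
# ["Don't", 'panic', ',', "it's", 'fine', '!']
```

Note that characters outside both classes, such as ':' or '-', are skipped silently, so the punctuation set may need extending for other input.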

Here is how I split on word boundaries:

import re
re.split(r"\b\W\b", "How are you?") # Reprocess list to split on special characters.
# Result: ['How', 'are', 'you?']

And using findall on word boundaries:

re.findall(r"\b\w+\b", "How are you?")
# Result: ['How', 'are', 'you']
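Neither snippet above keeps the trailing '?' as its own token. One possible way to get the list the question asks for, offered as a sketch rather than as part of this answer, is to match either a run of word characters or a single character that is neither a word character nor whitespace. The function name split_words_and_punct is hypothetical.

```python
import re

def split_words_and_punct(text):
    # \w+ matches a run of word characters; [^\w\s] matches one character
    # that is neither a word character nor whitespace.
    return re.findall(r"\w+|[^\w\s]", text)

print(split_words_and_punct("How are you?"))
# ['How', 'are', 'you', '?']
```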
