How can I split at word boundaries with regexes?
I'm trying to do this:
import re
sentence = "How are you?"
print(re.split(r'\b', sentence))
The result is:
[u'How are you?']
I want something like [u'How', u'are', u'you', u'?']. How can I do this?
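Unfortunately, Python's re.split cannot split on an empty match (this changed in Python 3.7, which added support for splitting on patterns that can match an empty string).
To work around this, you need to use findall instead of split.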
Actually, \b just means a word boundary.
Away from the start and end of the string, it is equivalent to (?<=\w)(?=\W)|(?<=\W)(?=\w).
This means the following code will work:
import re
sentence = "How are you?"
print(re.findall(r'\w+|\W+', sentence))
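To make the boundary positions concrete, here is a minimal sketch that lists where \b matches and compares it with the lookaround form. The variable names are only for illustration. The last pattern, \w+|[^\w\s]+, is a variant I am assuming you might want if the whitespace tokens returned by \w+|\W+ are not wanted; it is not part of the answer above.

```python
import re

sentence = "How are you?"

# Positions where \b matches; each match is empty, so start() == end().
boundary_positions = [m.start() for m in re.finditer(r"\b", sentence)]

# Positions matched by the lookaround form. It cannot match at index 0,
# because the lookbehind has no preceding character to inspect.
lookaround = r"(?<=\w)(?=\W)|(?<=\W)(?=\w)"
lookaround_positions = [m.start() for m in re.finditer(lookaround, sentence)]

print(boundary_positions)    # [0, 3, 4, 7, 8, 11]
print(lookaround_positions)  # [3, 4, 7, 8, 11]

# Assumed variant: words, or runs of characters that are neither
# word characters nor whitespace, so spaces are skipped entirely.
tokens = re.findall(r"\w+|[^\w\s]+", sentence)
print(tokens)  # ['How', 'are', 'you', '?']
```

The only difference between the two position lists is index 0, where the string starts with a word character.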
import re
split = re.findall(r"[\w']+|[.,!?;]", "How are you?")
print(split)
Output:
['How', 'are', 'you', '?']
Regex explanation:
"[\w']+|[.,!?;]"
1st Alternative: [\w']+
[\w']+ match a single character present in the list below
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
\w match any word character [a-zA-Z0-9_]
' the literal character '
2nd Alternative: [.,!?;]
[.,!?;] match a single character present in the list below
.,!?; a single character in the list .,!?; literally
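As a quick check of the explanation above, the sketch below runs the same pattern on two made-up sentences (the sample strings are my own, not from the question). It shows that apostrophes stay inside words, and that any character outside both classes is silently dropped.

```python
import re

# Pattern from the answer: a run of word characters or apostrophes,
# or exactly one of the listed punctuation marks.
pattern = r"[\w']+|[.,!?;]"

# Apostrophes are part of the first character class, so contractions stay whole.
with_apostrophes = re.findall(pattern, "Don't worry, it's fine!")
print(with_apostrophes)  # ["Don't", 'worry', ',', "it's", 'fine', '!']

# '-' and ':' are in neither class, so they do not appear in the result.
dropped = re.findall(pattern, "a-b: c")
print(dropped)  # ['a', 'b', 'c']
```

If other punctuation should be kept, it would need to be added to the second character class.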
Here is how I split on word boundaries:
re.split(r"\b\W\b", "How are you?") # Reprocess list to split on special characters.
# Result: ['How', 'are', 'you?']
And using findall on word boundaries:
re.findall(r"\b\w+\b", "How are you?")
# Result: ['How', 'are', 'you']
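The "reprocess" step mentioned in the split example is not spelled out, so here is one possible sketch of it. The second pass with \w+|[^\w\s] is my assumption about what reprocessing could look like, not necessarily what the answer's author had in mind.

```python
import re

# First pass: split on a single non-word character sitting between two words.
pieces = re.split(r"\b\W\b", "How are you?")
print(pieces)  # ['How', 'are', 'you?']

# Second pass (assumed): break each piece into words and
# individual punctuation characters.
tokens = []
for piece in pieces:
    tokens.extend(re.findall(r"\w+|[^\w\s]", piece))
print(tokens)  # ['How', 'are', 'you', '?']
```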