How can I split at word boundaries with regexes?
I'm trying to do this:
import re
sentence = "How are you?"
print(re.split(r'\b', sentence))
The result is:
[u'How are you?']
I want something like [u'How', u'are', u'you', u'?']. How can this be achieved?
Unfortunately, Python before version 3.7 cannot split on empty matches, and \b matches an empty string.
To get around this, you would need to use findall instead of split.
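For readers on a newer interpreter: from Python 3.7 onward, re.split does accept patterns that can match an empty string, so the call from the question works there. The sketch below assumes Python 3.7+ and filters out the empty and whitespace-only pieces that the split leaves behind; the names pieces and tokens are only for illustration.

```python
import re

sentence = "How are you?"

# On Python 3.7+, re.split accepts patterns that can match an empty string.
pieces = re.split(r'\b', sentence)
print(pieces)  # ['', 'How', ' ', 'are', ' ', 'you', '?']

# Drop the empty and whitespace-only pieces left over from the split.
tokens = [piece for piece in pieces if piece.strip()]
print(tokens)  # ['How', 'are', 'you', '?']
```

On older versions the findall approach remains the way to go.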
Actually, \b just means word boundary.
Away from the start and end of the string, it is equivalent to (?<=\w)(?=\W)|(?<=\W)(?=\w).
That means the following code would work:
import re
sentence = "How are you?"
print(re.findall(r'\w+|\W+', sentence))
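Note that the pattern above also returns the whitespace runs as separate items: ['How', ' ', 'are', ' ', 'you', '?']. If those are unwanted, one possible variation (an assumption about what the asker wants, not part of the original answer) is to match punctuation with [^\w\s]+ instead of \W+, so that whitespace is never captured:

```python
import re

sentence = "How are you?"

# \w+ matches runs of word characters; [^\w\s]+ matches runs of characters
# that are neither word characters nor whitespace (i.e. punctuation).
tokens = re.findall(r'\w+|[^\w\s]+', sentence)
print(tokens)  # ['How', 'are', 'you', '?']
```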
import re
split = re.findall(r"[\w']+|[.,!?;]", "How are you?")
print(split)
Output: 输出:
['How', 'are', 'you', '?']
Regex Explanation:
"[\w']+|[.,!?;]"
1st Alternative: [\w']+
[\w']+ match a single character present in the list below
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
\w match any word character [a-zA-Z0-9_]
' the literal character '
2nd Alternative: [.,!?;]
[.,!?;] match a single character present in the list below
.,!?; a single character in the list .,!?; literally
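To see what the apostrophe in the first alternative buys you, here is a small sketch on made-up sample strings (the inputs are hypothetical, chosen only for illustration). Contractions stay in one piece, while any punctuation that is not listed in the second alternative is silently dropped:

```python
import re

pattern = r"[\w']+|[.,!?;]"

# The apostrophe inside the character class keeps contractions together.
contractions = re.findall(pattern, "Don't panic, it's fine!")
print(contractions)  # ["Don't", 'panic', ',', "it's", 'fine', '!']

# ':' is not in [.,!?;], so it is skipped rather than returned.
dropped = re.findall(pattern, "Note: done")
print(dropped)  # ['Note', 'done']
```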
Here is my approach to split on word boundaries:
re.split(r"\b\W\b", "How are you?") # Reprocess list to split on special characters.
# Result: ['How', 'are', 'you?']
and using findall on word boundaries:
re.findall(r"\b\w+\b", "How are you?")
# Result: ['How', 'are', 'you']
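The split variant leaves 'you?' glued together, which is why its comment mentions reprocessing the list. One way that reprocessing step might look is sketched below; the second pattern and the names rough and tokens are assumptions added for illustration, not part of the original answer.

```python
import re

sentence = "How are you?"

# First pass: split on a single non-word character that sits between two words.
rough = re.split(r"\b\W\b", sentence)
print(rough)  # ['How', 'are', 'you?']

# Second pass: break each item into words and individual punctuation marks.
tokens = []
for item in rough:
    tokens.extend(re.findall(r"\w+|[^\w\s]", item))
print(tokens)  # ['How', 'are', 'you', '?']
```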