What does “(?u)” do in a regex?
I looked into how tokenization is implemented in scikit-learn and found this regex (source):
token_pattern = r"(?u)\b\w\w+\b"
The regex is pretty straightforward, but I have never seen the (?u) part before. Can someone explain to me what this part is doing?
It switches on the re.U (re.UNICODE) flag for this expression.
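As a rough illustration of what the flag changes, the sketch below compares the inline (?u) form with passing re.UNICODE explicitly, and contrasts both with ASCII-only matching via (?a). The sample text and variable names are made up for this example, and it assumes Python 3, where str patterns already match Unicode by default, so (?u) is effectively redundant there and mainly matters on Python 2.

```python
import re

# Sample text (hypothetical): "naïve café über x", written with escapes
# so the accented letters are single precomposed code points.
text = "na\u00efve caf\u00e9 \u00fcber x"

# Inline flag, as in the scikit-learn pattern from the question.
inline = re.compile(r"(?u)\b\w\w+\b")

# Same pattern with the flag passed as an argument instead.
explicit = re.compile(r"\b\w\w+\b", flags=re.UNICODE)

tokens_inline = inline.findall(text)
tokens_explicit = explicit.findall(text)

# For contrast: ASCII-only matching, where accented letters are not \w.
ascii_only = re.compile(r"(?a)\b\w\w+\b")
tokens_ascii = ascii_only.findall(text)

print(tokens_inline)    # ['naïve', 'café', 'über']
print(tokens_explicit)  # ['naïve', 'café', 'über']
print(tokens_ascii)     # ['na', 've', 'caf', 'ber']
```

The single-character token "x" is dropped in every case because the pattern requires at least two word characters.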
From the module documentation:
(?iLmsux)
(One or more letters from the set 'i', 'L', 'm', 's', 'u', 'x'.) The group matches the empty string; the letters set the corresponding flags: re.I (ignore case), re.L (locale dependent), re.M (multi-line), re.S (dot matches all), re.U (Unicode dependent), and re.X (verbose), for the entire regular expression. (The flags are described in Module Contents.) This is useful if you wish to include the flags as part of the regular expression, instead of passing a flag argument to the re.compile() function.
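To make the quoted passage concrete, here is a minimal sketch showing that several inline flag letters can be combined in one group and that the result should behave like passing the same flags to re.compile(). The pattern and sample text are invented for illustration; the sketch assumes Python 3 and places the flag group at the very start of the pattern, which newer Python versions require for global inline flags.

```python
import re

# Hypothetical sample input with three lines.
text = "Alpha\nbeta\nALPHA"

# Inline form: 'i' (ignore case) and 'm' (multi-line) in a single group.
inline = re.compile(r"(?im)^alpha$")

# Equivalent form: the flags are passed as an argument to re.compile().
explicit = re.compile(r"^alpha$", flags=re.IGNORECASE | re.MULTILINE)

matches_inline = inline.findall(text)
matches_explicit = explicit.findall(text)

print(matches_inline)    # ['Alpha', 'ALPHA']
print(matches_explicit)  # ['Alpha', 'ALPHA']

# The compiled pattern reports inline flags together with explicit ones.
same_flags = inline.flags == explicit.flags
print(same_flags)        # True
```

The inline form is handy when the pattern is the only thing you can pass along, for example as a token_pattern string handed to a library that compiles it for you.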