
What does "(?u)" do in a regex?

I looked into how tokenization is implemented in scikit-learn and found this regex (source):

token_pattern = r"(?u)\b\w\w+\b"

The regex is pretty straightforward, but I have never seen the (?u) part before. Can someone explain to me what this part is doing?

It switches on the re.U (re.UNICODE) flag for this expression.
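To make the effect of the flag visible, here is a minimal sketch that runs the same token pattern once with (?u) and once with the ASCII-only inline flag (?a). The sample text is made up for illustration. Note that in Python 3, str patterns are already Unicode-aware by default, so (?u) changes nothing there; it mattered in Python 2, where \w and \b were ASCII-only unless re.U was set.

```python
import re

text = "héllo wörld"  # made-up sample text, not from scikit-learn

# Unicode-aware matching: \w covers accented letters too.
unicode_tokens = re.findall(r"(?u)\b\w\w+\b", text)

# ASCII-only matching via the inline (?a) flag: é and ö are no longer
# word characters, so the words are split around them.
ascii_tokens = re.findall(r"(?a)\b\w\w+\b", text)

print(unicode_tokens)
print(ascii_tokens)
```

With Unicode matching the two words come back whole; with ASCII matching only the fragments of two or more ASCII word characters survive.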

From the module documentation:

(?iLmsux)

(One or more letters from the set 'i', 'L', 'm', 's', 'u', 'x'.) The group matches the empty string; the letters set the corresponding flags: re.I (ignore case), re.L (locale dependent), re.M (multi-line), re.S (dot matches all), re.U (Unicode dependent), and re.X (verbose), for the entire regular expression. (The flags are described in Module Contents.) This is useful if you wish to include the flags as part of the regular expression, instead of passing a flag argument to the re.compile() function.
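As a small sketch of the last point in that quote, the two compiled patterns below should behave the same: one sets the flag inline, the other passes it as an argument to re.compile(). The pattern and sample string are hypothetical, and re.I is used instead of re.U only because its effect is easy to see in Python 3.

```python
import re

# Flag embedded in the pattern itself.
inline = re.compile(r"(?i)\bpython\b")

# Same flag passed as an argument to re.compile().
by_argument = re.compile(r"\bpython\b", re.I)

sample = "Python and PYTHON"  # made-up sample text
inline_hits = inline.findall(sample)
argument_hits = by_argument.findall(sample)

print(inline_hits)
print(argument_hits)
# Both compiled patterns report the IGNORECASE flag as set.
print(bool(inline.flags & re.I), bool(by_argument.flags & re.I))
```

The inline form is handy when only the pattern string can be configured, which is the case for a setting such as token_pattern.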
