What does "(?u)" do in a regular expression?

Question

I looked into how tokenization is implemented in scikit-learn and found this regex (source):

token_pattern = r"(?u)\b\w\w+\b"

The regex is pretty straightforward, but I have never seen the (?u) part before. Can someone explain to me what this part does?
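To see what the whole pattern does before looking at (?u), here is a minimal sketch that applies it with Python's standard re module. The helper name tokenize is hypothetical, and this is not scikit-learn's own tokenizer code (which may also do preprocessing such as lowercasing); it only shows which substrings the regex keeps: runs of two or more word characters, so single-character words like "I" and "a" are dropped and the hyphen splits "scikit-learn".

```python
import re

# Pattern copied from the question; compiled once and reused.
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def tokenize(text):
    """Hypothetical helper: return every run of 2+ word characters."""
    return TOKEN_PATTERN.findall(text)


tokens = tokenize("I looked at a scikit-learn tokenizer")
print(tokens)  # ['looked', 'at', 'scikit', 'learn', 'tokenizer']
```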

Answer 1

It switches on the re.U (re.UNICODE) flag for this expression.
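The flag matters for \w and \b. In Python 2 it made them match non-ASCII letters; in Python 3, str patterns are already Unicode-aware by default, so (?u) is redundant there and is kept for backward compatibility. A small check, assuming Python 3: the results with and without (?u) are identical, and only forcing ASCII matching with the inline (?a) flag changes them. The sample text is made up for illustration.

```python
import re

# "naïve café déjà vu", written with escapes so the accented letters are
# single precomposed code points.
text = "na\u00efve caf\u00e9 d\u00e9j\u00e0 vu"

with_u = re.findall(r"(?u)\b\w\w+\b", text)      # explicit Unicode flag
without_u = re.findall(r"\b\w\w+\b", text)       # Python 3 default for str
ascii_only = re.findall(r"(?a)\b\w\w+\b", text)  # \w limited to [a-zA-Z0-9_]

print(with_u)      # ['naïve', 'café', 'déjà', 'vu']
print(without_u)   # same as with_u
print(ascii_only)  # ['na', 've', 'caf', 'vu']
```

With ASCII-only matching, each accented letter acts as a word boundary, which is exactly the behavior (?u) was meant to avoid.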

From the module documentation:

(?iLmsux)

(One or more letters from the set 'i', 'L', 'm', 's', 'u', 'x'.) The group matches the empty string; the letters set the corresponding flags: re.I (ignore case), re.L (locale dependent), re.M (multi-line), re.S (dot matches all), re.U (Unicode dependent), and re.X (verbose), for the entire regular expression. (The flags are described in Module Contents.) This is useful if you wish to include the flags as part of the regular expression, instead of passing a flag argument to the re.compile() function.
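The last sentence of the quote can be checked directly. A minimal sketch, assuming Python 3: an inline flag group at the start of the pattern produces the same compiled flags as passing the flag to re.compile(), several letters can be combined in one group, and the (?u) from the question shows up as the re.UNICODE bit. Note that since Python 3.11 such a global flag group is only accepted at the very start of the expression, which is where scikit-learn's pattern puts it.

```python
import re

# Inline flag inside the pattern ...
inline = re.compile(r"(?i)hello")
# ... versus the same flag passed as an argument to re.compile().
argument = re.compile(r"hello", re.IGNORECASE)

print(bool(inline.match("HeLLo")))     # True
print(bool(argument.match("HeLLo")))   # True
print(inline.flags == argument.flags)  # True

# Several letters can be combined in one group: ignore case + multi-line.
combined = re.compile(r"(?im)^hello$")
print(combined.findall("HELLO\nhello\nbye"))  # ['HELLO', 'hello']

# The (?u) from the question sets the re.UNICODE bit in .flags.
print(bool(re.compile(r"(?u)\w").flags & re.UNICODE))  # True
```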

Answer 1: accepted, score 18, 2016-01-27 16:41:18