
Unicode Regular Expression: Compilation failed: range out of order in character class

I converted a regular expression that matches emoji-related Unicode characters, taken from https://twemoji.maxcdn.com/v/latest/twemoji.js, from JavaScript to PHP.

The converted regex works as intended when I test it on regex101.com.

However, when I test it in my local environment, it does not work.

You can see a working example here: https://regex101.com/r/IuIhBF/1

Here is the PHP version: http://sandbox.onlinephpfunctions.com/code/3bd5933f5230fc1c45104b7eccd9379b68870016

I tried changing the preg_match_all flags and adding the u modifier to the regular expression, e.g. /*****/u.

I can't get it to work.

It would be great if somebody could help me solve this error: Compilation failed: range out of order in character class at offset 306.

This expression seems to work on your samples, with the u flag:

$re = '/[\x{1f300}-\x{1f5ff}\x{1f900}-\x{1f9ff}\x{1f600}-\x{1f64f}\x{1f680}-\x{1f6ff}\x{2600}-\x{26ff}\x{2700}-\x{27bf}\x{1f1e6}-\x{1f1ff}\x{1f191}-\x{1f251}\x{1f004}\x{1f0cf}\x{1f170}-\x{1f171}\x{1f17e}-\x{1f17f}\x{1f18e}\x{3030}\x{2b50}\x{2b55}\x{2934}-\x{2935}\x{2b05}-\x{2b07}\x{2b1b}-\x{2b1c}\x{3297}\x{3299}\x{303d}\x{00a9}\x{00ae}\x{2122}\x{23f3}\x{24c2}\x{23e9}-\x{23ef}\x{25b6}\x{23f8}-\x{23fa}]/u';
$str = 'Time in emoji is very expressive. 🕐🕑🕒🕓🕔🕕🕖🕗🕘🕙🕚🕛🕜🕝🕞🕟🕠🕡🕢🕣🕤🕥🕦🕧 allowed us to communicate time very easily.

Next up was negation. ❌🗣️ means “No talk.”';

preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);

var_dump($matches);

The expression is explained in the top-right panel of regex101.com, if you wish to explore, simplify, or modify it, and in this link you can watch how it matches against some sample inputs.
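For context, here is a minimal sketch of how this compile error can arise. It assumes the cause is a range whose start code point is greater than its end; the question's original pattern is not shown, so both patterns below are hypothetical illustrations rather than the asker's regex.

```php
<?php
// Hypothetical patterns for illustration only; not the pattern from the question.
$bad  = '/[\x{1f5ff}-\x{1f300}]/u'; // start > end: "range out of order in character class"
$good = '/[\x{1f300}-\x{1f5ff}]/u'; // start <= end: compiles

// preg_match() returns false when the pattern fails to compile.
// The @ operator suppresses the warning so the return value can be inspected.
$badResult  = @preg_match($bad, '🕐');
$goodResult = preg_match($good, '🕐');

var_dump($badResult);  // bool(false)
var_dump($goodResult); // int(1), since U+1F550 lies inside U+1F300..U+1F5FF
```

Checking the return value for false, rather than relying on the warning alone, is one way to detect a pattern that did not compile.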

Reference

How do i match with regex special chars that are not alphanumeric whilst ignoring emojis?

For emoji, you should use a UTF-16 surrogate-pair regex.
The UTF-8/32 regex is way too slow.

See this link for a Unicode Version 12 emoji regex and test.
It takes 3.4 seconds, so if it times out (the default is 2 seconds), just increase the timeout
in the settings.

By comparison, the UTF-8/32 regex takes almost 40 seconds (and requires the //u flag).

So, definitely stick with surrogate pairs for emoji regexes.

https://regex101.com/r/k61Df5/1
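As a rough illustration of what "surrogate pairs" means here, the sketch below splits a code point above U+FFFF into its UTF-16 high and low surrogates. The function name surrogatePair is a hypothetical helper, not part of PHP or of the linked regex. This shows the encoding arithmetic only; whether a given regex engine accepts surrogate escapes depends on the engine and its mode.

```php
<?php
// Hypothetical helper: split a supplementary code point (above U+FFFF)
// into its UTF-16 surrogate pair.
function surrogatePair(int $codePoint): array
{
    $offset = $codePoint - 0x10000;       // 20-bit value
    $high = 0xD800 + ($offset >> 10);     // top 10 bits
    $low  = 0xDC00 + ($offset & 0x3FF);   // bottom 10 bits
    return [$high, $low];
}

[$high, $low] = surrogatePair(0x1F550);   // 🕐 CLOCK FACE ONE OCLOCK
printf("\\u%04X\\u%04X\n", $high, $low);  // prints \uD83D\uDD50
```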
