PHP regex for accented characters

Question

I am trying to filter a variable so that it allows alphanumeric characters, spaces, accented characters and single quotes, and replaces everything else with a space, so that a string like:

substitué à une otage % ? vendredi 23 mars lors de l’attaque

should output:

substitué à une otage vendredi 23 mars lors de l’attaque

but instead I get this output:

substitué à une otage vendredi 23 mars lors de l

Could you please help? This is my code:

$whitelist = "/[^a-zA-Z0-9а-àâáçéèèêëìîíïôòóùûüÂÊÎÔúÛÄËÏÖÜÀÆæÇÉÈŒœÙñý',. ]/";

$descreption =  preg_replace($whitelist, ' ', $ds);
}else{
    $errors = self::DESCREPTION_ERROR;
    return false;
}

Answer 1

Your regex is faulty. The part а-à gives the error "Character range is out of order"; I guess the - was added there by mistake.
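The error can be reproduced in isolation. This sketch assumes the а at the start of that range is the Cyrillic letter U+0430, which looks identical to the Latin a; that would explain the error, because U+0430 comes after à (U+00E0), so the range runs backwards. The pattern spells the code points out with \x{...} escapes so there is no doubt which characters are meant.

```php
<?php
// Assumption: the "а" in the question's range is Cyrillic U+0430, not Latin a.
// A range from U+0430 down to U+00E0 is out of order, so PCRE refuses it.
$broken = '/[\x{0430}-\x{00E0}]/u';

// Listing the accented letters individually (no range) compiles fine.
$fixed = '/[\x{00E0}\x{00E2}]/u';

// preg_match() returns false when the pattern fails to compile;
// @ hides the "Compilation failed" warning for this demo only.
var_dump(@preg_match($broken, 'à')); // bool(false)
var_dump(preg_match($fixed, 'à'));   // int(1)
```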

Then a small hint: ’ is not '
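To see why the two apostrophes have to be listed separately, compare their bytes. This small sketch uses only core functions (bin2hex(), preg_match()); ' is U+0027 APOSTROPHE and ’ is U+2019 RIGHT SINGLE QUOTATION MARK.

```php
<?php
$ascii = "'"; // U+0027, 1 byte in UTF-8
$typo  = "’"; // U+2019, 3 bytes in UTF-8

echo bin2hex($ascii), "\n"; // 27
echo bin2hex($typo), "\n";  // e28099

// A class that lists only the ASCII apostrophe does not cover ’.
var_dump(preg_match("/[']/u", $typo));  // int(0)
var_dump(preg_match("/['’]/u", $typo)); // int(1)
```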

[^a-zA-Z0-9àâáçéèèêëìîíïôòóùûüÂÊÎÔúÛÄËÏÖÜÀÆæÇÉÈŒœÙñý'’,. ]

should work fine.
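The answer gives only the character class. Here is a sketch of plugging it into the question's preg_replace() call, assuming the u modifier is added (the answer does not mention it) so that multibyte letters such as é and ’ are each read as one character rather than as separate UTF-8 bytes. $ds and $result are illustrative names.

```php
<?php
// Answer 1's corrected class; the /u modifier is an added assumption.
$whitelist = "/[^a-zA-Z0-9àâáçéèèêëìîíïôòóùûüÂÊÎÔúÛÄËÏÖÜÀÆæÇÉÈŒœÙñý'’,. ]/u";

// Sample input taken from the question.
$ds = 'substitué à une otage % ? vendredi 23 mars lors de l’attaque';

// Each character NOT in the whitelist is replaced by one space.
$result = preg_replace($whitelist, ' ', $ds);

echo $result, "\n";
```

Note that % and ? each become their own space, so the result has a run of spaces between "otage" and "vendredi".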

Also, if you're working with regexes, tools like RegExr or regex101 are really helpful.

Answer 2

One way to deal with the range of accented characters is to use the POSIX [:alnum:] class, which in PHP, in conjunction with the u modifier, will match all of them. That can then be put into a negated character class together with the other characters you want to keep, so that everything else gets replaced:

$string = 'substitué à une otage % ? vendredi 23 mars lors de l’attaque';
echo preg_replace("/[^[:alnum:]'’,.]/u", ' ', $string);

Output:

substitué à une otage vendredi 23 mars lors de l’attaque
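Because this class does not list the space, every space is also replaced by a space, and % and ? each become a space of their own; the actual string therefore contains five consecutive spaces between "otage" and "vendredi" (the output shown appears to have them collapsed). If single spaces are wanted, as in the question's expected output, one option (a variant, not part of the answer) is to add the + quantifier so each run of unwanted characters is replaced once, then trim() the ends:

```php
<?php
$string = 'substitué à une otage % ? vendredi 23 mars lors de l’attaque';

// Answer 2 as written: one space per unwanted character.
$perChar = preg_replace("/[^[:alnum:]'’,.]/u", ' ', $string);

// Variant: + replaces each whole run with a single space; trim() removes
// any space left at the start or end of the string.
$collapsed = trim(preg_replace("/[^[:alnum:]'’,.]+/u", ' ', $string));

echo $collapsed, "\n";
```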

As has been pointed out in the comments, ’ is not the same as ', so it also needs to be added to the set of characters you want to keep.

Demo on 3v4l.org

Answer 3

You may want to have a look at Unicode character properties.

Summary of my changes:

use \p{L} to match all letters
escape the hyphen (\-)
support typewriter (') and typographic (’) apostrophes

Here is the result:

$whitelist = '/[^\p{L}0-9\-\'’,. ]/u';

There is probably room for further improvement. Finally, don't forget to add the u modifier!
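A quick check of this whitelist, including letters that a hand-written list like the one in the question would miss. The extra inputs are made up for illustration; \p{L} is PCRE's Unicode "letter" property, and the u modifier makes PCRE read the UTF-8 input as characters rather than bytes.

```php
<?php
// Answer 3's whitelist: \p{L} = any letter, \- = literal hyphen,
// plus both apostrophes, comma, full stop and space.
$whitelist = '/[^\p{L}0-9\-\'’,. ]/u';

// Hypothetical inputs: the question's string, a hyphenated name with an
// accented capital, and a Polish place name (Ł and ź are not in the
// hand-written lists of the other answers).
$inputs = [
    'substitué à une otage % ? vendredi 23 mars lors de l’attaque',
    "Jean-Luc d'Ávila #1",
    'Łódź #2',
];

$outputs = [];
foreach ($inputs as $s) {
    // Each disallowed character becomes one space.
    $outputs[] = preg_replace($whitelist, ' ', $s);
}

print_r($outputs);
```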


3 answers

Answer 1
Score 2, accepted, 2019-01-03 08:55:24

Answer 2
Score 1, 2019-01-03 11:25:51

Answer 3
Score 0, 2019-01-03 11:59:29
