Regex that checks upper or lower case characters with or without accents

How can I make the following regular expression ignore (that is, preserve) all whitespace?

$foo = ereg_replace("[^áéíóúÁÉÍÓÚñÑa-zA-Z]", "", $_REQUEST["bar"]);

Input: Ingeniería Eléctrica'*;<42

Current Output: IngenieríaEléctrica

Desired Output: Ingeniería Eléctrica

I tried adding /s, \s\s*, \s+, /\s+/, /t and /r, among others, and they all failed.

Objective: A regex that accepts only strings made of upper or lower case letters, with or without (Spanish) accents.

Thank you!

I see no reason why adding \s to that regex would not work. \s should match all whitespace characters.

$foo = preg_replace("/[^áéíóúÁÉÍÓÚñÑa-zA-Z\s]/", "", $_REQUEST["bar"]);

I believe this should work:

$foo = ereg_replace("[^áéíóúÁÉÍÓÚñÑa-zA-Z ]", "", $_REQUEST["bar"]);

ereg_replace uses POSIX Extended Regular Expressions, and there POSIX bracket expressions are used.

Now the important thing to know is that inside bracket expressions, \ is not a meta-character and therefore \s won't work.

But you can use the POSIX character class [:space:] inside the POSIX bracket expression to achieve the same effect:

$foo = ereg_replace("[^áéíóúÁÉÍÓÚñÑa-zA-Z[:space:]]", "", $_REQUEST["bar"]);

As you can see, this differs from the (I think) better-known Perl syntax, and since the POSIX regular expression functions are deprecated as of PHP 5.3 (and removed in PHP 7.0), you really should go with the Perl-compatible ones.
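For reference, a Perl-compatible version of the same character class might look like the sketch below. It assumes PHP 7 or later, a source file saved as UTF-8, and UTF-8 input; the function name keep_spanish_letters_and_spaces is hypothetical and only there for illustration. The u modifier is added so that each accented letter in the class is treated as one character rather than as separate bytes.

```php
<?php
// Hypothetical helper (not from the original posts); assumes UTF-8 input.
function keep_spanish_letters_and_spaces(string $input): string
{
    // The u modifier makes PCRE read pattern and subject as UTF-8.
    // \s inside the class keeps whitespace instead of stripping it.
    $result = preg_replace('/[^áéíóúÁÉÍÓÚñÑa-zA-Z\s]/u', '', $input);

    // preg_replace returns null on failure (for example, invalid UTF-8).
    return $result ?? '';
}

echo keep_spanish_letters_and_spaces("Ingeniería Eléctrica'*;<42"), "\n";
// prints: Ingeniería Eléctrica
```

Without the u modifier the class is matched byte by byte, so a letter outside the list, such as à, can be cut in half and leave invalid UTF-8 behind.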

All the answers so far fail to point out that your method of matching the accented characters is a hack, and an incomplete one: for instance, no grave accents are matched.

The best way is to use the mbstring extension:

mb_regex_encoding("UTF-8"); //or whatever encoding you're using
var_dump(mb_ereg_replace("[^\\w\\s]|[0-9]", "", "Ingeniería Eléctrica'*;<42", "z"));

gives

string(22) "Ingeniería Eléctrica"
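If mbstring is not available, a similar effect can probably be had with the preg functions and a Unicode property class. This is a different technique from the mbstring call above, shown only as a sketch: it assumes PHP 7 or later, UTF-8 input in precomposed (NFC) form, and a hypothetical function name keep_letters_and_spaces. Note that \p{L} accepts letters from any script, not just Spanish ones, which may or may not be what you want.

```php
<?php
// Hypothetical helper (not from the original posts); assumes UTF-8, NFC input.
function keep_letters_and_spaces(string $input): string
{
    // \p{L} matches any Unicode letter (including grave accents such as à),
    // but unlike \w it does not match digits or the underscore.
    $result = preg_replace('/[^\p{L}\s]/u', '', $input);

    // preg_replace returns null on failure (for example, invalid UTF-8).
    return $result ?? '';
}

echo keep_letters_and_spaces("Ingeniería Eléctrica'*;<42"), "\n";
// prints: Ingeniería Eléctrica
```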
