Accept international name characters in RegEx

I've always struggled with RegEx, so forgive me if this seems like an awful approach to my problem.

When users enter their first and last names, I started off with a basic check for upper- and lower-case letters, whitespace, apostrophes and hyphens:

if (!preg_match("/^[a-zA-Z\s'-]+$/", $name)) {
    // Error
}

Now I realise this isn't ideal, since people can have names such as Dr. Martin Luther King, Jr. (with commas and full stops). So I assume changing it to this would make it slightly more effective:

if (!preg_match("/^[a-zA-Z\s,.'-]+$/", $name)) {
    // Error
}
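To see where this ASCII-only pattern falls short, here is a quick check. It wraps the pattern in a helper; `isAsciiName()` is a hypothetical name used only for illustration, and the source file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical helper wrapping the ASCII-only pattern from the question.
function isAsciiName(string $name): bool
{
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    return preg_match("/^[a-zA-Z\s,.'-]+$/", $name) === 1;
}

var_dump(isAsciiName("Dr. Martin Luther King, Jr.")); // bool(true)
var_dump(isAsciiName("O'Brien-Smith"));              // bool(true)
var_dump(isAsciiName("Siân"));                       // bool(false): "â" is outside [a-zA-Z]
```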

I then noticed that a girl I know on Facebook writes her name as Siân, which got me thinking about names containing accents and umlauts, as well as Japanese, Chinese, Korean or Russian characters. So I started searching and found that one way is to list each of these characters explicitly, like so:

if (!preg_match("/^[a-zA-Z\sàáâäãåèéêëìíîïòóôöõøùúûüÿýñçčšžÀÁÂÄÃÅÈÉÊËÌÍÎÏÒÓÔÖÕØÙÚÛÜŸÝÑßÇŒÆČŠŽ∂ð ,.'-]+$/u", $first_name)) {
    // Error
}
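A quick check suggests the explicit list handles the accents it names but still misses whole scripts. `isListedName()` is a hypothetical helper, and the file is assumed to be UTF-8.

```php
<?php
// Hypothetical helper wrapping the explicit character-list pattern.
function isListedName(string $name): bool
{
    // /u makes PCRE read pattern and subject as UTF-8, so each accented
    // letter in the list is one character rather than two bytes.
    $pattern = "/^[a-zA-Z\sàáâäãåèéêëìíîïòóôöõøùúûüÿýñçčšžÀÁÂÄÃÅÈÉÊËÌÍÎÏÒÓÔÖÕØÙÚÛÜŸÝÑßÇŒÆČŠŽ∂ð ,.'-]+$/u";
    return preg_match($pattern, $name) === 1;
}

var_dump(isListedName("Siân"));    // bool(true)
var_dump(isListedName("Дмитрий")); // bool(false): Cyrillic is not in the list
var_dump(isListedName("山田"));     // bool(false): neither are CJK characters
```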

As you can imagine, this is extremely long-winded, and I'm fairly certain there is a much simpler RegEx that can achieve this. As I said, I've searched around, but this is the best I could come up with.

So, what is a good way to check for upper- and lower-case letters, commas, full stops, apostrophes, hyphens and umlauts, as well as Latin, Japanese, Russian and other scripts?

You can use a Unicode character class: \pL covers pretty much all letter symbols.
http://php.net/manual/en/regexp.reference.unicode.php

if (!preg_match("/^[a-zA-Z\s,.'\pL-]+$/u", $name)) {
    // Error
}

See also http://www.regular-expressions.info/unicode.html, but beware that PHP/PCRE only understands the abbreviated class names.
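A minimal sketch of the \pL approach, assuming UTF-8 input; `isUnicodeName()` is a hypothetical helper name. The hyphen is placed last in the class so it is unambiguously a literal: PCRE2, which PHP uses from 7.3 onwards, can reject a hyphen sitting next to an escape such as \pL as an invalid range.

```php
<?php
// Hypothetical helper: letters from any script plus whitespace and , . ' -
function isUnicodeName(string $name): bool
{
    // \pL matches any Unicode letter (a-zA-Z is kept only to mirror the answer).
    // The trailing "-" is a literal hyphen; /u treats the input as UTF-8.
    return preg_match("/^[a-zA-Z\s,.'\pL-]+$/u", $name) === 1;
}

foreach (["Siân", "Dr. Martin Luther King, Jr.", "Дмитрий", "山田 太郎", "R2-D2"] as $name) {
    echo $name, ": ", isUnicodeName($name) ? "accepted" : "rejected", "\n";
}
```

Only "R2-D2" is rejected here, because digits are not in the class.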

\pL already includes a-z and A-Z, so the pattern above could be simplified to

"/^[\s,.'\pL-]+$/u"

The hyphen goes at the end of the class so it is read as a literal hyphen rather than as part of a range. The u modifier, however, is still required: without it, PCRE works byte by byte, so a multi-byte UTF-8 letter such as the â in Siân is not matched by \pL.
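Whether the u modifier can be dropped is easy to check with a UTF-8 name. This sketch assumes the source file, and therefore `$name`, is UTF-8 encoded.

```php
<?php
// The simplified \pL pattern, run with and without the /u modifier.
$name = "Siân";

// Without /u, PCRE reads bytes: "â" is 0xC3 0xA2, and 0xA2 on its own
// (U+00A2, the cent sign) is not a letter, so the anchored match fails.
var_dump(preg_match("/^[\s,.'\pL-]+$/", $name));  // int(0)

// With /u, "â" is read as the single code point U+00E2, which is a letter.
var_dump(preg_match("/^[\s,.'\pL-]+$/u", $name)); // int(1)
```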

The rules could probably be loosened further to allow other kinds of punctuation.

One restriction that should be added, though, is requiring at least one letter:

if (!preg_match("/^[\s,.'-]*\p{L}[\p{L}\s,.'-]*$/u", $name)) {
    // Error
}
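A small sketch of the at-least-one-letter rule, assuming UTF-8 input; `isValidName()` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: rejects input made only of punctuation/whitespace.
function isValidName(string $name): bool
{
    // Optional leading punctuation/whitespace, then one mandatory letter (\p{L}),
    // then any mix of letters, whitespace, commas, full stops, apostrophes, hyphens.
    return preg_match("/^[\s,.'-]*\p{L}[\p{L}\s,.'-]*$/u", $name) === 1;
}

$samples = ["Siân", "O'Brien", "Dr. Martin Luther King, Jr.", "--", " , . ", ""];
foreach ($samples as $sample) {
    echo var_export($sample, true), " => ", isValidName($sample) ? "valid" : "invalid", "\n";
}
```

The first three samples are valid; "--", " , . " and the empty string are rejected because they contain no letter.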
