
PHP regex for matching ALL special characters, including accented characters

I am looking for a way to match every possible special character in a string. I have a list of cities around the world, and many of the names contain special and accented characters. So I am looking for a regular expression that returns TRUE for any kind of special character. All the ones I found match only some of them, but I need one that covers every possible special character, including spaces at the beginning of the string. Is this possible?

This is the one I found, but it does not match all the different characters I may encounter in the name of a city:

preg_match('/[#$%^&*()+=\-\[\]\';,.\/{}|":<>?~\\\\]/', $string);

You're going to need the UTF-8 mode "#pattern#u": http://nl3.php.net/manual/en/reference.pcre.pattern.modifiers.php

Then you can use the Unicode escape sequences: http://nl3.php.net/manual/en/regexp.reference.unicode.php

So that preg_match("#\\p{L}*#u", "København", $match) will match. (Note that * also allows an empty match; use + if at least one letter is required.)
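To make the effect of the Unicode property visible, here is a minimal sketch. The variable names and the ASCII-only comparison pattern are my own illustration, not part of the answer above, and the source file is assumed to be saved as UTF-8.

```php
<?php
// Sketch only: $city is an illustrative sample value.
$city = "København";

// \p{L} with the u modifier treats the subject as UTF-8,
// so the whole word is matched, including the two-byte "ø".
preg_match('/^\p{L}+/u', $city, $withUnicode);

// An ASCII-only class stops at the first non-ASCII letter.
preg_match('/^[a-z]+/i', $city, $asciiOnly);

echo $withUnicode[0], "\n"; // København
echo $asciiOnly[0], "\n";   // K
```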

Use Unicode properties:

\pL stands for any letter

To match a city name, I'd do this (I suppose - and space are valid characters):

preg_match('/\s*[\pL\s-]/u', $string);
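If the goal is to validate a whole name rather than to find a single allowed character, the pattern probably wants anchors and a quantifier. A rough sketch, assuming only letters, whitespace and hyphens are allowed; looksLikeCityName is a hypothetical helper name, not something from the answer:

```php
<?php
// Hypothetical helper: true when $name consists only of letters
// (any script), whitespace and hyphens.
function looksLikeCityName(string $name): bool
{
    // ^ and $ anchor the pattern to the whole string; the hyphen is
    // placed last in the class so that it is taken literally.
    return preg_match('/^\s*[\pL\s-]+$/u', $name) === 1;
}

var_dump(looksLikeCityName("São Paulo"));        // bool(true)
var_dump(looksLikeCityName(" Aix-en-Provence")); // bool(true)
var_dump(looksLikeCityName("Paris 2"));          // bool(false)
```

Under this assumption names with an apostrophe, such as N'Djamena, are rejected, so the class would have to be extended for real-world data.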

You can just negate your pattern. To match everything that is not in "a-z", "0-9", "-", "_" or ".", you would use

preg_match('/[^-_a-z0-9.]/iu', $string);

The ^ at the start of the character class negates it.
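To see which characters this negated class actually flags, something like the following could be used. It is only a sketch; findSpecialChars is a made-up helper name and the sample strings are mine.

```php
<?php
// Hypothetical helper: returns every character that is NOT an ASCII
// letter, an ASCII digit, "-", "_" or ".".
function findSpecialChars(string $text): array
{
    // The u modifier makes each match a whole UTF-8 character
    // instead of a single byte.
    preg_match_all('/[^-_a-z0-9.]/iu', $text, $matches);
    return $matches[0];
}

print_r(findSpecialChars("Malmö, SE"));    // "ö", "," and " "
print_r(findSpecialChars("new-york_2.0")); // empty array
```

Note that with this approach spaces and accented letters both count as "special", which is what the question asked for.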

I had the same problem when I wanted to split name parts that also contained special characters.

For example, if you want to split a bunch of names of the form:

<lastname>,<forename(s)> <initial(s)> <suffix(es)>

Forenames and suffixes are separated by whitespace.
Each initial is followed by a ".", with a maximum of 6 initials.

you could use

$nameparts = preg_split("/(\w*),((?:\w+[\s\-]*)*)((?:\w\.){1,6})(?:\s*)(.*)/u", $displayname, -1, PREG_SPLIT_DELIM_CAPTURE);
// first and last parts are always empty, so drop them
array_splice($nameparts, 5, 1);
array_splice($nameparts, 0, 1);
print_r($nameparts);

Input:
Powers,Björn B.A. van der
Output:
Array ( [0] => Powers [1] => Björn [2] => B.A. [3] => van der )
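As an alternative sketch, the same pattern can be used with preg_match and named groups, which avoids having to splice the empty first and last elements. The group names, the added ^ and $ anchors, and the sample input with dotted initials are my own assumptions:

```php
<?php
// Same pattern as above, anchored and with named groups
// (the group names are illustrative).
$pattern = '/^(?<last>\w*),(?<fore>(?:\w+[\s\-]*)*)'
         . '(?<initials>(?:\w\.){1,6})\s*(?<suffix>.*)$/u';

// Assumed sample input: every initial is followed by a dot.
$displayname = "Powers,Björn B.A. van der";

if (preg_match($pattern, $displayname, $m)) {
    // The forename group keeps its trailing separator, hence trim().
    echo trim($m['fore']), ' ', $m['initials'], ' ',
         $m['suffix'], ' ', $m['last'], "\n";
    // Björn B.A. van der Powers
}
```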

Tip: the regular expression looks like it came from outer space, but regex101.com comes to the rescue!
