Regular expression to validate and sanitize all English and non-English Unicode alphabet characters in PHP

Question

Although there are many questions about matching non-English characters with regular expressions, I have not been able to find a working answer. Moreover, there does not seem to be any simple PHP library that would help me filter non-English input.

Could you please suggest a regular expression that would allow

all English alphabet characters (abc...)
all non-English alphabet characters (šýüčá...)
spaces
case insensitive

in validation as well as sanitization. Essentially, I want either preg_match to return false when the input contains anything other than the 4 points above, or preg_replace to get rid of everything except these 4 categories.

I was able to create '/^((\\p{L}\\p{M}*)|(\\p{Cc})|(\\p{Z}))+$/ui' from http://www.regular-expressions.info/unicode.html. This regular expression works well when validating input but not when sanitizing it.

EDIT:

The user enters 'český [jazyk]' as input. Using '/^[\\p{L}\\p{Zs}]+$/u' in preg_match, the script determines that the string contains disallowed characters (in this case '[' and ']'). Next, I would like to use preg_replace to delete those unwanted characters. What regular expression should I pass to preg_replace to match all characters that are not allowed by the regular expression above?

Answer 1

I think all you need is a character class like:

^[\p{L}\p{Zs}]+$

It means: the whole string (or each line, with the (?m) option) can only contain Unicode letters or spaces.

Have a look at the demo.

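// u treats pattern and subject as UTF-8; m makes ^ and $ match at every line.
$re = "/^[\\p{L}\\p{Zs}]+$/um";
$str = "all english alphabet characters (abc...)\nall non-english alphabet characters (šýüčá...)\nspace s\nšýüčá šýüčá šýüčá ddd\nšýüčá eee 4\ncase insensitive";
// $matches[0] receives the lines that contain only letters and spaces.
preg_match_all($re, $str, $matches);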
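
For validating a single input value rather than a multi-line sample, the same character class can be wrapped in a small helper. This is only a sketch: the function name isLettersAndSpaces is hypothetical, and the D modifier is an addition not present in the pattern above. Note that preg_match returns 1 on a match, 0 on no match, and false only on error, so the result is compared against 1 explicitly.

```php
<?php
// Hypothetical helper (not part of the original answer): returns true only
// when the input consists solely of Unicode letters and space separators.
function isLettersAndSpaces(string $input): bool
{
    // u: treat pattern and subject as UTF-8.
    // D (an assumption added here): $ matches only at the very end of the
    // subject, so an input with a trailing newline is rejected.
    // preg_match() returns 1 (match), 0 (no match), or false (error).
    return preg_match('/^[\p{L}\p{Zs}]+$/uD', $input) === 1;
}

var_dump(isLettersAndSpaces('český jazyk'));   // letters and a space only
var_dump(isLettersAndSpaces('český [jazyk]')); // contains '[' and ']'
```

No i modifier is needed, because \p{L} already covers both upper-case and lower-case letters.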

To remove all symbols that are not Unicode letters or spaces, use this code:

// The negated class matches every run of characters that is neither a letter nor a space.
$re = "/[^\\p{L}\\p{Zs}]+/u";
$str = "český [jazyk]";
echo preg_replace($re, "", $str);

The output of the sample program:

český jazyk
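
The replacement above can be wrapped in a reusable function. The sketch below is an illustration, not part of the original answer: the name keepLettersAndSpaces is hypothetical, and \p{M} is added to the class on the assumption that input may arrive in decomposed form (for example "c" followed by the combining caron U+030C), in which case the combining mark would otherwise be stripped. The question's first pattern used \p{L}\p{M}* for the same reason.

```php
<?php
// Hypothetical helper (not part of the original answer): removes everything
// except Unicode letters, combining marks, and space separators.
function keepLettersAndSpaces(string $input): string
{
    // \p{M} is an assumption added here so that decomposed letters
    // (base letter + combining mark) keep their accents.
    $clean = preg_replace('/[^\p{L}\p{M}\p{Zs}]+/u', '', $input);

    // preg_replace() returns null on error (e.g. invalid UTF-8 input);
    // falling back to an empty string is a design choice of this sketch.
    return $clean ?? '';
}

echo keepLettersAndSpaces('český [jazyk]!'), "\n"; // český jazyk
```

Returning an empty string on error keeps the function's return type simple; a caller that needs to tell "nothing left" apart from "invalid input" may prefer to check for null and throw instead.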

Answer 1 metadata: 3 (accepted), 2015-04-23 08:41:17
