How to remove non-alphanumeric characters?

I need to remove all characters from a string that are not in the set a-z, A-Z, 0-9 and are not spaces.

Does anyone have a function to do this?

It sounds like you almost knew what you wanted to do already; you basically defined it as a regex.

preg_replace("/[^A-Za-z0-9 ]/", '', $string);

For Unicode characters, it is:

preg_replace("/[^[:alnum:][:space:]]/u", '', $string);
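
As a minimal sketch, the two calls above can be wrapped in small helpers and compared side by side. The function names are hypothetical and not part of the answer; only preg_replace() and the two patterns come from it. The second helper assumes the input is valid UTF-8.

```php
<?php
// Hypothetical helper names, introduced only for illustration.

// Keep ASCII letters, digits and the space character; drop everything else.
function keep_ascii_alnum(string $s): string {
    return preg_replace('/[^A-Za-z0-9 ]/', '', $s);
}

// Keep letters, digits and whitespace from any script.
// The u modifier makes PCRE treat pattern and subject as UTF-8.
function keep_unicode_alnum(string $s): string {
    return preg_replace('/[^[:alnum:][:space:]]/u', '', $s);
}

echo keep_ascii_alnum('Hello, World! 123'), "\n";   // Hello World 123
echo keep_unicode_alnum('Héllo, wörld! 123'), "\n"; // Héllo wörld 123
```

The ASCII version would strip the accented letters from the second input, which is why the Unicode variant exists.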

A regular expression is your answer.

$str = preg_replace('/[^a-z\d ]/i', '', $str);
  • The i modifier makes the match case-insensitive.
  • [^...] is a negated character class: a ^ directly after the opening bracket means "match any character that is not listed here". (It only means "start of string" outside brackets.)
  • \d matches any digit.
  • a-z matches all characters between a and z. Because of the i modifier you don't have to specify both a-z and A-Z.
  • After \d there is a space, so spaces are allowed by this regex.
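
To see what the i modifier contributes, here is a small sketch that runs the same character class with and without it. The variable names and the sample string are made up for illustration.

```php
<?php
// Same negated class, with and without the case-insensitive modifier.
$caseInsensitive = '/[^a-z\d ]/i';
$caseSensitive   = '/[^a-z\d ]/';

$input = 'Room #42, Floor-B';

// With i, uppercase letters count as part of a-z and are kept.
echo preg_replace($caseInsensitive, '', $input), "\n"; // Room 42 FloorB

// Without i, uppercase letters fall outside a-z and are removed too.
echo preg_replace($caseSensitive, '', $input), "\n";   // oom 42 loor
```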

Here's a really simple regex for that:

\W|_

and use it as you need (with a forward slash / as the delimiter).

preg_replace("/\W|_/", '', $string);

Test it here with this great tool that explains what the regex is doing:

http://www.regexr.com/

[\W_]+

$string = preg_replace("/[\W_]+/u", '', $string);

It selects every character that is not A-Z, a-z, or 0-9 (spaces and underscores included) and deletes it.

See example here: https://regexr.com/3h1rj
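
The underscore is listed explicitly because \w treats it as a word character, so \W alone does not match it. A short sketch of the difference, using a made-up input string:

```php
<?php
$input = 'snake_case-name 42';

// \W matches the hyphen and the space, but not the underscore.
$withoutUnderscoreRule = preg_replace('/\W+/', '', $input);

// Adding _ to the class removes underscores as well.
$withUnderscoreRule = preg_replace('/[\W_]+/', '', $input);

echo $withoutUnderscoreRule, "\n"; // snake_casename42
echo $withUnderscoreRule, "\n";    // snakecasename42
```

Both patterns also delete spaces, so they fit only when spaces do not need to be preserved.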

If you need to support other languages, rather than only the typical A-Z, you can use the following:

preg_replace('/[^\p{L}\p{N} ]+/u', '', $string);
  • [^\p{L}\p{N} ] defines a negated character class (it matches any character that is not listed) made up of:
    • \p{L}: a letter from any language.
    • \p{N}: a numeric character in any script.
    • (space): a literal space character.
  • + greedily matches the character class between one and unlimited times.
  • The u modifier makes PCRE treat the pattern and the subject as UTF-8, so multi-byte characters are matched as whole characters rather than byte by byte.

This will preserve letters and numbers from other languages and scripts as well as A-Z:

preg_replace('/[^\p{L}\p{N} ]+/u', '', 'hello-world'); // helloworld
preg_replace('/[^\p{L}\p{N} ]+/u', '', 'abc@~#123-+=öäå'); // abc123öäå
preg_replace('/[^\p{L}\p{N} ]+/u', '', '你好世界!@£$%^&*()'); // 你好世界

Note: This is a very old, but still relevant, question. I am answering purely to provide supplementary information that may be useful to future visitors.

preg_replace("/\W+/", '', $string)

You can test it here: http://regexr.com/

I was looking for the answer too. My intention was to remove every non-alphabetic character without leaving more than one space in a row.
So I modified Alex's answer to this, and it is working for me: preg_replace('/[^a-z|\s+]+/i', ' ', $name)
The regex above turned sy8ed sirajul7_islam into sy ed sirajul islam.
Explanation: the regex matches each run of characters that are not letters a to z (case-insensitive) or whitespace, and replaces the run with a single space. Inside the brackets, | and + are literal characters rather than operators, so those two characters are kept as well.
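
A slightly simpler variant of the same idea, offered as a sketch rather than a drop-in replacement: treat every run of non-letters (spaces included) as one separator, then trim the ends. The function name is hypothetical.

```php
<?php
// Hypothetical helper: keep only a-z/A-Z, separated by single spaces.
function letters_single_spaced(string $name): string {
    // Each run of non-letter characters, whitespace included, becomes one space.
    $spaced = preg_replace('/[^a-z]+/i', ' ', $name);
    // Drop a leading or trailing space left behind by the replacement.
    return trim($spaced);
}

echo letters_single_spaced('sy8ed   sirajul7_islam!'), "\n"; // sy ed sirajul islam
```

Because whitespace is part of the matched run here, several consecutive spaces in the input also collapse to one.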

You can split the string into characters and filter them.

<?php 

function filter_alphanum($string) {
    $characters = str_split($string);
    $alphaNumeric = array_filter($characters,"ctype_alnum");
    return join($alphaNumeric);
}

$res = filter_alphanum("a!bc!#123");
print_r($res); // abc123

?>
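
Since the question also asks to keep spaces, the same split-and-filter approach can be extended with a custom callback. This is a sketch under two assumptions: the function name is made up, and the input is single-byte text, because str_split() and ctype_alnum() both work on bytes. Arrow functions require PHP 7.4 or later.

```php
<?php
// Hypothetical variant of filter_alphanum() that also keeps spaces.
function filter_alphanum_space(string $string): string {
    $kept = array_filter(
        str_split($string),
        // Keep a character if it is alphanumeric or a plain space.
        fn(string $ch): bool => ctype_alnum($ch) || $ch === ' '
    );
    return implode('', $kept);
}

echo filter_alphanum_space('a!b c!#1 23'), "\n"; // ab c1 23
```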

