
PHP regex solution - Remove some special characters and replace some with text

I have a PHP variable, say:

$myvariable = "te    xt!@ na#@)(me+=&t^*ext?>;.'na^%me";

I want to replace each special character, and each group of consecutive special characters (including blank spaces), with a single underscore "_". The string may also contain "&", which should be replaced with "and".

For the variable above, the result should be:

te_xt_na_me_andt_ext_na_me

How can I do this in PHP?

This assumes that anything other than a letter is disposable.

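// $input is the string to clean, e.g. $myvariable from the question
$patterns = array(
    '/&/'             => 'and',  // Ampersand to "and"
    '/[^[:alpha:]]+/' => '_'     // Any run of non-letters to a single underscore
);

// preg_replace() applies the patterns in array order, so "&" becomes "and" first
$result = preg_replace(array_keys($patterns), array_values($patterns), $input);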

The last pattern replaces every run of one or more non-letter characters with a single underscore. What counts as a letter depends on the current locale [1]; whitespace, digits and punctuation are all treated as non-letters.
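As a quick check, here is a minimal sketch that runs the answer's pattern array on the question's string, and then runs the same two patterns in reverse order. The variable names ($wrong, $reversed) are only for illustration; the point is that preg_replace() applies array patterns one after another, so the "&" rule has to come before the catch-all.

```php
<?php
// Sketch: the pattern array from the answer, applied to the question's string.
$input = "te    xt!@ na#@)(me+=&t^*ext?>;.'na^%me";

$patterns = array(
    '/&/'             => 'and', // runs first: "&" becomes "and"
    '/[^[:alpha:]]+/' => '_',   // runs second: each run of non-letters becomes "_"
);
$result = preg_replace(array_keys($patterns), array_values($patterns), $input);
echo $result, PHP_EOL;

// Same patterns in reverse order: the catch-all swallows "&" before it can become "and".
$reversed = array_reverse($patterns, true);
$wrong = preg_replace(array_keys($reversed), array_values($reversed), $input);
echo $wrong, PHP_EOL;
```

With the correct order this prints te_xt_na_me_andt_ext_na_me, matching the expected output; with the reversed order the "and" is lost and the result is te_xt_na_me_t_ext_na_me.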


[1] Side note (possibly irrelevant): if the server the script runs on uses en_US as its locale, the following replacement occurs:

$input = 'app!le___s &!   orän=%ges';
$result = 'app_le_s_and_or_n_ges';

If the locale is de_DE, this would be the result:

$result = 'app_le_s_and_orän_ges';

This is because ä belongs to [[:alpha:]] in that particular locale. The obvious way around this is to replace the character class with [a-zA-Z], which does not depend on the locale.
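A minimal sketch of that locale-independent variant, applied to the footnote's example string. The setlocale() call is only there to illustrate that the result no longer depends on the locale; it is an assumption that a de_DE locale exists on the machine, and if none of the listed names is installed the call simply fails and the output is unchanged.

```php
<?php
// Sketch: [a-zA-Z] instead of [[:alpha:]], so the locale no longer matters.
// setlocale() may return false if no de_DE locale is installed; that is fine here.
setlocale(LC_CTYPE, 'de_DE', 'de_DE.ISO-8859-1', 'de_DE.UTF-8');

$input = 'app!le___s &!   orän=%ges';

$patterns = array(
    '/&/'          => 'and', // ampersand to "and"
    '/[^a-zA-Z]+/' => '_',   // any run of non-ASCII-letters to a single underscore
);
$result = preg_replace(array_keys($patterns), array_values($patterns), $input);
echo $result, PHP_EOL;
```

Because the explicit ranges only cover ASCII letters, the bytes of "ä" are always treated as non-letters, and the result is app_le_s_and_or_n_ges whatever the locale. If accented letters should be kept, this variant is the wrong choice.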

This should do it:

$myvariable = str_replace('&', 'and', $myvariable);          // "&" to "and" first
$myvariable = preg_replace('/[^a-z]+/i', '_', $myvariable);  // each run of non-letters to "_"

See: http://php.net/manual/de/function.preg-replace.php

The caret (^) at the start of the square brackets negates the class: it matches every character that is not listed inside the brackets. No special character falls in the range a-z, so every special character matches. The plus (+) means that one or more consecutive matching characters are matched together, so a whole run of them is replaced by a single underscore. The "i" after the closing delimiter makes the match case-insensitive, so uppercase letters are kept as well.
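To see what the "+" and the "i" modifier each contribute, here is a small sketch that runs the pattern with and without them on a made-up test string ($s and the other variable names are only for illustration).

```php
<?php
// Sketch: effect of "+" and the "i" modifier on the negated character class.
$s = 'Te  xt!@Na&me';

$withPlus    = preg_replace('/[^a-z]+/i', '_', $s); // each run of non-letters becomes one "_"
$withoutPlus = preg_replace('/[^a-z]/i', '_', $s);  // each non-letter becomes its own "_"
$noI         = preg_replace('/[^a-z]+/', '_', $s);  // uppercase letters count as "special" too

echo $withPlus, PHP_EOL;
echo $withoutPlus, PHP_EOL;
echo $noI, PHP_EOL;
```

This prints Te_xt_Na_me, then Te__xt__Na_me, then _e_xt_a_me. Note that this sketch skips the str_replace() step, so the "&" here is simply turned into an underscore.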

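Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.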

© 2020-2024 STACKOOM.COM