How to use RegEx to strip specific leading and trailing punctuation in PHP

We're scrubbing a ridiculous amount of data, and are finding many examples of otherwise clean data that are left with irrelevant punctuation at the beginning and end of the final string. Single and double quotes are fine, but leading/trailing dashes, commas, etc. need to be removed.

I've studied the answer at "How can I remove all leading and trailing punctuation?", but am unable to find a way to accomplish the same in PHP.

- some text.                dash and period should be removed
"Some Other Text".          period should be removed
it's a matter of opinion    apostrophe should be kept
/ some more text?           Slash should be removed and question mark kept

In short:

  • Certain punctuation occurring BEFORE the first alphanumeric character must be removed
  • Certain punctuation occurring AFTER the last alphanumeric character must be removed

How can I accomplish this with PHP? The few examples I've found surpass my RegEx/JS abilities.

This is an answer without regex.

You can use the function trim (or a combination of ltrim / rtrim) and specify all the characters you want to remove. For your example:

$str = trim($str, " \t\n\r\0\x0B-.");

(I assume you also want to remove spaces and newlines at the beginning and end, so I kept the default mask.)

See also rtrim and ltrim if you don't want to remove the same character list at the beginning and the end of your strings.
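
As a rough sketch of the ltrim / rtrim combination, assuming slashes should only be stripped at the start while dashes, commas and periods are stripped on both sides (the function name stripEnds and both character lists are illustrative assumptions, not taken from the question):

```php
<?php
// Hypothetical helper: the name and the character lists are assumptions.
function stripEnds(string $str): string
{
    // Default trim() whitespace mask plus the punctuation to drop.
    // Quotes, apostrophes and "?" are not listed, so they are kept.
    $leading  = " \t\n\r\0\x0B-.,/";
    $trailing = " \t\n\r\0\x0B-.,";

    return rtrim(ltrim($str, $leading), $trailing);
}
```

Calling stripEnds('/ some more text?') should leave the question mark in place, because "?" is not in either list. Note that in these character lists two consecutive dots ("..") denote a range, so list the period only once.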

With a regex, you can modify the character class in the pattern to include whichever characters you need.

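$array = array(
    '- some text.',
    '"Some Other Text".',
    'it\'s a matter of opinion',
    '/ some more text?'
);

foreach ($array as $key => $string) {
    // The first pattern strips a leading run, the second a trailing run, of
    // whitespace, periods, dashes and slashes. Whitespace (\s) is included
    // so that no stray space is left behind after "- " or "/ ".
    $array[$key] = preg_replace(array(
        '/^[\s.\-\/]*/',
        '/[\s.\-\/]*$/'
    ), array('', ''), $string);
}

print_r($array);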
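
An alternative is to follow the question's wording directly and strip everything before the first and after the last alphanumeric character, except for the characters that should be kept. The sketch below assumes quotes and apostrophes are kept on both sides and "?" is kept at the end; the function name and the kept characters are assumptions, not part of the answer above.

```php
<?php
// Hypothetical helper: the name and the list of kept characters are assumptions.
function stripOuterPunctuation(string $str): string
{
    // \p{L} = any letter, \p{N} = any digit (requires the "u" modifier).
    // Leading:  remove a run of characters that are not letters, digits or quotes.
    // Trailing: same, but "?" is also kept.
    $pattern = '/^[^\p{L}\p{N}"\']+|[^\p{L}\p{N}"\'?]+$/u';

    // preg_replace() replaces every match, so both ends are handled in one call.
    // It returns null on error (e.g. invalid UTF-8); fall back to the input then.
    return preg_replace($pattern, '', $str) ?? $str;
}
```

The trade-off is that this removes any symbol not explicitly kept, whereas the character-class approach above removes only the symbols explicitly listed.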

If the punctuation could be more than one character, you could do this:

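function trimFormatting($str) {
    // Tokens to strip from either end: "<br>" (any case), a comma, or whitespace.
    $pat = '(<br>|,|\s+)';
    $osl = 0; // string length before the previous pass
    // Each pass removes at most one token per end, so repeat until the
    // length stops changing.
    while ($osl !== strlen($str)) {
        $osl = strlen($str);
        $str = preg_replace('/^' . $pat . '|' . $pat . '$/i', '', $str);
    }
    return $str;
}

echo trimFormatting('<BR>,<BR>Hello<BR>World<BR>, <BR>');

// will give "Hello<BR>World"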

The routine checks for "<BR>" (case-insensitively, because of the "i" modifier), "," and one or more whitespace characters ("\s+"). The "|" is the OR operator, used three times in the routine. It trims at the start ("^") and at the end ("$") at the same time. It keeps looping until no more matches are trimmed off (i.e. there is no further reduction in string length).
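
The loop can probably be avoided by letting the regex itself repeat the group. The following is a sketch under that assumption; the name trimFormattingOnce is hypothetical, and "\s" is used instead of "\s+" inside the repeated group to avoid nesting one quantifier inside another.

```php
<?php
// Hypothetical loop-free variant; the function name is an assumption.
function trimFormattingOnce(string $str): string
{
    // One or more tokens in a row: "<br>", a comma, or a single whitespace character.
    $pat = '(?:<br>|,|\s)+';

    // Strip a run of tokens at the start and a run at the end in a single call.
    // preg_replace() returns null on error; fall back to the input then.
    return preg_replace('/^' . $pat . '|' . $pat . '$/i', '', $str) ?? $str;
}
```

For the input used above, this should give the same result as the looping version, while any "<BR>" in the middle of the string is left untouched.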
