How to use RegEx to strip specific leading and trailing punctuation in PHP

We're scrubbing a ridiculous amount of data and are finding many examples of otherwise clean data that are left with irrelevant punctuation at the beginning and end of the final string. Single and double quotes are fine, but leading/trailing dashes, commas, etc. need to be removed.

I've studied the answer at "How can I remove all leading and trailing punctuation?", but am unable to find a way to accomplish the same in PHP.

- some text.                dash and period should be removed
"Some Other Text".          period should be removed
it's a matter of opinion    apostrophe should be kept
/ some more text?           Slash should be removed and question mark kept

In short,

  • Certain punctuation occurring BEFORE the first AlphaNumeric character must be removed
  • Certain punctuation occurring AFTER the last AlphaNumeric character must be removed

How can I accomplish this with PHP? The few examples I've found surpass my RegEx/JS abilities.

This is an answer without regex.

You can use the function trim (or a combination of ltrim / rtrim) and pass it all the characters you want to remove. For your example:

$str = trim($str, " \t\n\r\0\x0B-.");

(Since I assume you also want to remove whitespace and newlines at the beginning and end, I kept the characters of the default mask.)

See also rtrim and ltrim if you don't want to remove the same charlist at the beginning and the end of your strings.
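
As a rough illustration of both variants, the sketch below runs the question's four samples through trim with a slightly wider mask, then shows ltrim and rtrim with a different mask on each side. The helper names stripEnds and stripEndsSeparately and the exact character lists are assumptions made for this example; they are not part of the original answer.

```php
<?php
// Hypothetical helper: strip whitespace plus - . , / from both ends.
// Quotes and ? are not in the mask, so they are left alone.
function stripEnds(string $s): string
{
    return trim($s, " \t\n\r\0\x0B-.,/");
}

// Hypothetical helper: a different mask for each side.
function stripEndsSeparately(string $s): string
{
    $s = ltrim($s, " \t\n\r\0\x0B-/");    // leading: whitespace, dash, slash
    return rtrim($s, " \t\n\r\0\x0B.,");  // trailing: whitespace, period, comma
}

$samples = [
    '- some text.',
    '"Some Other Text".',
    'it\'s a matter of opinion',
    '/ some more text?',
];

foreach ($samples as $s) {
    echo stripEnds($s), PHP_EOL;
}
```

One thing to watch for: trim treats ".." inside the mask as a character range, so two consecutive periods in the list would not mean a literal period.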

With a regex, you can anchor a character class at each end of the string. Modify the class to include whichever characters you want removed.

$array = array(
    '- some text.',
    '"Some Other Text".',
    'it\'s a matter of opinion',
    '/ some more text?'
);

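// Strip whitespace and the listed punctuation from both ends of each string.
// Whitespace is in the class so that "- some text." does not keep a leading space.
foreach ($array as $key => $string) {
    $array[$key] = preg_replace(array(
        '/^[\s.,\/-]+/',   // run at the start
        '/[\s.,\/-]+$/'    // run at the end
    ), '', $string);
}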

print_r($array);
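
The question phrases the rule in terms of the first and last alphanumeric character, so another option is to invert the class: remove anything at the ends that is not a letter, a digit, or an explicitly kept mark. The following is only a sketch of that idea. The function name stripOuterPunctuation and the choice of kept characters (quotes on both sides, plus ? and ! at the end) are assumptions for illustration.

```php
<?php
// Hypothetical helper, not from the original answer.
// Leading: remove everything before the first letter, digit, or quote.
// Trailing: remove everything after the last letter, digit, quote, ? or !.
// The /u flag makes \p{L} and \p{N} match Unicode letters and numbers.
function stripOuterPunctuation(string $s): string
{
    // preg_replace returns null on failure (e.g. invalid UTF-8 input);
    // in that case fall back to the unchanged input.
    return preg_replace(
        '/^[^\p{L}\p{N}"\']+|[^\p{L}\p{N}"\'?!]+$/u',
        '',
        $s
    ) ?? $s;
}

echo stripOuterPunctuation('- some text.'), PHP_EOL;
echo stripOuterPunctuation('"Some Other Text".'), PHP_EOL;
echo stripOuterPunctuation('/ some more text?'), PHP_EOL;
```

The trade-off is that a negated class removes every unlisted character, so anything you want to keep at the ends has to be added to the class by hand.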

If the punctuation to remove could be more than one character long (a tag, for example), you could do this:

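function trimFormatting($str) {
    $oldLen = 0;
    // Tokens to strip from either end: a <br> tag, a comma, or a run of whitespace.
    $pat = '(<br>|,|\s+)';
    // Repeat until a pass removes nothing, i.e. the length stops changing.
    while ($oldLen !== strlen($str)) {
        $oldLen = strlen($str);
        $str = preg_replace('/^' . $pat . '|' . $pat . '$/i', '', $str);
    }
    return $str;
}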
echo trimFormatting('<BR>,<BR>Hello<BR>World<BR>, <BR>'); 

// will give "Hello<BR>World"

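The routine checks for "<BR>", "," and one or more whitespace characters ("\s+"); the "i" flag makes the "<br>" match case-insensitive. The "|" is the OR operator, and it appears three times in the code: twice inside the group and once joining the start-anchored and end-anchored halves of the pattern. Each pass trims at the start ("^") and at the end ("$") at the same time. The loop repeats until a pass trims nothing more (i.e. there is no further reduction in string length).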
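
If the loop feels heavy, the same result can probably be had in one pass by repeating the group itself. The sketch below is an alternative under that assumption, not the answerer's routine; the name trimFormattingOnce is made up for this example.

```php
<?php
// Hypothetical single-pass variant of the loop-based routine.
function trimFormattingOnce(string $str): string
{
    // A token is a <br> tag, a comma, or ONE whitespace character.
    // \s is used instead of \s+ because the group is repeated with +,
    // and nesting two + quantifiers invites heavy backtracking.
    $pat = '(?:<br>|,|\s)';

    // ^(...)+ eats the whole leading run, (...)+$ the whole trailing run.
    // Fall back to the input if preg_replace fails and returns null.
    return preg_replace('/^' . $pat . '+|' . $pat . '+$/i', '', $str) ?? $str;
}

echo trimFormattingOnce('<BR>,<BR>Hello<BR>World<BR>, <BR>'), PHP_EOL;
// prints "Hello<BR>World"
```

The "<BR>" between "Hello" and "World" survives because it touches neither end of the string.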
