
PHP: Remove certain character from string excluding/ignoring URL

I have a string as below

String

<span class="post-excerpt"> - <a href="./posts/the-post-title">17 posts</a> - Li Europan lingues es membres del sam familie. Lor separat existentie es un myth. Por scientie, musica, sport etc, litot Europa usa li sam vocabular. Li lingues differe solmen in li grammatica, li pronunciation e li plu commun vocabules. Omnicos directe al desirabilite de un nov lingua franca: On refusa continuar payar custosi traductores. At solmen va esser necessi far uniform grammatica, pronunc</span>

Now I want to remove the - characters from the text of the string, but not from the URL inside it.

I tried str_replace(), but it also removes the - from the URL, which of course breaks the link.

Can anyone help me remove the - from the string but not from the URL?

Assuming that the string will always be in that format, you can alter your str_replace to be more specific, thus ignoring the - in the URLs:

$newString = str_replace('> - <', '><', $oldString);

As I said, this only works if the format is always the same, i.e. the dash always sits between two tags as > - <
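A minimal sketch of that replacement, run against a shortened copy of the question's string. In the sample, the dash after the closing </a> is followed by text rather than a tag, so '> - <' alone leaves it in place. The second pattern '</a> - ' is an assumption added here to cover it and is not part of the original answer.

```php
<?php
// Shortened copy of the string from the question
$oldString = '<span class="post-excerpt"> - <a href="./posts/the-post-title">17 posts</a> - Li Europan lingues</span>';

// Pattern from the answer: only matches a dash sitting between two tags
$firstPass = str_replace('> - <', '><', $oldString);

// Assumed extra pattern: also strip the dash that follows the closing </a>
$newString = str_replace(
    array('> - <', '</a> - '),
    array('><', '</a>'),
    $oldString
);

echo $firstPass, PHP_EOL;
echo $newString, PHP_EOL;
```

Neither pattern can match inside the href value, so the link stays intact.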

You could use DOMDocument, which will parse your HTML. This means that you can apply str_replace only to the text content of your elements, rather than risk modifying their attributes as well.

This is more long-winded, but it is also a lot safer and will keep working if the format of your HTML changes slightly in the future:

$html = '<span class="post-excerpt"> - <a href="./posts/the-post-title">17 posts</a> - Li Europan lingues es membres del sam familie. Lor separat existentie es un myth. Por scientie, musica, sport etc, litot Europa usa li sam vocabular. Li lingues differe solmen in li grammatica, li pronunciation e li plu commun vocabules. Omnicos directe al desirabilite de un nov lingua franca: On refusa continuar payar custosi traductores. At solmen va esser necessi far uniform grammatica, pronunc</span>';

$doc = new DOMDocument();
$doc->loadHTML($html);

// DOMDocument creates a valid HTML document, adding a doctype, <html> and <body> tags
// The following two lines remove them
// http://stackoverflow.com/a/6953808/2088135
$doc->removeChild($doc->firstChild);
$doc->replaceChild($doc->firstChild->firstChild->firstChild, $doc->firstChild);

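$span = $doc->getElementsByTagName('span')->item(0);
foreach ($span->childNodes as $node) {
    // Only rewrite text nodes; assigning nodeValue on an element would flatten any nested markup
    if ($node->nodeType === XML_TEXT_NODE) {
        $node->nodeValue = str_replace(' - ', '', $node->nodeValue);
    }
}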

echo $doc->saveHTML();

Output:

<span class="post-excerpt"><a href="./posts/the-post-title">17 posts</a>Li Europan lingues es membres del sam familie. Lor separat existentie es un myth. Por scientie, musica, sport etc, litot Europa usa li sam vocabular. Li lingues differe solmen in li grammatica, li pronunciation e li plu commun vocabules. Omnicos directe al desirabilite de un nov lingua franca: On refusa continuar payar custosi traductores. At solmen va esser necessi far uniform grammatica, pronunc</span>
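A possible variation on the same idea, offered as a sketch rather than as part of the original answer. It assumes a PHP/libxml build that supports the LIBXML_HTML_NOIMPLIED and LIBXML_HTML_NODEFDTD flags, which ask the parser not to add the wrapper tags or the doctype in the first place, and it uses DOMXPath to visit every text node, including ones nested inside other elements. The HTML is a shortened copy of the sample and must have a single root element for this to behave predictably.

```php
<?php
// Shortened copy of the string from the question (single root element)
$html = '<span class="post-excerpt"> - <a href="./posts/the-post-title">17 posts</a> - Li Europan lingues</span>';

$doc = new DOMDocument();
// Ask libxml not to add the implied <html>/<body> wrapper or a default doctype
$doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

// Select every text node in the document, however deeply it is nested
$xpath = new DOMXPath($doc);
foreach ($xpath->query('//text()') as $textNode) {
    // Attribute values such as href are not text nodes, so they are never touched
    $textNode->nodeValue = str_replace(' - ', '', $textNode->nodeValue);
}

$clean = $doc->saveHTML();
echo $clean;
```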

Not elegant, but a working and universal method:

1) Replace all - occurrences inside the href attribute with some predefined "word": a combination of characters that does not include - and does not appear elsewhere in the string. This can be done with preg_replace_callback.

2) Do a plain string replacement with str_replace:

$result = str_replace('-', '', $source);

3) Replace every occurrence of the "word" back with the - character.
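The three steps above can be sketched as follows. The placeholder %%DASH%% is a hypothetical choice, and the regular expression is an assumption: it protects every double-quoted attribute value rather than only href, because the sample's class="post-excerpt" also contains a dash that step 2 would otherwise remove.

```php
<?php
// Hypothetical placeholder: must not contain "-" and must not occur in the input
$placeholder = '%%DASH%%';

// Shortened copy of the string from the question
$source = '<span class="post-excerpt"> - <a href="./posts/the-post-title">17 posts</a> - Li Europan lingues</span>';

// Step 1: hide the dashes inside double-quoted attribute values (href, class, ...)
$protected = preg_replace_callback(
    '/="[^"]*"/',
    function ($match) use ($placeholder) {
        return str_replace('-', $placeholder, $match[0]);
    },
    $source
);

// Step 2: remove every dash that is still visible, i.e. those in the text
$stripped = str_replace('-', '', $protected);

// Step 3: turn the placeholders back into dashes
$result = str_replace($placeholder, '-', $stripped);

echo $result, PHP_EOL;
```

Because step 2 removes only the dash itself, the spaces that surrounded it remain in the text; trim or collapse them separately if that matters.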


