PHP regex href with title not set

I'm trying to create a function in PHP that searches a string for all a href occurrences and, if the title attribute is not set, fills it in with the link text (the text between > and </a>). I don't know what the best way to do it is; I'm thinking of something like:

$s = preg_replace('/<a[^>]*?href=[\'"](.*?)[\'"][^>]*?title=[\'"](.*?)[\'"][^>]*?>(.*?)<\/a>/si','<a href="$1" title="$2">$3</a>',$s);

How can I check in the regex whether $2 is set and, if it isn't, replace it with $3? Also, $3 can be something like img src="..." alt="..." and in this case I would like to get the value of alt.

First of all, I would like to know whether this can be done in PHP and how, but any help would be appreciated.

Maybe presume it is not going to be set and look for title='' only:

echo preg_replace("/<a[^>]*?href=[\'\"](.*?)[\'\"][^>]*?title=''>(.*?)<\/a>/i", "<a href='$1' title='$2'>$2</a>", "<a href='http://google.com' title=''>Google</a>");

Output:

<a href='http://google.com' title='Google'>Google</a>
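If the title attribute is missing entirely rather than empty, a negative lookahead can do the "is title set?" check inside the regex itself. This is a rough sketch; the function name addMissingTitles() is made up for illustration, and it copies the raw link text into the attribute, so a link that wraps an <img> or contains quotes would produce a broken title:

```php
<?php
// Hypothetical helper: adds title="<link text>" to every <a> tag that has no
// title attribute at all. The negative lookahead (?![^>]*\btitle\s*=) rejects
// any opening tag that already contains a title= attribute.
function addMissingTitles(string $html): string
{
    return preg_replace(
        '/<a\b(?![^>]*\btitle\s*=)([^>]*)>(.*?)<\/a>/si',
        '<a$1 title="$2">$2</a>', // $1 = existing attributes, $2 = link text
        $html
    );
}

// prints <a href="http://google.com" title="Google">Google</a>
echo addMissingTitles('<a href="http://google.com">Google</a>'), "\n";
// a tag that already has a title is left unchanged
echo addMissingTitles('<a href="http://google.com" title="Search">Google</a>'), "\n";
```

Because the lookahead skips any tag that has a title= attribute at all, an empty title='' is left alone here; the title='' pattern above handles that case.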

Good luck.

EDIT

Sorry, not too sure what you mean by:

also $3 can be something like img src="..." alt="..." and in this case I would like to get the value of alt.

Isn't $3 in your example the link text?

The uninformative link is somewhat fitting here. This is not easily doable with regular expressions. For example, you cannot use a (?!\\4) negative assertion with a forward backreference to compare the title= value against the <img alt= attribute (which is already difficult enough to extract).

At the very least you will have to use preg_replace_callback and handle the replacement in a separate function. There it's easier to break out the attributes and compare alt= against title=.
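A minimal sketch of that preg_replace_callback idea, assuming reasonably well-formed links (quoted attribute values, no > inside attribute values, no nested <a> tags). The regex only locates each <a ...>...</a> pair; the callback then checks the title in plain PHP and falls back to an <img alt="..."> value or the stripped link text. The function name addTitleFromContent() is hypothetical:

```php
<?php
// Hypothetical helper: the regex only finds each <a ...>...</a>; the callback
// decides in plain PHP whether a title is needed and where to take it from.
// Assumes quoted attribute values, no ">" inside them, and no nested <a> tags.
function addTitleFromContent(string $html): string
{
    return preg_replace_callback(
        '/<a\b([^>]*)>(.*?)<\/a>/si',
        function (array $m): string {
            [$whole, $attrs, $inner] = $m;

            // Keep links that already carry a non-empty title.
            if (preg_match('/\btitle\s*=\s*([\'"])(.*?)\1/si', $attrs, $t) && trim($t[2]) !== '') {
                return $whole;
            }

            // Prefer the alt text of a wrapped <img>, else the visible link text.
            if (preg_match('/<img\b[^>]*\balt\s*=\s*([\'"])(.*?)\1/si', $inner, $img)) {
                $title = $img[2];
            } else {
                $title = trim(strip_tags($inner));
            }

            // Drop an empty title="" / title='' before appending the new one.
            $attrs = preg_replace('/\s*\btitle\s*=\s*([\'"])\s*\1/i', '', $attrs);
            // double_encode = false keeps existing entities such as &amp; intact.
            $title = htmlspecialchars($title, ENT_QUOTES, 'UTF-8', false);

            return '<a' . $attrs . ' title="' . $title . '">' . $inner . '</a>';
        },
        $html
    );
}

// prints <a href="/x" title="Logo"><img src="logo.png" alt="Logo"></a>
echo addTitleFromContent('<a href="/x"><img src="logo.png" alt="Logo"></a>'), "\n";
```

Keeping the decision logic in the callback is what makes the alt= versus title= comparison manageable; the regexes inside it each do one small job.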

If you aren't using this for output rewriting, you can make the task simpler by not using regular expressions at all. Performance-wise this is not the better choice, but it is easy to do with e.g. phpQuery or QueryPath:

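$qp = qp($html);
foreach ($qp->find("a") as $a) {
    $title = $a->attr("title");
    if (!$title) {
        // Look up the <img> through a separate qp() wrapper so that $a keeps
        // pointing at the <a> element itself.
        $alt = qp($a->get(0))->find("img")->attr("alt");
        // Fall back to the plain link text when there is no <img alt="...">.
        $a->attr("title", $alt ? $alt : $a->text());
    }
}
// writeHTML() prints the document; html() returns the markup as a string.
$html = $qp->top()->html();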

(The same can be done, only with more elaborate code, using DOMDocument...)
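A rough DOMDocument version of the same idea, meant as a sketch rather than a drop-in solution. It wraps the fragment in a temporary <div> so that the libxml flags LIBXML_HTML_NOIMPLIED and LIBXML_HTML_NODEFDTD don't add <html>/<body> or a doctype, and it assumes ASCII or Latin-1 input (loadHTML() needs a charset hint for UTF-8). The function name addTitlesWithDom() is hypothetical:

```php
<?php
// Hypothetical helper: fills in a missing/empty title on every <a> using the
// alt of a child <img>, or else the link's text content.
function addTitlesWithDom(string $html): string
{
    $doc = new DOMDocument();
    // Collect libxml warnings about sloppy markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // The temporary <div> gives the fragment a single root; the flags stop
    // libxml from adding <html>/<body> and a doctype. Without a declared
    // charset, the input bytes are treated as ISO-8859-1.
    $doc->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('a') as $a) {
        if (trim($a->getAttribute('title')) !== '') {
            continue; // a non-empty title is already set
        }
        $img = $a->getElementsByTagName('img')->item(0);
        $alt = $img !== null ? $img->getAttribute('alt') : '';
        $a->setAttribute('title', $alt !== '' ? $alt : trim($a->textContent));
    }

    // Serialize only the children of the temporary wrapper.
    $out = '';
    foreach ($doc->documentElement->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo addTitlesWithDom('<a href="/x"><img src="logo.png" alt="Logo"></a> <a href="/y">Docs</a>'), "\n";
```

Compared with the regex versions, the parser takes care of attribute quoting and escaping, at the cost of the extra setup around loadHTML().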

The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit the site URL or the original address.