Regex to match attributes inside an HTML tag which may include PHP code

Generally I'd match HTML attributes with this regex:

\w+=".*?"

but when the HTML contains PHP code it gets kind of dicey. Please consider the following tag:

<option value="<?php echo $img; ?>"<?php echo ($hpb[$i]['image_filename']==$img?' selected="selected"':''); ?>>
    <?php echo $img; ?>
</option>

The regex above will also match the attribute selected="selected", which is only emitted by the PHP logic. Is there a way to match attributes which are not inside PHP tags while still matching the ones whose value may contain PHP logic? If not, could I just remove the PHP code which isn't part of an attribute value?
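To make the problem concrete, here is a small sketch that runs the simple pattern against the opening tag from the example. Python's re module is used purely for illustration (the question does not say which regex engine is in play), and the names TAG and NAIVE are made up for this sketch.

```python
import re

# Opening tag from the example above, split over two string literals
# only to keep the quoting readable.
TAG = (
    '<option value="<?php echo $img; ?>"'
    "<?php echo ($hpb[$i]['image_filename']==$img?' selected=\"selected\"':''); ?>>"
)

# The simple attribute pattern from the question.
NAIVE = re.compile(r'\w+=".*?"')

matches = NAIVE.findall(TAG)
for m in matches:
    print(m)
```

Under these assumptions the pattern reports both value="<?php echo $img; ?>" and selected="selected", even though the second one only exists inside a PHP string literal.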

EDIT: Here's what I have so far:

 \w+="(((.(?!<\?php))*?)|((.((?=<\?php).*?(?=\?>))*)*?))*"

In other words: match a string which starts with a space, then greedily match word characters followed by an equals sign and a double quote, and then match either of the following two while capturing as many characters as possible:

  1. A sequence of characters which does not contain the string <?php
  2. A sequence of characters containing the pattern <\?php.*?\?>, or in other words, greedily match the value part of the attribute with all of its PHP code

All of that until a closing double quote is encountered.

Answer:

/<\?php[\s\S]*?\?>|\s+(\w+)="([^"<]*(?:<\?php[\s\S]*?\?>[^<"]*)*)"/

This will match either a PHP code segment or a complete attribute="value" sequence in which the value may contain PHP code. After each match you can find out what you caught by checking the contents of the capturing groups. If it's a pure PHP segment you matched, all but group[0] will be empty; otherwise, group[1] will contain the attribute name and group[2] will contain the value.

The regex assumes < will appear inside an attribute value only as the beginning of a <?php tag. Of course that's not a syntactically valid assumption, but it's probably safe anyway. I can make the regex more precise if you need me to, but it will also be much less readable.
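The group check described above can be sketched as follows. This is only an illustration in Python's re module (an assumption, since neither the question nor the answer names a regex engine), with the pattern copied as-is minus the / delimiters; find_attributes, PATTERN and TAG are hypothetical names introduced for the example.

```python
import re

# The pattern from the answer, without the surrounding / delimiters.
# First alternative: a whole PHP segment. Second alternative: an attribute
# whose value may contain PHP segments.
PATTERN = re.compile(
    r'<\?php[\s\S]*?\?>|\s+(\w+)="([^"<]*(?:<\?php[\s\S]*?\?>[^<"]*)*)"'
)

# Opening tag from the question.
TAG = (
    '<option value="<?php echo $img; ?>"'
    "<?php echo ($hpb[$i]['image_filename']==$img?' selected=\"selected\"':''); ?>>"
)

def find_attributes(html):
    """Return (name, value) pairs, skipping matches that are pure PHP segments."""
    attributes = []
    for match in PATTERN.finditer(html):
        # In Python, groups that did not take part in the match are None,
        # which is how a pure PHP segment is recognised here.
        if match.group(1) is not None:
            attributes.append((match.group(1), match.group(2)))
    return attributes

attrs = find_attributes(TAG)
print(attrs)
```

With the tag from the question, the PHP segment that emits selected="selected" is consumed by the first alternative as a whole, so only the value attribute is reported. Other engines signal an unmatched group differently (for example as an empty string), so the None check is specific to this sketch.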

