I'm trying to write a JavaScript HTML/PHP parser that extracts all opening tags from an HTML/PHP source and returns the tag type and its attributes with their values, while also tracking whether each attribute or value comes from static text or from a PHP variable. The problem is composing the JavaScript RegExp pattern, specifically for certain rare cases. The patterns I have come up with either involve negative lookbehind (to cope with the closing PHP tag, that is, to match a closing bracket that is not preceded by a question mark) or fail in certain cases. The lookbehind version looks like:
<[a-zA-Z]+.*?(?<!\?)>
...and works perfectly, except that in my case I must avoid lookbehind. A more JavaScript-friendly version would be:
<[a-zA-Z]+((.(?!</)(?!<[a-zA-Z]+))*)?>
...which works except in this case:
<option value="<?php echo $img; ?>"<?php echo ($hpb[$i]['image_filename']==$img?' selected="selected"':''); ?>><?php echo $img; ?></option>
Am I approaching the problem the wrong way, or is lookbehind really necessary in my case? Any help is greatly appreciated.
Just make sure the last character before the '>' is not a '?', using [^?]. No lookaheads or lookbehinds needed.
<[a-zA-Z](.*?[^?])?>
The parentheses and the trailing ? are there so that tags like <b> also match.
EDIT: The solution above didn't work for single-character tags without attributes. Here is one that does:
<[a-zA-Z]+(>|.*?[^?]>)
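As a rough check, here is a minimal sketch that runs the edited pattern against the <option> line from the question and against two adjacent single-character tags. The helper name findOpeningTags and the nested test string are made up for illustration; only the two patterns come from the answer.

```javascript
// Edited pattern: a tag name, then either an immediate '>' or the shortest
// run that ends in a non-'?' character followed by '>'.
const OPEN_TAG = /<[a-zA-Z]+(>|.*?[^?]>)/g;

// First pattern from the answer, kept only for comparison.
const FIRST_ATTEMPT = /<[a-zA-Z](.*?[^?])?>/g;

// Hypothetical helper: returns every opening tag the pattern finds in `source`.
function findOpeningTags(source, pattern = OPEN_TAG) {
  return source.match(pattern) || [];
}

// The problematic line from the question (a template literal avoids quote escaping;
// the text contains no "${", so nothing is interpolated).
const optionLine = `<option value="<?php echo $img; ?>"<?php echo ($hpb[$i]['image_filename']==$img?' selected="selected"':''); ?>><?php echo $img; ?></option>`;

// Illustrative input with adjacent single-character tags.
const nested = "<b><i>text</i></b>";

const optionTags = findOpeningTags(optionLine);
console.log(optionTags.length);  // 1
console.log(optionTags[0]);      // the whole <option ...> tag, ending in "?>>"

console.log(findOpeningTags(nested));                 // [ '<b>', '<i>' ]
console.log(findOpeningTags(nested, FIRST_ATTEMPT));  // [ '<b><i>' ]
```

The last line shows what the edit fixes: in the first pattern, [^?] can consume the '>' of <b> and carry on into the next tag, so two tags come back as one match. Note that in both patterns '.' does not match line breaks, so a tag whose attributes span several lines would need separate handling.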
A simpler answer is <[^/^>]+>