I'm trying to extract hashtags from an HTML text with the regular expression #([a-z0-9_]+), but I'm having trouble with hashes that appear inside HTML attributes.
For example, in this HTML text:
hola que tal with #hash1.
hola que tal with #hash2
y <a href="hola.que.tal#hash3"> para #hash4. </a>
I want to recover "hash1", "hash2" and "hash4" but not "hash3".
I tried to resolve it with lookarounds, with the following expression:
(?<!<)#([a-z0-9_]+)(?!.*?>)
but without success.
How can I do it with a single regular expression?
This should work:
/#[a-z0-9_]+(?![^<]*>)/
See http://www.regexpal.com/?fam=95144
The negative lookahead makes sure that, if a > appears anywhere after the hashtag, a < comes before it. In other words, the hashtag is not followed by the closing > of a tag it sits inside.
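As a rough illustration, here is a minimal sketch of the same pattern in Python, run against the sample text from the question. It assumes Python's re module treats this pattern the same way as the JavaScript engine. The capture group (to drop the leading #) and the function name extract_hashtags are additions for this example, not part of the pattern above.

```python
import re

# Pattern from the answer, plus a capture group so the "#" is not returned.
# (?![^<]*>) rejects a hashtag when a ">" follows it with no "<" in between,
# which is the case when the hashtag sits inside a tag's attributes.
HASHTAG = re.compile(r"#([a-z0-9_]+)(?![^<]*>)")


def extract_hashtags(html):
    """Return hashtag names found outside HTML tags (hypothetical helper)."""
    return HASHTAG.findall(html)


sample = (
    "hola que tal with #hash1.\n"
    "hola que tal with #hash2\n"
    'y <a href="hola.que.tal#hash3"> para #hash4. </a>'
)

print(extract_hashtags(sample))  # ['hash1', 'hash2', 'hash4']
```

One caveat with this approach: a literal > in plain text after a hashtag, with no < before it, also makes the lookahead reject that hashtag, so it only works reliably when such characters are escaped as &gt; in the HTML.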