
Regex pattern to match hashtag, but not in HTML attributes

I'm trying to extract hashtags from HTML text with the regular expression #([a-z0-9_]+), but I'm having trouble with hashtags that appear inside HTML attributes.

For example, in the HTML text:

hola que tal with #hash1.
hola que tal with #hash2

y <a href="hola.que.tal#hash3"> para #hash4. </a>

I want to recover "hash1", "hash2" and "hash4" but not "hash3".

I tried to solve it with lookarounds, using the following expression:

(?<!<)#([a-z0-9_]+)(?!.*?>)

but without success.
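Running the attempt on the sample text shows why it fails. This is a minimal sketch using Python's `re` module. It assumes the sample is laid out on separate lines exactly as shown above; the variable names are illustrative. Because `.` does not cross line breaks by default, `(?!.*?>)` only looks at the rest of the current line. On the last line it finds the `>` of the closing `</a>`, so #hash4 is rejected along with #hash3.

```python
import re

# Sample text from the question, assumed to span these lines.
html = (
    "hola que tal with #hash1.\n"
    "hola que tal with #hash2\n"
    "\n"
    'y <a href="hola.que.tal#hash3"> para #hash4. </a>\n'
)

# The attempted pattern: (?!.*?>) rejects a hashtag whenever any ">"
# appears later on the same line, including the ">" of "</a>".
attempt = re.compile(r"(?<!<)#([a-z0-9_]+)(?!.*?>)")

print(attempt.findall(html))  # ['hash1', 'hash2']
```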

How can I do this with a single regular expression?

This should work:

/#[a-z0-9_]+(?![^<]*>)/
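Here is one way to use the pattern, sketched in Python, where the surrounding slashes are delimiters and are dropped. The sketch wraps the tag name in a capture group. That group is a small adaptation, not part of the answer's pattern, so that `findall` returns the names without the `#`, as the question asks. The sample layout and variable names are assumptions.

```python
import re

# Sample text from the question, assumed to span these lines.
html = (
    "hola que tal with #hash1.\n"
    "hola que tal with #hash2\n"
    "\n"
    'y <a href="hola.que.tal#hash3"> para #hash4. </a>\n'
)

# "#", then the tag name (captured), then a negative lookahead that
# rejects the match if a ">" can be reached without passing a "<".
hashtag = re.compile(r"#([a-z0-9_]+)(?![^<]*>)")

print(hashtag.findall(html))  # ['hash1', 'hash2', 'hash4']
```

The character class only covers lowercase letters. Pass `re.IGNORECASE` when compiling if hashtags may contain uppercase letters.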

See http://www.regexpal.com/?fam=95144

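The negative lookahead (?![^<]*>) rejects a hashtag if a > can be reached from the end of the tag name without passing a <. In other words, it makes sure that there is a < between the hashtag and the next > (or that no > follows at all). Inside a tag such as <a href="hola.que.tal#hash3">, the next > closes the tag with no < before it, so #hash3 is skipped. In normal text, the next < (here the one that starts </a>) comes first, so #hash4 is kept.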
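The lookahead's condition can be checked on its own. Take the text that follows a hashtag and test whether `[^<]*>` matches at its start. In this minimal sketch, the helper name `inside_tag` is hypothetical and the two tails are copied from the last line of the sample.

```python
import re


def inside_tag(rest: str) -> bool:
    """True if a ">" is reachable from the start of rest without crossing "<".

    This is exactly the situation that (?![^<]*>) forbids after a hashtag.
    """
    return re.match(r"[^<]*>", rest) is not None


# Text after "#hash3": still inside the <a ...> tag, so ">" comes first.
print(inside_tag('"> para #hash4. </a>'))  # True
# Text after "#hash4": the "<" of "</a>" comes before any ">".
print(inside_tag(". </a>"))  # False
```

The same check also exposes a limitation. Plain text can contain a literal > after a hashtag with no < in between. When that happens, the lookahead rejects the hashtag even though it is not inside a tag.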
