I'm trying to match a string in an HTTP page between > and <. I'm having trouble with the first > as it also matches subsequent chars, e.g. in this example:
<a href="https://stackoverflow.com" class="-logo js-gps-track"
data-gps-track="top_nav.click({is_current:false, location:3, destination:8})">
<span class="-img _glyph">Stack Overflow</span>
</a>
I would only want to match Stack Overflow. I've currently got
\\>([^\\>].*Stack Overflow.*)\\<
but that matches everything after the first >, i.e.
><span class="-img _glyph">Stack Overflow<
Any help would be great.
It'd probably be more elegant to use DOMParser, and take the textContent of .-img._glyph:
const str = `<a href="https://stackoverflow.com" class="-logo js-gps-track" data-gps-track="top_nav.click({is_current:false, location:3, destination:8})"> <span class="-img _glyph">Stack Overflow</span> </a>`;
console.log(
  new DOMParser()
    .parseFromString(str, 'text/html')
    .querySelector('.-img._glyph')
    .textContent
);
If you had to use regex, instead of repeating . (which matches any character except a line terminator), repeat [^<>] (which matches anything that isn't a < or >) on either side of the Stack Overflow part, while looking behind for > and ahead for <:
(?<=>)[^<>]*Stack Overflow[^<>]*(?=<)
(If you can't use lookbehind, match the initial > and capture everything afterwards, then extract the capture group.)
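As a minimal sketch of that capture-group fallback, assuming the markup from the question: the pattern consumes the leading > and trailing < as ordinary characters and keeps only the text between them in group 1. The helper name textBetweenTags is hypothetical, introduced here for illustration only.

```javascript
// Sample markup from the question.
const html = `<a href="https://stackoverflow.com" class="-logo js-gps-track"
data-gps-track="top_nav.click({is_current:false, location:3, destination:8})">
<span class="-img _glyph">Stack Overflow</span>
</a>`;

// Hypothetical helper (not part of any library): matches a literal ">",
// captures a run of non-bracket characters containing "Stack Overflow",
// then requires a literal "<". No lookbehind is needed.
function textBetweenTags(source) {
  const match = source.match(/>([^<>]*Stack Overflow[^<>]*)</);
  // match[0] includes the surrounding > and <; match[1] is the captured text.
  return match ? match[1] : null;
}

console.log(textBetweenTags(html));
```

Because [^<>] cannot cross a tag boundary, the match cannot start at the > that closes the opening a tag, which is what went wrong with .* in the original pattern.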
Try using lookbehind and lookahead assertions, as in this regex: (?<=>)Stack Overflow(?=<)
const text = `<a href="https://stackoverflow.com" class="-logo js-gps-track" data-gps-track="top_nav.click({is_current:false, location:3, destination:8})"> <span class="-img _glyph">Stack Overflow</span> </a>`;
const regex = /(?<=>)Stack Overflow(?=<)/g;
const found = text.match(regex);
console.log(found);