
Regex for matching script tags that contain a specific string

In Node.js, I am trying to pull specific script tags from an HTML file. The file has many script tags, but only some of them contain a push() method call, and I only want to match those. I have linked a heavily simplified example on RegExr that comes close. However, I need the first match to not include the first three lines.

The current regex: <script\b[^>]*>([\n\r\s\S]*?)push([\n\r\s\S]*?)<\/script>

Example: https://regexr.com/3qqt8

Sounds like a cleaning job. Building on your existing regex, I suggest matching and discarding script blocks without the push keyword in the first branch of an alternation, and then working only with the values stored in the capture groups of the second branch. This could look like this:

<script\b[^>]*>(?:(?!push)[\s\S])*?<\/script>|<script\b[^>]*>([\s\S]*?)push([\s\S]*?)<\/script>
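To see what the first branch of the alternation buys you, here is a small sketch that runs the original pattern and the suggested one against the same input. The sample HTML and the variable names are made up for illustration; only the two patterns come from this thread.

```javascript
// Hypothetical sample input: one script block without push, one with it.
const html = [
  '<script>',
  'function noop() {}',
  '</script>',
  '<div></div>',
  '<script>',
  'someFuncCall(); array.push();',
  '</script>',
].join('\n');

// Original pattern: the lazy [\s\S]*? is still allowed to run across
// </script>, so the match starts at the very first <script>.
const original = /<script\b[^>]*>([\s\S]*?)push([\s\S]*?)<\/script>/g;

// Suggested pattern: the first branch consumes push-free blocks (no capture
// groups set), the second branch captures blocks that contain push.
const alternation = /<script\b[^>]*>(?:(?!push)[\s\S])*?<\/script>|<script\b[^>]*>([\s\S]*?)push([\s\S]*?)<\/script>/g;

const originalMatches = [...html.matchAll(original)].map((m) => m[0]);

// Keep only matches where the capture groups took part in the match.
const pushBlocks = [...html.matchAll(alternation)]
  .filter((m) => m[1] !== undefined)
  .map((m) => m[0]);

// Does the first match of each pattern drag in the push-free block?
console.log(originalMatches[0].includes('noop'));
console.log(pushBlocks[0].includes('noop'));
```

With this input the original pattern should produce a single match that begins at the first script tag, while the alternation reports only the block that actually calls push.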


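You may want to use a stricter definition of your keyword, e.g. \.push\( to avoid false positives. Apply it in both places where push appears: inside the negative lookahead and between the two capture groups.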
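As a rough illustration of that point, the sketch below compares the bare push keyword with \.push\( on a block that merely contains the letters "push". The sample input and the helper name blocksWithKeyword are invented here and are not part of the original answer.

```javascript
// Hypothetical sample: the first block mentions "pushed" but never calls .push().
const html = [
  '<script>var pushed = true;</script>',
  '<script>items.push(1);</script>',
].join('\n');

// Loose keyword: any occurrence of the letters "push" counts.
const loose = /<script\b[^>]*>(?:(?!push)[\s\S])*?<\/script>|<script\b[^>]*>([\s\S]*?)push([\s\S]*?)<\/script>/g;

// Stricter keyword: a literal ".push(" call, used in the lookahead and
// between the capture groups.
const strict = /<script\b[^>]*>(?:(?!\.push\()[\s\S])*?<\/script>|<script\b[^>]*>([\s\S]*?)\.push\(([\s\S]*?)<\/script>/g;

// Return the full text of every block whose capture groups took part in the match.
function blocksWithKeyword(pattern, input) {
  return [...input.matchAll(pattern)]
    .filter((m) => m[1] !== undefined)
    .map((m) => m[0]);
}

const looseHits = blocksWithKeyword(loose, html);
const strictHits = blocksWithKeyword(strict, html);

console.log(looseHits);
console.log(strictHits);
```

The loose pattern reports both blocks, because "pushed" contains the keyword, whereas the stricter one should report only the block with the actual method call.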

    var regex = /<skript\b[^>]*>(?:(?!push)[\s\S])*?<\/skript>|<skript\b[^>]*>([\s\S]*?)push([\s\S]*?)<\/skript>/g;
    var str = `<skript>
    function() {}
    </skript>
    <div></div>
    <skript>
    someFuncCall();
    array.push();
    </skript>
    <skript>
    otherFuncCall();
    array.push();
    </skript>
    `;
    let m;

    while ((m = regex.exec(str)) !== null) {
        // This is necessary to avoid infinite loops with zero-width matches
        if (m.index === regex.lastIndex) {
            regex.lastIndex++;
        }

        // Groups 1 and 2 are only set when the second alternative (the one
        // containing push) matched; compare against undefined so that an
        // empty capture still counts.
        if (m[1] !== undefined && m[2] !== undefined) {
            console.log(`Found: ${m[1]}push${m[2]}`);
        }
    }

PS: It looks like script tags are filtered out in snippets, so I've replaced them with skript tags.

