The following regex:
(?<=href(\s+)?=(\s+)?")(?!(\s+)?http)(?!//).+(?=")
works as expected with these test strings:
href="//www.google-analytics.com/analytics.js">
href="https://www.google-analytics.com/analytics.js">
href="index.html">
href="..\index.html">
href="main.css">
href="..\assets\main.css">
href = " ..\assets\main.css ">
as you can see here: https://t.co/PC0U9br3vn
However, the following PowerShell script:
[string] $string = Get-Content sample.txt
[string] $regex = '(?<=href(\s+)?=(\s+)?")(?!(\s+)?http)(?!(\s+)?//)(?!(\s+)?mailto).+(?=")'
$newString = $string -replace $regex, "..\$&"
$string
$newString
produces the following output ($string first, then $newString):
//www.google-analytics.com/analytics.js"> href=" https://www.google-analytics.com/analytics.js"> href="index.html"> href="..\index.html"> href=" main.css"> href="..\assets\main.css"> href = " ..\assets\main.css "> href = "mailto://email@domain "> href = "..\..\..\assets\main.css"
//www.google-analytics.com/analytics.js"> href=" https://www.google-analytics.com/analytics.js"> href="..\index.html"> href="..\index.html"> href=" main.css"> href="..\assets\main.css"> href = " ..\assets\main.css "> href = "mailto://email@domain "> href = "..\..\..\assets\main.css"
As you can see, only the first relative path (index.html) is modified; the other relative paths are left untouched.
The same script works elsewhere, where the replacement does not use a regex and is just a simple string.
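The input is of the wrong type. Casting the output of Get-Content to [string] joins all lines of the file into one string (using $OFS, a single space by default), so the greedy .+ runs from the first eligible href value to the last " in the whole file, and only one replacement is made:
[string] $string = Get-Content sample.txt
An array of strings, however, works, because -replace is then applied to each line separately:
[string[]] $string = Get-Content sample.txt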
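To see the difference without a file, here is a minimal sketch in which a hypothetical $lines array stands in for the contents of sample.txt (the sample lines are assumptions, not the asker's actual file). It applies the question's regex once to a [string]-cast value and once to a [string[]]-cast value:

```powershell
# Hypothetical sample lines standing in for the contents of sample.txt
$lines = @(
    'href="//www.google-analytics.com/analytics.js">'
    'href="https://www.google-analytics.com/analytics.js">'
    'href="index.html">'
    'href="main.css">'
)

# The regex from the question
$regex = '(?<=href(\s+)?=(\s+)?")(?!(\s+)?http)(?!(\s+)?//)(?!(\s+)?mailto).+(?=")'

# [string] cast: the array elements are joined with $OFS (a single space by default)
[string] $joined = $lines
# [string[]] cast: every line stays a separate element
[string[]] $perLine = $lines

# On the joined string, greedy .+ runs up to the LAST quote, so there is only one match
$joinedResult = $joined -replace $regex, '..\$&'
# On an array, -replace is applied to each element independently
$perLineResult = $perLine -replace $regex, '..\$&'

$joinedResult
$perLineResult
```

With the joined string, only index.html gets the ..\ prefix, because the single match swallows everything up to the final quote; with the array, both index.html and main.css are rewritten.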
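All you need is a negated character class, [^"]+ (see this post of mine, where I explain how [^"]+ works). Unlike the greedy .+, which runs to the last " in the whole string, [^"]+ cannot cross a quote, so it stops at the end of each attribute value. Also note that (\s+)? is the same as \s*. There is no need to overstuff your regex with capturing groups if you are not planning to use them.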
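A quick way to check the (\s+)? versus \s* equivalence is to run both forms of the lookbehind's inner pattern against a few spellings and compare what they match. This is a sketch; the sample strings are made up for illustration:

```powershell
# Hypothetical attribute spellings with varying whitespace around '='
$samples = @('href="a.css"', 'href = "a.css"', 'href  =  "a.css"', 'href=  "a.css"')

$withGroups = [regex] 'href(\s+)?=(\s+)?"'
$withStar   = [regex] 'href\s*=\s*"'

# Both patterns should find exactly the same text in every sample
$results = foreach ($s in $samples) {
    [pscustomobject]@{
        Input = $s
        Same  = ($withGroups.Match($s).Value -ceq $withStar.Match($s).Value)
    }
}
$results | Format-Table

# GetGroupNumbers() includes group 0 (the whole match)
$withGroups.GetGroupNumbers().Count   # 3: two unused capture groups
$withStar.GetGroupNumbers().Count     # 1: no capture groups
```

The matched text is identical; the only difference is that the first form allocates two capture groups that the replacement never uses.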
Use
(?<=href\s*=\s*")(?!\s*http)(?!//)[^"]+
See regex demo
Here is what it matches:
- (?<=href\s*=\s*") - if there is href, followed by 0 or more whitespace characters, then =, then again 0 or more whitespace characters and a " right before the current position, and...
- (?!\s*http) - if there is no http (optionally preceded by whitespace) right after the current position, and...
- (?!//) - if there is no // right after the current position...
- [^"]+ - match 1 or more characters other than ".
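As a minimal sketch (the input lines are hypothetical), the pattern can be applied to a deliberately joined single string, i.e. the same shape the [string] cast produces. Because [^"]+ stops at each closing quote, every relative href is rewritten, while the protocol-relative and https URLs are skipped:

```powershell
# Hypothetical input lines standing in for sample.txt
$lines = @(
    'href="//www.google-analytics.com/analytics.js">'
    'href="https://www.google-analytics.com/analytics.js">'
    'href="index.html">'
    'href="..\index.html">'
    'href="main.css">'
)

# Join into ONE string on purpose, as the [string] cast in the question does
$joined = $lines -join ' '

# The answer's pattern: [^"]+ can never run past the closing quote of an attribute value
$regex = '(?<=href\s*=\s*")(?!\s*http)(?!//)[^"]+'

# Single quotes keep PowerShell from touching $&; .NET substitutes the whole match for $&
$result = $joined -replace $regex, '..\$&'
$result
```

Note that this pattern, unlike the question's second version, does not exclude mailto: links; adding (?!\s*mailto) after the lookbehind would restore that.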