
Complex -replace with regex in PowerShell

The following regex:

(?<=href(\s+)?=(\s+)?")(?!(\s+)?http)(?!//).+(?=")

works as expected with these test strings:

href="//www.google-analytics.com/analytics.js">
href="https://www.google-analytics.com/analytics.js">
href="index.html">
href="..\index.html">
href="main.css">
href="..\assets\main.css">
href = " ..\assets\main.css ">

As you may see here: https://t.co/PC0U9br3vn

However:

[string] $string = Get-Content sample.txt

[string] $regex = '(?<=href(\s+)?=(\s+)?")(?!(\s+)?http)(?!(\s+)?//)(?!(\s+)?mailto).+(?=")'

$newString = $string -replace $regex, "..\$&"

$string
$newString

Produces the following output:

//www.google-analytics.com/analytics.js">  href=" https://www.google-analytics.com/analytics.js">  href="index.html">  href="..\index.html">  href="  main.css">  href="..\assets\main.css">  href = " ..\assets\main.css ">  href = "mailto://email@domain ">  href = "..\..\..\assets\main.css"
//www.google-analytics.com/analytics.js">  href=" https://www.google-analytics.com/analytics.js">  href="..\index.html">  href="..\index.html">  href="  main.css">  href="..\assets\main.css">  href = " ..\assets\main.css ">  href = "mailto://email@domain ">  href = "..\..\..\assets\main.css"

Only the first matching href is changed; the rest are left as they were.

The same script works elsewhere, where the replacement is a plain string and does not use a regex substitution.

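The input is of the wrong type. Casting the output of Get-Content to a single string joins all lines of the file into one string:

[string] $string = Get-Content sample.txt

However, an array of strings works, because -replace is then applied to each line separately:

[string[]] $string = Get-Content sample.txt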
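
A minimal sketch of why the cast matters. The inline array stands in for the output of Get-Content and its two lines are made up for illustration; the variable names $lines, $joined and $perLine are assumptions, not part of the original script.

```powershell
# Stand-in for Get-Content output: one array element per line (hypothetical data).
$lines = 'href="index.html">', 'href="main.css">'

# Casting an array to [string] joins its elements with a space (the default $OFS).
[string] $joined = $lines

# Keeping the array lets -replace run on each element separately.
[string[]] $perLine = $lines

$regex = '(?<=href(\s+)?=(\s+)?")(?!(\s+)?http)(?!(\s+)?//)(?!(\s+)?mailto).+(?=")'

# On the joined string the greedy .+ runs to the last quote, so there is one match.
$joinedResult = $joined -replace $regex, '..\$&'

# On the array every line is matched on its own, so every href is prefixed.
$perLineResult = $perLine -replace $regex, '..\$&'

$joinedResult
$perLineResult
```

The replacement string is single-quoted here so that PowerShell passes $& to the regex engine untouched.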

All you need is a negated character class [^"]+ (see this post of mine where I explain how \\[^"\\]+ works). However, also note that (\s+)? is the same as \s*. No need to overstuff your regex with capturing groups if you are not planning to use them.

Use

(?<=href\s*=\s*")(?!\s*http)(?!//)[^"]+

See regex demo

Here is what it matches:

  • (?<=href\s*=\s*") - if there is href followed by 0 or more whitespace symbols, followed by = and then again 0 or more whitespace and a " before the current position...
  • (?!\s*http) - and if there is no 0 or more whitespace followed by http right after the current position, and...
  • (?!//) - if there is no // right after the current position...
  • [^"]+ - match 1 or more characters other than " .
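
As a rough check, the pattern can be applied to a subset of the test strings from the question. This is only a sketch: the array is built inline instead of being read from sample.txt, and the variable names are assumptions.

```powershell
# A subset of the test strings from the question, one href per element.
$lines = @(
    'href="//www.google-analytics.com/analytics.js">'
    'href="https://www.google-analytics.com/analytics.js">'
    'href="index.html">'
    'href="..\assets\main.css">'
)

# [^"]+ stops at the closing quote, so no trailing lookahead is needed.
$regex = '(?<=href\s*=\s*")(?!\s*http)(?!//)[^"]+'

# -replace runs on each array element; $& is the whole match.
$result = $lines -replace $regex, '..\$&'
$result
```

The first two lines are rejected by the negative lookaheads and come back unchanged; the last two get the ..\ prefix.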
