
PowerShell - escaping fancy single and double quotes for regex and string replace

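I'm working with an HTML file created by Acrobat, which doesn't use proper HTML entities to escape Unicode characters. I need to include fancy single and double quotes in regular expression patterns, but every attempt I've made at escaping them fails inside my script... even though the same code works in a regular PowerShell session.

For example, this find/replace doesn't work: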

    $html = $html.Replace("`“", '&ldquo;')
    $html = $html.Replace("`”", '&rdquo;')
    $html = $html.Replace("`‘", '&lsquo;')
    $html = $html.Replace("`’", '&rsquo;')

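...but if I break into my script and run one of these replace lines from the debug prompt, it does work.

Edit: here is the markup fragment I'm testing with right now: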

<p style="padding-left: 5pt;text-indent: 17pt;line-height: 119%;text-align: justify;">To guide its readers the Hermetica makes use of the mystical astrological world-view that we have been discussing. It describes the creation of the world as a series of emanations, starting with the Light, who gave birth to a son called Logos. In the words of Hermes’s guide, Poimandres:</p><p style="padding-left: 24pt;text-indent: 0pt;line-height: 119%;text-align: justify;">“That Light,” he said, “is I, even Mind, the first God, who was before the watery substance which appeared out of the darkness; and the Logos which came forth the Light is son of God.”</p><p style="padding-left: 21pt;text-indent: 1pt;line-height: 119%;text-align: justify;">(Scott, Walter, translator, Hermetica: The Ancient Greek and Latin Writings Which Contain Religious or Philosophical Teachings Ascribed to Hermes Trismegistus, Boston: Shambhala: 1985, p. 117)</p>

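If `$html` equals that string, my attempts to find and replace the characters seem to be futile.

Try using the Unicode values instead of the backtick-escaped literals (note that the `` `u{...} `` escape sequence requires PowerShell 6 or later):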

    $html = $html.Replace("`u{201C}", '&ldquo;')
    $html = $html.Replace("`u{201D}", '&rdquo;')
    $html = $html.Replace("`u{2018}", '&lsquo;')
    $html = $html.Replace("`u{2019}", '&rsquo;')
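
If the script also has to run on Windows PowerShell 5.1, where the `` `u{...} `` escape is not available, the same characters can be built from their code points instead. The following is only a minimal sketch: the `$sample` string and the `$entities` table are illustrative names, not part of the original script. Because the script text is then pure ASCII, the encoding of the script file no longer matters. The regex line relies on .NET's `\uXXXX` escape inside a pattern.

```powershell
# Sample input built from code points so that this script stays pure ASCII.
$sample = 'Hermes' + [char]0x2019 + 's guide: ' + [char]0x201C + 'That Light,' + [char]0x201D

# Regex check: the .NET regex engine understands \uXXXX escapes in the pattern.
$found = $sample -match '[\u2018\u2019\u201C\u201D]'

# Map each curly quote (by code point) to its HTML entity.
# A plain hashtable is used on purpose: replacement order does not matter here.
$entities = @{
    0x201C = '&ldquo;'
    0x201D = '&rdquo;'
    0x2018 = '&lsquo;'
    0x2019 = '&rsquo;'
}

$html = $sample
foreach ($pair in $entities.GetEnumerator()) {
    # [char] turns the code point into the character; Replace() needs strings.
    $html = $html.Replace([string][char]$pair.Key, $pair.Value)
}

$html   # Hermes&rsquo;s guide: &ldquo;That Light,&rdquo;
```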

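Apparently PowerShell does something odd with BOM-less UTF-8 files. The likely cause is that Windows PowerShell reads a script file without a BOM using the system's ANSI code page, so non-ASCII literals such as curly quotes are decoded incorrectly when the script runs, while the same line typed at the prompt is not affected. Setting VSCode to automatically save PowerShell scripts as UTF-8 with BOM allows the `String.Replace` method to work as expected.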
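
To check whether a script file already starts with a UTF-8 BOM, and to re-save it with one outside the editor, something along these lines could be used. It is a sketch under assumptions: `Test-Utf8Bom` and `Add-Utf8Bom` are hypothetical helper names, the file is assumed to contain valid UTF-8, and the demo writes a throwaway file to the temp directory.

```powershell
function Test-Utf8Bom {
    param([Parameter(Mandatory)][string]$Path)
    # Resolve to a full path: .NET file APIs ignore PowerShell's current location.
    $full  = (Resolve-Path -LiteralPath $Path).ProviderPath
    $bytes = [System.IO.File]::ReadAllBytes($full)
    # The UTF-8 BOM is the three-byte sequence EF BB BF.
    return ($bytes.Length -ge 3 -and $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF)
}

function Add-Utf8Bom {
    param([Parameter(Mandatory)][string]$Path)
    if (Test-Utf8Bom -Path $Path) { return }   # already has a BOM, nothing to do
    $full = (Resolve-Path -LiteralPath $Path).ProviderPath
    # Read as UTF-8, then write back with an encoding that emits the BOM.
    $text = [System.IO.File]::ReadAllText($full, [System.Text.UTF8Encoding]::new($false))
    [System.IO.File]::WriteAllText($full, $text, [System.Text.UTF8Encoding]::new($true))
}

# Demo on a temporary BOM-less file.
$demo = Join-Path ([System.IO.Path]::GetTempPath()) 'bom-demo.ps1'
[System.IO.File]::WriteAllText($demo, '"demo"', [System.Text.UTF8Encoding]::new($false))
$before = Test-Utf8Bom -Path $demo
Add-Utf8Bom -Path $demo
$after = Test-Utf8Bom -Path $demo
Remove-Item -LiteralPath $demo
```

In VS Code, the equivalent is the `files.encoding` setting with the value `utf8bom`, which can be scoped to PowerShell files.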
