PowerShell - escaping fancy single and double quotes for regex and string replace
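I'm working with HTML files created by Acrobat, which doesn't use proper HTML entities to escape Unicode characters. I need to include curly single and double quotes in regex patterns, but every attempt I've made at escaping them fails in my script... even though it works at the regular PowerShell prompt.
For example, this find/replace doesn't work: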
$html = $html.Replace("`“", '&ldquo;')
$html = $html.Replace("`”", '&rdquo;')
$html = $html.Replace("`‘", '&lsquo;')
$html = $html.Replace("`’", '&rsquo;')
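...but if I break into my script and run one of these replace lines from the debug prompt, it does work.
Edit: Here's the markup snippet I'm testing with right now: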
<p style="padding-left: 5pt;text-indent: 17pt;line-height: 119%;text-align: justify;">To guide its readers the Hermetica makes use of the mystical astrological world-view that we have been discussing. It describes the creation of the world as a series of emanations, starting with the Light, who gave birth to a son called Logos. In the words of Hermes’s guide, Poimandres:</p><p style="padding-left: 24pt;text-indent: 0pt;line-height: 119%;text-align: justify;">“That Light,” he said, “is I, even Mind, the first God, who was before the watery substance which appeared out of the darkness; and the Logos which came forth the Light is son of God.”</p><p style="padding-left: 21pt;text-indent: 1pt;line-height: 119%;text-align: justify;">(Scott, Walter, translator, Hermetica: The Ancient Greek and Latin Writings Which Contain Religious or Philosophical Teachings Ascribed to Hermes Trismegistus, Boston: Shambhala: 1985, p. 117)</p>
If $html equals that string, my attempts to find and replace the characters seem to be futile.
Try using the Unicode code points instead of backtick-escaped literals:
# Note: the `u{...} escape sequence requires PowerShell 6 or later.
$html = $html.Replace("`u{201C}", '&ldquo;')
$html = $html.Replace("`u{201D}", '&rdquo;')
$html = $html.Replace("`u{2018}", '&lsquo;')
$html = $html.Replace("`u{2019}", '&rsquo;')
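If the script has to run on Windows PowerShell 5.1, where the `u{...} escape is not available, one option is to build the quote characters from their code points with a [char] cast, or to let the .NET regex engine interpret \uXXXX escapes. The following is a minimal sketch, not the original poster's code: the sample $html string and the variable names are made up for illustration. Because the source contains only ASCII, it should not depend on how the script file is encoded.

```powershell
# Assumption: hypothetical variable names; the quotes are built from code points
# so that this script file itself contains only ASCII characters.
$ldquo = [string][char]0x201C   # left double quotation mark
$rdquo = [string][char]0x201D   # right double quotation mark
$lsquo = [string][char]0x2018   # left single quotation mark
$rsquo = [string][char]0x2019   # right single quotation mark

# Hypothetical sample input, loosely modeled on the markup in the question.
$html = "<p>${ldquo}That Light,${rdquo} he said. Hermes${rsquo}s guide</p>"

# Option 1: String.Replace with the code-point-based strings.
$fixed = $html.Replace($ldquo, '&ldquo;').Replace($rdquo, '&rdquo;').Replace($lsquo, '&lsquo;').Replace($rsquo, '&rsquo;')

# Option 2: the -replace operator. The pattern is single-quoted, so PowerShell
# passes \u201C through untouched and the .NET regex engine decodes it.
$fixedRegex = $html -replace '\u201C', '&ldquo;' -replace '\u201D', '&rdquo;' -replace '\u2018', '&lsquo;' -replace '\u2019', '&rsquo;'

$fixed
$fixedRegex
```

Both variables should end up holding the same entity-escaped string; which form to prefer is mostly a matter of whether the rest of the script already works with regex patterns.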
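Apparently, PowerShell does something odd with BOM-less UTF-8 encoding. Windows PowerShell appears to read a script file without a BOM using the system's legacy code page, so the curly quotes in the script source are misread, while the same lines typed at the prompt are not affected. Setting VSCode to automatically save PowerShell scripts as UTF-8 with BOM allowed the String.Replace method to behave as expected.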
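To confirm whether a given script file is affected, it can help to check for the UTF-8 BOM bytes (EF BB BF) directly and re-save the file with a BOM if they are missing. The sketch below is an illustration under assumptions: the Test-Utf8Bom helper and the temp-file path are hypothetical, and the demo file stands in for the real script.

```powershell
# Hypothetical demo path; in practice, point this at the script that misbehaves.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'fix-quotes-demo.ps1'

# Create a BOM-less UTF-8 file containing a curly quote (built from its code point).
$content = '$q = "' + [char]0x201C + '"'
[System.IO.File]::WriteAllText($path, $content, [System.Text.UTF8Encoding]::new($false))

# Hypothetical helper: returns $true if the file starts with the UTF-8 BOM bytes.
function Test-Utf8Bom {
    param([string]$Path)
    $bytes = [System.IO.File]::ReadAllBytes($Path)
    return ($bytes.Length -ge 3 -and $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF)
}

$before = Test-Utf8Bom -Path $path

# Re-save the same text as UTF-8 with BOM.
$text = [System.IO.File]::ReadAllText($path, [System.Text.UTF8Encoding]::new($false))
[System.IO.File]::WriteAllText($path, $text, [System.Text.UTF8Encoding]::new($true))

$after = Test-Utf8Bom -Path $path

"BOM before: $before, BOM after: $after"
```

Re-saving from the editor, as described above, achieves the same thing; the script form is mainly useful for checking many files at once.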