Parsing an input-string with different quotes via RegEx

I need to convert an input string with multiple words into a string array via PowerShell. Words can be separated by multiple spaces and/or line breaks. Each word can be enclosed in single quotes or double quotes. Some words may start with a hash sign (#); in that case any quoting appears after the hash sign.

Here is a code sample of a possible input and the expected result:

$inputString = @"
  test1
  #custom1
  #"custom2"           #'custom3'
  #"custom ""four"""   #'custom ''five'''
  test2 "test3" 'test4'
"@

$result = @(
    'test1'
    '#custom1'
    '#"custom2"'
    "#'custom3'"
    '#"custom ""four"""'
    "#'custom ''five'''"
    'test2'
    '"test3"'
    "'test4'"
)

Is there a way to do this with a clever regular expression? Or does someone have a parser snippet/function to start with?

Assuming you fully control or implicitly trust the input string, you can use the following approach, which relies on Invoke-Expression, a cmdlet that should normally be avoided:

Assumptions made:

  • # only appears at the start of embedded strings.
  • No embedded string contains newlines itself.

$inputString = @"
  test1
  #custom1
  #"custom2"           #'custom3'
  #"custom ""four"""   #'custom ''five'''
  test2 "test3" 'test4'
"@

$embeddedStrings = Invoke-Expression @"
Write-Output $($inputString -replace '\r?\n', ' ' -replace '#', '`#')
"@

Caveat: The outer quoting around the individual strings is lost in the process, and the embedded, escaped quotes are unescaped; outputting $embeddedStrings yields:

test1
#custom1
#custom2
#custom3
#custom "four"
#custom 'five'
test2
test3
test4

The approach relies on the fact that your embedded strings use PowerShell's quoting and quote-escaping rules; the only problem is the leading # characters, which would otherwise start a comment and are therefore escaped as `# above. By replacing the embedded newlines ( \r?\n ) with spaces, the result can be passed as a list of positional arguments to Write-Output, inside a string that is then evaluated with Invoke-Expression.
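If the outer quoting has to be preserved, as in the expected result from the question, or if the input cannot be trusted, a plain regex tokenizer may be an alternative to Invoke-Expression. The following is only a minimal sketch and not part of the approach above: the function name Split-QuotedWord and the pattern are illustrative assumptions. It assumes each word is an optional #, followed by either a double-quoted string (with "" as the escaped quote), a single-quoted string (with '' as the escaped quote), or a run of characters containing no whitespace and no quotes.

```powershell
# Hypothetical helper (name and pattern are assumptions, not from the original answer).
function Split-QuotedWord {
    param([string] $Text)

    # A here-string holds the pattern so that " and ' need no extra escaping:
    #   #?              optional leading hash sign
    #   "(?:[^"]|"")*"  double-quoted string; "" is an escaped quote
    #   '(?:[^']|'')*'  single-quoted string; '' is an escaped quote
    #   [^\s"']+        bare word without whitespace or quotes
    $pattern = @'
#?(?:"(?:[^"]|"")*"|'(?:[^']|'')*'|[^\s"']+)
'@

    # Return each match verbatim, i.e. with its quoting left intact.
    [regex]::Matches($Text, $pattern) | ForEach-Object { $_.Value }
}

# Sample input from the question; the single-quoted here-string
# prevents any variable expansion.
$sample = @'
  test1
  #custom1
  #"custom2"           #'custom3'
  #"custom ""four"""   #'custom ''five'''
  test2 "test3" 'test4'
'@

$tokens = @(Split-QuotedWord -Text $sample)
$tokens
```

For the sample input this should yield nine strings with their quoting and escaped quotes unchanged, for example #"custom ""four""" and 'test4'. Nothing is evaluated, so the input is never executed as code. Limitations of the sketch: a stray, unterminated quote is silently skipped, and a bare word directly followed by a quoted part (such as foo"bar") is split into two tokens.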
