Parsing a multiline INI-like file using a PCRE regex

I have an INI-like file containing a list of <key> = <value> items. What complicates things is that some values span multiple lines and can contain the = character (a TLS private key). Example:

groupid = foo
location = westus
randomkey = fbae3700c34cb06c
resourcename = example4-resourcegroup
tls_private_key = -----BEGIN RSA PRIVATE KEY-----
//stuff
-----END RSA PRIVATE KEY-----

foo = 123
faa = 223

My pattern so far is /^(.*?)\ \=\ (.*[^=]*)$/m. It works for every key except tls_private_key: because that value contains =, only part of it is captured.
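To see the failure concretely, here is a small sketch in Python (an assumption: the question targets PCRE, and Python's `re.M` stands in for the `/m` modifier). It runs the question's pattern over a copy of the sample in which a made-up line `MIIB+/xyz==` is added to the key body, so that the value really contains `=` the way base64 padding would.

```python
import re

# The question's sample, plus a hypothetical key line "MIIB+/xyz==" that
# contains "=" (standing in for base64 padding in a real key).
SAMPLE = """groupid = foo
location = westus
tls_private_key = -----BEGIN RSA PRIVATE KEY-----
//stuff
MIIB+/xyz==
-----END RSA PRIVATE KEY-----

foo = 123
faa = 223"""

# The pattern from the question; re.M plays the role of the /m modifier.
QUESTION_PATTERN = re.compile(r"^(.*?)\ \=\ (.*[^=]*)$", re.M)

pairs = {m.group(1): m.group(2) for m in QUESTION_PATTERN.finditer(SAMPLE)}

print(pairs["groupid"])                # foo
print(repr(pairs["tls_private_key"]))  # '-----BEGIN RSA PRIVATE KEY-----\n//stuff'
```

The `[^=]*` part cannot run past the `=` in the padded line, so the engine backs up to the nearest line end before it, and everything from that line onward is dropped from the value.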

Any suggestions?

You could match each value over multiple lines (with the m flag), asserting that every following line does not contain a space, an equals sign and a space:

^(.*?) = (.*(?:\R(?!.*? = ).*)*)

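A minimal sketch of this approach in Python, assuming `\n` line endings: Python's `re` has no `\R`, so `\n` takes its place, and `re.M` corresponds to the m flag. The sample is the question's, with a hypothetical `MIIB+/xyz==` line added to check that an `=` inside the key is tolerated.

```python
import re

# The question's sample; "MIIB+/xyz==" is a hypothetical key line containing "=".
SAMPLE = """groupid = foo
location = westus
tls_private_key = -----BEGIN RSA PRIVATE KEY-----
//stuff
MIIB+/xyz==
-----END RSA PRIVATE KEY-----

foo = 123
faa = 223"""

# PCRE original (m flag):  ^(.*?) = (.*(?:\R(?!.*? = ).*)*)
# Python's re has no \R, so "\n" line endings are assumed here.
PATTERN = re.compile(r"^(.*?) = (.*(?:\n(?!.*? = ).*)*)", re.M)


def parse(text):
    """Return {key: value} for every `key = value` entry in text."""
    # The blank line after the key is matched as a continuation line,
    # so trailing newlines are stripped from each value.
    return {m.group(1): m.group(2).rstrip("\n") for m in PATTERN.finditer(text)}


config = parse(SAMPLE)
print(config["location"])                          # westus
print(config["tls_private_key"].splitlines()[-1])  # -----END RSA PRIVATE KEY-----
```

Because a continuation line is only rejected when it contains ` = `, lines such as `MIIB+/xyz==` (no spaces) stay part of the key.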

If the key cannot contain spaces:

^([^\s=]+)\h+=\h+(.*(?:\R(?![^\s=]+\h+=\h+).*)*)$

Explanation

  • ^ Start of a line (the pattern relies on the m flag)
  • ([^\s=]+) Capture group 1: match one or more characters other than = or whitespace
  • \h+=\h+ Match an = surrounded by one or more horizontal whitespace characters
  • ( Capture group 2
    • .* Match the rest of the line
    • (?:\R(?![^\s=]+\h+=\h+).*)* Repeatedly match a line break and the next line, as long as that line does not start with a key followed by =
  • ) Close capture group 2
  • $ End of a line

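The same check for the no-spaces-in-keys variant, again as a Python sketch. Python's `re` supports neither `\h` nor `\R`, so they are approximated here by `[ \t]` and `\n` (PCRE's `\h` also covers other horizontal spaces such as U+00A0). The `MIIB+/xyz==` line is a made-up addition to the question's sample.

```python
import re

# The question's sample; "MIIB+/xyz==" is a hypothetical key line containing "=".
SAMPLE = """groupid = foo
location = westus
tls_private_key = -----BEGIN RSA PRIVATE KEY-----
//stuff
MIIB+/xyz==
-----END RSA PRIVATE KEY-----

foo = 123
faa = 223"""

# PCRE original (m flag):
#   ^([^\s=]+)\h+=\h+(.*(?:\R(?![^\s=]+\h+=\h+).*)*)$
# Approximation: \h -> [ \t], \R -> \n.
PATTERN = re.compile(
    r"^([^\s=]+)[ \t]+=[ \t]+(.*(?:\n(?![^\s=]+[ \t]+=[ \t]+).*)*)$",
    re.M,
)

# With two capture groups, findall returns (key, value) tuples.
# Trailing newlines come from the blank line after the key and are stripped.
config = {key: value.rstrip("\n") for key, value in PATTERN.findall(SAMPLE)}

print(config["foo"])  # 123
```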

Alternatively, you may use this regex with a lookahead (also with the m flag):

^\h*(?<key>[\w-]+)\h*=\h*(?<value>[\s\S]*?)(?=\R\h*[\w-]+\h*=|\z)


RegEx Details:

  • ^ Start of a line
  • \h* : 0 or more horizontal whitespace characters
  • (?<key>[\w-]+) : Named group key: one or more word characters or hyphens
  • \h* : 0 or more horizontal whitespace characters
  • = : Match an =
  • \h* : 0 or more horizontal whitespace characters
  • (?<value>[\s\S]*?) : Named group value: 0 or more of any characters, including newlines (lazy, so as few as possible)
  • (?=\R\h*[\w-]+\h*=|\z) : Lookahead asserting that what follows is either a line break followed by a key and =, or the end of input
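The named-group version translated to Python for a quick test; this is a sketch under the same assumptions as a plain `re` port, not a drop-in replacement. Python spells named groups `(?P<name>...)` and uses `\Z` for what PCRE calls `\z`; `\h` and `\R` are replaced by `[ \t]` and `\n`. The `MIIB+/xyz==` line is a hypothetical addition to the question's sample.

```python
import re

# The question's sample; "MIIB+/xyz==" is a hypothetical key line containing "=".
SAMPLE = """groupid = foo
location = westus
tls_private_key = -----BEGIN RSA PRIVATE KEY-----
//stuff
MIIB+/xyz==
-----END RSA PRIVATE KEY-----

foo = 123
faa = 223"""

# PCRE original (m flag):
#   ^\h*(?<key>[\w-]+)\h*=\h*(?<value>[\s\S]*?)(?=\R\h*[\w-]+\h*=|\z)
# Python spelling: (?P<name>...) for named groups, \Z for \z,
# [ \t] for \h and \n for \R.
PATTERN = re.compile(
    r"^[ \t]*(?P<key>[\w-]+)[ \t]*=[ \t]*(?P<value>[\s\S]*?)"
    r"(?=\n[ \t]*[\w-]+[ \t]*=|\Z)",
    re.M,
)


def parse(text):
    """Return {key: value}; trailing newlines (from blank lines) are stripped."""
    return {m["key"]: m["value"].rstrip("\n") for m in PATTERN.finditer(text)}


config = parse(SAMPLE)
print(config["tls_private_key"].count("\n"))  # 3
```

Worth checking against real keys: because `\h*` also allows zero spaces, a continuation line made only of word characters followed by `=` padding (for example `QUJDREVGRw==`) satisfies the lookahead and is read as a new key, so `parse("k = -----BEGIN-----\nQUJDREVGRw==\n-----END-----")["k"]` returns only `-----BEGIN-----`. Requiring whitespace before the `=` inside the lookahead (`\h+=` in PCRE) should avoid that, at the cost of only recognising keys written with a space before `=`.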

Another variation:

(?sm)^([^=\n]*)\s=\s(.*?)(?=\n[^=\n]*\s=\s|\z)


Explanation

--------------------------------------------------------------------------------
  (?ms)                    set flags for this block (with ^ and $
                           matching start and end of line) (with .
                           matching \n) (case-sensitive) (matching
                           whitespace and # normally)
--------------------------------------------------------------------------------
  ^                        the beginning of a "line"
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    [^=\n]*                  any character except: '=', '\n'
                             (newline) (0 or more times (matching the
                             most amount possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  \s                       whitespace (\n, \r, \t, \f, and " ")
--------------------------------------------------------------------------------
  =                        '='
--------------------------------------------------------------------------------
  \s                       whitespace (\n, \r, \t, \f, and " ")
--------------------------------------------------------------------------------
  (                        group and capture to \2:
--------------------------------------------------------------------------------
    .*?                      any character (0 or more times (matching
                             the least amount possible))
--------------------------------------------------------------------------------
  )                        end of \2
--------------------------------------------------------------------------------
  (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
    \n                       '\n' (newline)
--------------------------------------------------------------------------------
    [^=\n]*                  any character except: '=', '\n'
                             (newline) (0 or more times (matching the
                             most amount possible))
--------------------------------------------------------------------------------
    \s                       whitespace (\n, \r, \t, \f, and " ")
--------------------------------------------------------------------------------
    =                        '='
--------------------------------------------------------------------------------
    \s                       whitespace (\n, \r, \t, \f, and " ")
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    \z                       the end of the string
--------------------------------------------------------------------------------
  )                        end of look-ahead
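This variation needs only one change to run under Python's `re` (a sketch, assuming Python rather than PCRE): `\z` becomes `\Z`, which in Python matches only at the very end of the string. The `MIIB+/xyz==` line is again a made-up addition to the question's sample.

```python
import re

# The question's sample; "MIIB+/xyz==" is a hypothetical key line containing "=".
SAMPLE = """groupid = foo
location = westus
tls_private_key = -----BEGIN RSA PRIVATE KEY-----
//stuff
MIIB+/xyz==
-----END RSA PRIVATE KEY-----

foo = 123
faa = 223"""

# PCRE original: (?sm)^([^=\n]*)\s=\s(.*?)(?=\n[^=\n]*\s=\s|\z)
# Only change for Python's re: \z -> \Z (end of string only).
PATTERN = re.compile(r"(?sm)^([^=\n]*)\s=\s(.*?)(?=\n[^=\n]*\s=\s|\Z)")

# With two capture groups, findall returns (key, value) tuples.
pairs = PATTERN.findall(SAMPLE)
for key, value in pairs:
    # rstrip() drops the newline picked up from the blank line after the key.
    print(f"{key} -> {value.rstrip()!r}")
```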
