Parsing a multiline INI-like file using PCRE regex

Question

I have an INI-like file containing a list of <key> = <value> items. What complicates things is that some values span multiple lines and can contain the = character (for example, a TLS private key). Example:

groupid = foo
location = westus
randomkey = fbae3700c34cb06c
resourcename = example4-resourcegroup
tls_private_key = -----BEGIN RSA PRIVATE KEY-----
//stuff
-----END RSA PRIVATE KEY-----

foo = 123
faa = 223

What I have so far is the pattern /^(.*?)\ \=\ (.*[^=]*)$/m. It works for all keys except tls_private_key: because that value contains =, only part of it is captured.
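
To make the failure concrete, here is a minimal reproduction. It is only a sketch: it runs the pattern through Python's re module rather than PCRE (the pattern compiles unchanged there), and the key body line is an invented placeholder ending in =, not real key material.

```python
import re

# Invented sample; the key body line is a placeholder, not real key material.
SAMPLE = (
    "resourcename = example4-resourcegroup\n"
    "tls_private_key = -----BEGIN RSA PRIVATE KEY-----\n"
    "MIIB/placeholder+body==\n"
    "-----END RSA PRIVATE KEY-----\n"
    "\n"
    "foo = 123\n"
)

# The pattern from the question, with the /m flag expressed as re.MULTILINE.
QUESTION_PATTERN = re.compile(r"^(.*?)\ \=\ (.*[^=]*)$", re.MULTILINE)

# Collect every key/value pair the pattern finds.
parsed = {m.group(1): m.group(2).strip() for m in QUESTION_PATTERN.finditer(SAMPLE)}

# [^=]* cannot get past the first "=" in the key body, so the value
# is cut back to the end of the first line.
print(parsed["tls_private_key"])  # -----BEGIN RSA PRIVATE KEY-----
```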

Any suggestions?

Answer 1

You might match each value over multiple lines, asserting that the next line does not contain a space, an equals sign, and a space:

^(.*?) = (.*(?:\R(?!.*? = ).*)*)

If the key cannot contain spaces:

^([^\s=]+)\h+=\h+(.*(?:\R(?![^\s=]+\h+=\h+).*)*)$

Explanation

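^ Start of a line (assuming the m flag, as in the question's pattern)
([^\s=]+) Capture group 1, match 1+ chars other than = or a whitespace char
\h+=\h+ Match an = with 1+ horizontal whitespace chars on each side
( Capture group 2
- .* Match the rest of the line
- (?:\R(?![^\s=]+\h+=\h+).*)* Repeat matching all following lines that do not start with a key followed by = between horizontal whitespace
) Close capture group 2
$ End of a line (assuming the m flag)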
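
As a rough illustration of the second pattern, the sketch below applies it from Python. Python's re module has no \h or \R, so they are replaced by [ \t] and \n here (which assumes LF line endings). The sample text, the placeholder key body, and the helper name parse_ini_like are invented for the example.

```python
import re

# Invented sample modelled on the question; the key body is a placeholder.
SAMPLE = (
    "groupid = foo\n"
    "location = westus\n"
    "tls_private_key = -----BEGIN RSA PRIVATE KEY-----\n"
    "MIIB/placeholder+body==\n"
    "-----END RSA PRIVATE KEY-----\n"
    "\n"
    "foo = 123\n"
    "faa = 223\n"
)

# Translation of the PCRE pattern: \h -> [ \t], \R -> \n (LF input assumed).
KEY_VALUE = re.compile(
    r"^([^\s=]+)[ \t]+=[ \t]+"                   # key, then = between spaces/tabs
    r"(.*(?:\n(?![^\s=]+[ \t]+=[ \t]+).*)*)$",   # first line plus continuation lines
    re.MULTILINE,
)

def parse_ini_like(text):
    """Return {key: value}; trailing newlines captured after a value are trimmed."""
    return {m.group(1): m.group(2).rstrip("\n") for m in KEY_VALUE.finditer(text)}

parsed = parse_ini_like(SAMPLE)
print(parsed["tls_private_key"])
```

The captured value also takes in the blank line that follows the key block, which is why the sketch trims trailing newlines.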

Answer 2

You may use this regex with a lookahead:

^\h*(?<key>[\w-]+)\h*=\h*(?<value>[\s\S]*?)(?=\R\h*[\w-]+\h*=|\z)

RegEx Details:

^ : Start of a line
\h* : 0 or more horizontal whitespaces
(?<key>[\w-]+) : Group key that matches 1+ word characters or hyphens
\h* : 0 or more horizontal whitespaces
= : Match a =
\h* : 0 or more horizontal whitespaces
(?<value>[\s\S]*?) : Group value that matches 0 or more of any characters including newlines
(?=\R\h*[\w-]+\h*=|\z) : Lookahead to assert that at next position we have a line break followed by key and = or there is end of input
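
For illustration, a minimal Python sketch of this approach follows. Python's re module spells named groups (?P<name>...) and lacks \h, \R and \z, so the pattern is translated to [ \t], \n (LF input assumed) and \Z. The sample text, the placeholder key body, and the helper name parse_ini_like are made up for the example.

```python
import re

# Invented sample modelled on the question; the key body is a placeholder.
SAMPLE = (
    "groupid = foo\n"
    "location = westus\n"
    "tls_private_key = -----BEGIN RSA PRIVATE KEY-----\n"
    "MIIB/placeholder+body==\n"
    "-----END RSA PRIVATE KEY-----\n"
    "\n"
    "foo = 123\n"
    "faa = 223\n"
)

# Translation of the PCRE pattern: \h -> [ \t], \R -> \n, \z -> \Z.
KEY_VALUE = re.compile(
    r"^[ \t]*(?P<key>[\w-]+)[ \t]*=[ \t]*"   # key and the = sign
    r"(?P<value>[\s\S]*?)"                   # lazily consume the value, newlines included
    r"(?=\n[ \t]*[\w-]+[ \t]*=|\Z)",         # stop before the next key line or at end of input
    re.MULTILINE,
)

def parse_ini_like(text):
    """Return {key: value} with surrounding whitespace trimmed from each value."""
    return {m.group("key"): m.group("value").strip() for m in KEY_VALUE.finditer(text)}

parsed = parse_ini_like(SAMPLE)
print(parsed["tls_private_key"])
```

One thing to watch with this pattern: the lookahead treats any line that starts with word characters or hyphens immediately followed by = as a new key. A continuation line made up only of such characters plus = padding would therefore end the value early. The placeholder body above contains / and +, so it is not affected.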

Answer 3

Another variation:

(?sm)^([^=\n]*)\s=\s(.*?)(?=\n[^=\n]*\s=\s|\z)

Explanation

--------------------------------------------------------------------------------
  (?ms)                    set flags for this block (with ^ and $
                           matching start and end of line) (with .
                           matching \n) (case-sensitive) (matching
                           whitespace and # normally)
--------------------------------------------------------------------------------
  ^                        the beginning of a "line"
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    [^=\n]*                  any character except: '=', '\n'
                             (newline) (0 or more times (matching the
                             most amount possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  \s                       whitespace (\n, \r, \t, \f, and " ")
--------------------------------------------------------------------------------
  =                        '='
--------------------------------------------------------------------------------
  \s                       whitespace (\n, \r, \t, \f, and " ")
--------------------------------------------------------------------------------
  (                        group and capture to \2:
--------------------------------------------------------------------------------
    .*?                      any character (0 or more times (matching
                             the least amount possible))
--------------------------------------------------------------------------------
  )                        end of \2
--------------------------------------------------------------------------------
  (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
    \n                       '\n' (newline)
--------------------------------------------------------------------------------
    [^=\n]*                  any character except: '=', '\n'
                             (newline) (0 or more times (matching the
                             most amount possible))
--------------------------------------------------------------------------------
    \s                       whitespace (\n, \r, \t, \f, and " ")
--------------------------------------------------------------------------------
    =                        '='
--------------------------------------------------------------------------------
    \s                       whitespace (\n, \r, \t, \f, and " ")
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    \z                       the end of the string
--------------------------------------------------------------------------------
  )                        end of look-ahead
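
A short Python sketch of the same idea, offered as an illustration rather than a reference implementation. The only change to the pattern is \z becoming \Z, which is how Python's re module spells "end of the string"; the inline (?sm) flags work as written. The sample text and the placeholder key body are invented.

```python
import re

# Invented sample modelled on the question; the key body is a placeholder.
SAMPLE = (
    "groupid = foo\n"
    "location = westus\n"
    "tls_private_key = -----BEGIN RSA PRIVATE KEY-----\n"
    "MIIB/placeholder+body==\n"
    "-----END RSA PRIVATE KEY-----\n"
    "\n"
    "foo = 123\n"
    "faa = 223\n"
)

# Same pattern as in the answer, with \z written as \Z for Python.
KEY_VALUE = re.compile(r"(?sm)^([^=\n]*)\s=\s(.*?)(?=\n[^=\n]*\s=\s|\Z)")

# Group 1 is the key, group 2 the (possibly multiline) value.
parsed = {m.group(1): m.group(2).strip() for m in KEY_VALUE.finditer(SAMPLE)}

print(parsed["tls_private_key"])
```

Because the separator must be an = with whitespace on both sides, the == padding inside the key body does not end the value.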

3 answers

solution1
5 ACCEPTED 2021-01-02 10:08:27

solution2
4 2021-01-02 10:08:16

solution3
0 2021-01-02 20:37:12