Improving a working regex to match multiple lines

I'm trying to match users from an old DOS dump so they can be migrated to something new. Each user begins with a % sign and ends with a ] . Some are on one line and others span several lines.

https://regex101.com/r/0h5ndW/1

My Regex %([^\%]*)] works, but is there a better way to select each user beginning from % to the ] (including the % and ] ) so I can put them through preg_replace and manipulate them later?

I'm a little skeptical about the multi line part.

Expected Output

%user:100 [     type=admin,     added=10/12/1997,     last-login:10/20/1997,     total-logins:45,     status:1 ]
%user:111 [     type=user,     added=10/12/1997,     last-login:10/27/1997,     total-logins:145,     status:1 ]
%user:112 [ type=viewer, added=10/12/1997,     last-login:10/23/1997,     total-logins:6,     status:1 ]
%user:113 [ type=viewer, added=10/12/1997,  last-login:10/14/1997,  total-logins:2, status:1]
%user:114 [ type=viewer, added=10/12/1997,  last-login:10/14/1997,  total-logins:1, status:1]
%user:115 [ type=viewer, added=10/12/1997,  last-login:10/12/1997,  total-logins:1, status:1 ]

Raw Data

%user:100 [
    type=admin,
    added=10/12/1997,
    last-login:10/20/1997,
    total-logins:45,
    status:1
]

%user:111 [
    type=user,
    added=10/12/1997,
    last-login:10/27/1997,
    total-logins:145,
    status:1
]

%user:112 [ type=viewer, added=10/12/1997,
    last-login:10/23/1997,
    total-logins:6,
    status:1
]

%user:113 [ type=viewer, added=10/12/1997,  last-login:10/14/1997,  total-logins:2, status:1]

%user:114 [ type=viewer, added=10/12/1997,  last-login:10/14/1997,  total-logins:1, 
status:1]

%user:115 [ type=viewer, added=10/12/1997,  last-login:10/12/1997,  total-logins:1, 
status:1
]

You can use this regex for search:

((?:^%|(?!\A)\G).*)\R(?=[^][]*])

and replace it with:

$1


PHP Code:

$repl = preg_replace('/((?:^%|(?!\A)\G).*)\R(?=[^][]*])/m', '$1', $str);

RegEx Details:

  • ( : Start capture group #1
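    • (?:^%|(?!\A)\G) : Match % at line start, or resume matching at the end of the previous match. \G asserts the position at the end of the previous match, or at the start of the string for the first match; the (?!\A) in front rules out that second case, so \G only applies after a real previous match.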
    • .* : Match everything in same line
  • ) : End capture group #1
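  • \R : Match any line-break sequence (\n, \r\n, \r, ...)
  • (?=[^][]*]) : Lookahead that requires the next bracket after the line break to be a closing ], with no [ or ] before it. This is what keeps the match inside a record.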
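
To see what the pattern does, here is a small sketch run against a made-up two-record sample. The sample string is my own, shaped like the raw data in the question, and it assumes plain \n line endings:

```php
<?php
// Hypothetical sample: one single-line record and one record split over three lines.
$str = "%user:113 [ type=viewer, status:1]\n"
     . "\n"
     . "%user:114 [ type=viewer,\n"
     . "status:1\n"
     . "]\n";

// Drop a line break only when the next bracket ahead is a closing ],
// i.e. when the break sits inside a record.
$repl = preg_replace('/((?:^%|(?!\A)\G).*)\R(?=[^][]*])/m', '$1', $str);

echo $repl;
```

The single-line record and the blank line between records are left alone, and the second record comes out as `%user:114 [ type=viewer,status:1]`. The line break after a closing ] is never removed, because the lookahead then runs into the next record's [ (or the end of the string) before it finds a ].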

Another option is a variant of the pattern you tried: use negated character classes to match from % up to the opening [ , and then from that [ to the closing ] .

Then remove the newlines from each match.

^%[^][]*\[[^][]*]$

Explanation

  • ^ Start of a line (the pattern is used with the m flag)
  • %[^][]* Match % and then zero or more characters other than [ or ]
  • \[[^][]*] Match from [ to the closing ]
  • $ Assert the end of a line (again because of the m flag)


For example

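$result = preg_replace_callback('/^%[^][]*\[[^][]*]$/m', function ($m) {
    // List the line breaks explicitly: PHP_EOL reflects the platform running the script, not the data.
    return str_replace(["\r\n", "\r", "\n"], "", $m[0]);
}, $data);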
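
One thing to watch with a DOS dump: if the file still has \r\n line endings, a record ending in ]\r\n will not match, because in m mode $ only matches right before \n by default and the \r sits in between. A minimal sketch of normalizing the line endings first, assuming a made-up CRLF sample (the sample string and the $normalized variable are mine, not from the question):

```php
<?php
// Hypothetical sample with DOS (CRLF) line endings.
$data = "%user:114 [ type=viewer,\r\nstatus:1]\r\n"
      . "\r\n"
      . "%user:115 [ type=viewer,\r\nstatus:1\r\n]\r\n";

// Convert CRLF and lone CR to LF so that $ can match directly after ].
$normalized = str_replace(["\r\n", "\r"], "\n", $data);

// Match each record from % to the closing ] and strip the line breaks inside it.
$result = preg_replace_callback('/^%[^][]*\[[^][]*]$/m', function ($m) {
    return str_replace("\n", '', $m[0]);
}, $normalized);

echo $result;
```

Both records end up on a single line each, and the blank line between them is kept. If the dump already uses \n only, the normalization step changes nothing.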

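As an alternative to regex, you can split the data on ] . Then trim each chunk, put the ] back, and replace each line break with a space:

$output = explode("]", $data);
array_pop($output); // drop whatever follows the last ]
array_walk($output, function (&$record) {
    // List the line breaks explicitly: PHP_EOL reflects the platform running the script, not the data.
    $record = str_replace(["\r\n", "\r", "\n"], " ", trim($record) . "]");
});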
