Improving a working regex to match multiple lines

Question

I'm trying to match users from an old DOS dump so they can be migrated to something new. Each record begins with a % sign and ends with a ]. Some sit on one line; others span many lines.

https://regex101.com/r/0h5ndW/1

My regex %([^\%]*)] works, but is there a better way to select each user from the % to the ] (including both the % and the ]) so I can pass them through preg_replace and manipulate them later?

I'm a little skeptical about the multi-line part.

Expected Output

%user:100 [     type=admin,     added=10/12/1997,     last-login:10/20/1997,     total-logins:45,     status:1 ]
%user:111 [     type=user,     added=10/12/1997,     last-login:10/27/1997,     total-logins:145,     status:1 ]
%user:112 [ type=viewer, added=10/12/1997,     last-login:10/23/1997,     total-logins:6,     status:1 ]
%user:113 [ type=viewer, added=10/12/1997,  last-login:10/14/1997,  total-logins:2, status:1]
%user:114 [ type=viewer, added=10/12/1997,  last-login:10/14/1997,  total-logins:1, status:1]
%user:115 [ type=viewer, added=10/12/1997,  last-login:10/12/1997,  total-logins:1, status:1 ]

Raw Data

%user:100 [
    type=admin,
    added=10/12/1997,
    last-login:10/20/1997,
    total-logins:45,
    status:1
]

%user:111 [
    type=user,
    added=10/12/1997,
    last-login:10/27/1997,
    total-logins:145,
    status:1
]

%user:112 [ type=viewer, added=10/12/1997,
    last-login:10/23/1997,
    total-logins:6,
    status:1
]

%user:113 [ type=viewer, added=10/12/1997,  last-login:10/14/1997,  total-logins:2, status:1]

%user:114 [ type=viewer, added=10/12/1997,  last-login:10/14/1997,  total-logins:1, 
status:1]

%user:115 [ type=viewer, added=10/12/1997,  last-login:10/12/1997,  total-logins:1, 
status:1
]

Answer 1

You can use this regex for search:

((?:^%|(?!\A)\G).*)\R(?=[^][]*])

and replace it with:

$1

Updated RegEx Demo

PHP Code:

$repl = preg_replace('/((?:^%|(?!\A)\G).*)\R(?=[^][]*])/m', '$1', $str);

RegEx Details:

( : Start capture group #1
- (?:^%|(?!\A)\G) : Match % at line start, or resume matching from the end of the previous match. \G asserts the position at the end of the previous match, or at the start of the string for the first match; the (?!\A) lookahead rules out that start-of-string case.
- .* : Match everything on the same line
) : End capture group #1
\R : Match any kind of newline sequence
(?=[^][]*]) : Make sure there is a ] ahead with no [ or ] in between.
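To see the effect of the pattern described above, here is a minimal sketch run against a shortened sample. The sample string is an assumption made for illustration (two records with LF line endings), not the asker's full dump.

```php
<?php
// Assumed sample: one single-line record and one record spread over three lines.
$str = "%user:113 [ type=viewer, status:1]\n\n%user:114 [ type=viewer,\nstatus:1\n]\n";

// Remove every newline that is followed by a "]" with no "[" or "]" in between,
// i.e. every newline that sits inside a record.
$repl = preg_replace('/((?:^%|(?!\A)\G).*)\R(?=[^][]*])/m', '$1', $str);

echo $repl;
```

The newlines inside the second record are removed, while the blank line between the two records is left alone because the next `[` appears before any `]`.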

Answer 2

Another option is to use a variant of the pattern you tried, with a negated character class, to match the % and then everything from the opening [ to the closing ].

Then remove the newlines from each match.

^%[^][]*\[[^][]*]$

Explanation

^ Start of line (the m flag is set in the code that uses this pattern)
%[^][]* Match % followed by 0+ characters other than [ or ]
\[[^][]*] Match from [ until the closing ]
$ Assert end of line (again because of the m flag)

Regex demo | PHP demo

For example

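$result = preg_replace_callback("/^%[^][]*\[[^][]*]$/m", function($m) {
    // Assumes the data uses the platform line ending (PHP_EOL);
    // otherwise use preg_replace('/\R/', '', $m[0]) instead.
    return str_replace(PHP_EOL, "", $m[0]);
}, $data);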
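If the records are needed as separate strings rather than as one rewritten blob, the same pattern can feed preg_match_all. The following is a sketch under assumptions: the sample input is made up for illustration, and the `$users` name is hypothetical.

```php
<?php
// Assumed sample: one single-line record and one multi-line record (LF endings).
$data = "%user:113 [ type=viewer, status:1]\n\n%user:115 [ type=viewer,\nstatus:1\n]\n";

// Collect every record from the leading % up to the closing ].
preg_match_all('/^%[^][]*\[[^][]*]$/m', $data, $matches);

// Flatten each record. \R matches any newline sequence, so this step
// does not depend on the value of PHP_EOL.
$users = array_map(function (string $record): string {
    return preg_replace('/\R/', '', $record);
}, $matches[0]);

print_r($users);
```

Each element of `$users` is then one complete record on a single line, ready for further manipulation.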

Answer 3

As an alternative to regex, this just splits the data on the ]. It then trims each piece and replaces newlines (using PHP_EOL) with a space...

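$output = explode("]", $data);
// Drop the leftover fragment after the final "]".
array_pop($output);
array_walk($output, function(&$data) {
    // Restore the "]" consumed by explode(), then flatten newlines to spaces.
    $data = str_replace(PHP_EOL, " ", trim($data)."]");
});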
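The split-based approach can also be wrapped in a function. This is a sketch, not the answer's own code: `splitUsers` is a hypothetical helper name, the sample input is assumed, and `\R` is swapped in for PHP_EOL so the result does not depend on the platform running the script.

```php
<?php
// Hypothetical helper: split on "]" and return one flattened string per record.
function splitUsers(string $data): array
{
    $parts = explode(']', $data);
    array_pop($parts); // drop whatever follows the final "]"

    return array_map(function (string $part): string {
        // Put the "]" back, then turn every newline sequence into a space.
        return preg_replace('/\R/', ' ', trim($part) . ']');
    }, $parts);
}

// Assumed sample: one single-line record and one multi-line record.
$data = "%user:113 [ type=viewer, status:1]\n\n%user:115 [ type=viewer,\nstatus:1\n]\n";

print_r(splitUsers($data));
```

Like the original snippet, this relies on "]" never appearing inside a record's fields.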
