
Improving a working regex to match multiple lines

I'm trying to match users from an old DOS dump so they can be migrated to something new. Each user record begins with a % sign and ends with a ]. Some records fit on one line; others span several lines.

https://regex101.com/r/0h5ndW/1

My regex %([^\%]*)] works, but is there a better way to select each user from the % to the ] (including the % and the ]) so I can put them through preg_replace and manipulate them later?

I'm a little skeptical about the multi-line part.
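One quick way to check the multi-line part is to run the original pattern through preg_match_all on a short sample. The sketch below assumes a hypothetical two-record $data string (not the full dump); it shows that a negated class such as [^\%] also matches line breaks, so records that span several lines are captured without the s flag.

```php
<?php
// Sketch: the original pattern %([^\%]*)] run over a hypothetical sample.
// [^\%] is a negated class, and negated classes also match line breaks, so
// multi-line records are captured without the s flag. The greedy [^\%]* runs
// up to the next % (or the end of the string) and then backtracks to the last ].
$data = "%user:100 [\n    type=admin,\n    status:1\n]\n\n"
      . "%user:113 [ type=viewer, status:1]\n";

preg_match_all('/%([^\%]*)]/', $data, $matches);

// $matches[0] holds the full records, $matches[1] the text between % and ].
print_r($matches[0]);
```

Because the greedy class stops only at the next %, a stray ] inside a record's fields would not end the match early; the match always ends at the last ] before the following record.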

Expected Output

%user:100 [     type=admin,     added=10/12/1997,     last-login:10/20/1997,     total-logins:45,     status:1 ]
%user:111 [     type=user,     added=10/12/1997,     last-login:10/27/1997,     total-logins:145,     status:1 ]
%user:112 [ type=viewer, added=10/12/1997,     last-login:10/23/1997,     total-logins:6,     status:1 ]
%user:113 [ type=viewer, added=10/12/1997,  last-login:10/14/1997,  total-logins:2, status:1]
%user:114 [ type=viewer, added=10/12/1997,  last-login:10/14/1997,  total-logins:1, status:1]
%user:115 [ type=viewer, added=10/12/1997,  last-login:10/12/1997,  total-logins:1, status:1 ]

Raw Data

%user:100 [
    type=admin,
    added=10/12/1997,
    last-login:10/20/1997,
    total-logins:45,
    status:1
]

%user:111 [
    type=user,
    added=10/12/1997,
    last-login:10/27/1997,
    total-logins:145,
    status:1
]

%user:112 [ type=viewer, added=10/12/1997,
    last-login:10/23/1997,
    total-logins:6,
    status:1
]

%user:113 [ type=viewer, added=10/12/1997,  last-login:10/14/1997,  total-logins:2, status:1]

%user:114 [ type=viewer, added=10/12/1997,  last-login:10/14/1997,  total-logins:1, 
status:1]

%user:115 [ type=viewer, added=10/12/1997,  last-login:10/12/1997,  total-logins:1, 
status:1
]

You can use this regex to search:

((?:^%|(?!\A)\G).*)\R(?=[^][]*])

and replace it with:

$1

Updated RegEx Demo

PHP Code:

$repl = preg_replace('/((?:^%|(?!\A)\G).*)\R(?=[^][]*])/m', '$1', $str);

RegEx Details:

  • ( : Start capture group #1
    • (?:^%|(?!\A)\G) : Match % at the start of a line, or continue from where the previous match ended. \G asserts the position at the end of the previous match (or at the start of the string for the first match); the (?!\A) guard stops \G from matching at the very start of the string.
    • .* : Match the rest of the current line (. does not match line breaks without the s flag)
  • ) : End capture group #1
  • \R : Match any kind of line break (\n, \r\n, \r, ...)
  • (?=[^][]*]) : Make sure a ] lies ahead with no [ or ] in between, i.e. we are still inside a bracketed record
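To see the \G chaining in action, here is a minimal sketch that applies the same pattern and replacement to a hypothetical sample: one record written fully across several lines and one broken across just two. Each match is one line of a record plus the line break after it, so replacing with $1 joins the record's lines, while the line break after each closing ] is kept because no ] lies ahead of it before the next [.

```php
<?php
// Sketch: the \G-based pattern from this answer applied to a hypothetical sample.
// Each match is one line of a record plus its line break; replacing it with $1
// keeps the line text and drops the break, as long as a ] is still ahead.
$str = "%user:100 [\n    type=admin,\n    status:1\n]\n\n"
     . "%user:114 [ type=viewer, total-logins:1, \nstatus:1]\n";

$repl = preg_replace('/((?:^%|(?!\A)\G).*)\R(?=[^][]*])/m', '$1', $str);

echo $repl;
```

Note that the indentation of the original lines survives the join, so a record such as user:100 comes out with runs of spaces between its fields.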

Another option is to use a variant of the pattern you tried: negated character classes match the % and then everything from the opening [ to the closing ].

Then remove the newlines from each match.

^%[^][]*\[[^][]*]$

Explanation

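  • ^ Start of a line (the m flag makes ^ match after every newline)
  • %[^][]* Match % followed by 0+ characters other than [ or ]
  • \[[^][]*] Match from the opening [ to the closing ]; the negated class also matches newlines, so the record may span several lines
  • $ Assert the end of a line (again because of the m flag)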

Regex demo | Php demo

For example

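// m flag: ^ and $ match at the start and end of every line
$result = preg_replace_callback('/^%[^][]*\[[^][]*]$/m', function ($m) {
    return str_replace(PHP_EOL, "", $m[0]); // drop the line breaks inside the record
}, $data);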
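A self-contained run of the callback approach on a hypothetical two-record sample might look like the sketch below. It strips line breaks with preg_replace('/\R/', '', ...), which treats \n, \r\n and \r alike, rather than relying on PHP_EOL; that choice is an assumption of this sketch, not part of the answer.

```php
<?php
// Sketch (hypothetical sample): match each %...[...] record, where ^ and $
// are line anchors because of the m flag, then remove the line breaks inside it.
$data = "%user:112 [ type=viewer, added=10/12/1997,\n    last-login:10/23/1997,\n    status:1\n]\n\n"
      . "%user:113 [ type=viewer, status:1]\n";

$result = preg_replace_callback('/^%[^][]*\[[^][]*]$/m', function (array $m): string {
    // \R matches \n, \r\n or \r, so this does not depend on PHP_EOL.
    return preg_replace('/\R/', '', $m[0]);
}, $data);

echo $result;
```

Records that already sit on one line, like user:113, match too but pass through unchanged.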

As an alternative to regex, this just splits the data on ]. It then trims each piece, puts the ] back, and replaces newlines (using PHP_EOL) with a space...

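$output = explode("]", $data);
array_pop($output); // discard whatever follows the last ]
array_walk($output, function (&$chunk) {
    // trim, restore the ] removed by explode(), then turn line breaks into spaces
    $chunk = str_replace(PHP_EOL, " ", trim($chunk) . "]");
});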
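The explode approach can be tried end to end with a sketch like the following. The sample is a hypothetical two-record string assembled with implode(PHP_EOL, ...), an assumption made so that str_replace(PHP_EOL, ...) finds the line breaks on whatever platform runs it; a dump with \r\n line endings processed on Linux would leave stray \r characters behind. The approach also assumes no ] appears inside a record's fields.

```php
<?php
// Sketch: the explode/trim approach on a hypothetical sample. The sample is
// built with PHP_EOL so that str_replace(PHP_EOL, ...) finds the line breaks.
$data = implode(PHP_EOL, [
    '%user:100 [',
    '    type=admin,',
    '    status:1',
    ']',
    '',
    '%user:113 [ type=viewer, status:1]',
    '',
]);

$output = explode(']', $data);
array_pop($output); // drop whatever follows the last ]
array_walk($output, function (&$chunk) {
    // Restore the ] consumed by explode(), then flatten line breaks to spaces.
    $chunk = str_replace(PHP_EOL, ' ', trim($chunk) . ']');
});

print_r($output);
```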

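Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM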