Regex to match everything from a newline to a parenthesis, along with a search term

Question

We're trying to parse information that was output by a DOS-based accounting program from the 90s, so that we can convert it and upload it to a newer system. It's mostly information pertaining to each accounting entry, and it's output with random tabs, line breaks, etc., like this:

#Ch. No. 209488 #Rt. Date 12-09-1997 #Bank: Citibank (R:2379;L:28)

#Ch. No. 884273 #Dr. Date 10-09-1997 #Ch. Dep. 14-09-1997 #Bank: Citibank (R:2432;
L:28)

#Ch. No. 884273 #Dr. Date 10-09-1997 #Ch. Dep. 14-09-1997
#Bank: Citibank (R:2432;
L:28
)

#Ch. No. 884273 #Dr. Date 10-09-1997 #Ch. Dep. 14-09-1997
        #Bank: Citibank (R:2432;
    L:28
)

However, what's clear is that the information for each entry starts on a new line and ends with a ).

How can we write a regex that looks for a term on that line and then matches everything up to the closing )?

For example, in the data above we're looking for the string Dr using preg_match_all('/^.*\\b(?:Dr)\\b.*$/m', $dos, $matches), and it matches as follows:

Array
(
    [0] => Array
        (
            [0] => #Ch. No. 884273 #Dr. Date 10-09-1997 #Ch. Dep. 14-09-1997 #Bank: Citibank (R:2432;
            [1] => #Ch. No. 884273 #Dr. Date 10-09-1997 #Ch. Dep. 14-09-1997
            [2] => #Ch. No. 884273 #Dr. Date 10-09-1997 #Ch. Dep. 14-09-1997
        )

)

You can see from the second result in the array that it omits #Bank: Citibank (R:2432; L:28) because that part is on a separate line, even though it still belongs to the entry that starts on the line above.

How can the regex we're using be modified to match up to the next ), regardless of whether it's on the same line, the next line, or even a few lines below? The result would then be:

Array
(
    [0] => Array
        (
            [0] => #Ch. No. 884273 #Dr. Date 10-09-1997 #Ch. Dep. 14-09-1997 #Bank: Citibank (R:2432;L:28)
            [1] => #Ch. No. 759263 #Dr. Date 10-09-1997 #Ch. Dep. 14-09-1997 #Bank: Citibank (R:2432;L:28)
            [2] => #Ch. No. 395159 #Dr. Date 10-09-1997 #Ch. Dep. 14-09-1997 #Bank: Citibank (R:2432;L:28)
        )

)

Answer 1

You could use a negated character class, [^()], to match any character except a parenthesis. Because a negated character class also matches newlines, the match can span multiple lines.

After the match, you can replace each run of whitespace characters with a single space.

^.*\bDr\b[^()]*\([^()]+\)

That will match:

^ Start of the string (or of each line when the m flag is set, as in the example)
.*\bDr\b Match any character except a newline 0+ times, then match Dr between word boundaries (or match #Dr\b if it always starts with #)
[^()]* Match any character except a parenthesis 0+ times
\( Match (
[^()]+ Match any character except a parenthesis 1+ times (so there has to be at least one character between ( and ))
\) Match )

For example:

$re = '/^.*\bDr\b[^()]*\([^()]+\)/m';
$str = '#Ch. No. 209488 #Rt. Date 12-09-1997 #Bank: Citibank (R:2379;L:28)

#Ch. No. 884273 #Dr. Date 10-09-1997 #Ch. Dep. 14-09-1997 #Bank: Citibank (R:2432;
L:28)

#Ch. No. 884273 #Dr. Date 10-09-1997 #Ch. Dep. 14-09-1997
#Bank: Citibank (R:2432;
L:28
)

#Ch. No. 884273 #Dr. Date 10-09-1997 #Ch. Dep. 14-09-1997
        #Bank: Citibank (R:2432;
    L:28
)';

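// Collect every entry that contains the search term.
preg_match_all($re, $str, $matches);
// Collapse tabs, newlines, and repeated spaces into single spaces.
$result = array_map(function($x) {
    return preg_replace("/\s+/", ' ', $x);
}, $matches[0]);
print_r($result);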

Output

Array
(
    [0] => #Ch. No. 884273 #Dr. Date 10-09-1997 #Ch. Dep. 14-09-1997 #Bank: Citibank (R:2432; L:28)
    [1] => #Ch. No. 884273 #Dr. Date 10-09-1997 #Ch. Dep. 14-09-1997 #Bank: Citibank (R:2432; L:28 )
    [2] => #Ch. No. 884273 #Dr. Date 10-09-1997 #Ch. Dep. 14-09-1997 #Bank: Citibank (R:2432; L:28 )
)
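
If the search term varies, the same pattern can be wrapped in a small helper. The following is a minimal sketch, not part of the original answer: the function name findEntries is hypothetical, and it assumes that each entry contains exactly one parenthesized group with no nested parentheses. It also strips the whitespace inside the parentheses, so that the result has the (R:2432;L:28) form requested in the question.

```php
<?php
// Hypothetical helper: find entries containing $term and flatten each to one line.
function findEntries(string $text, string $term): array
{
    // preg_quote escapes regex metacharacters in the term (and the '/' delimiter).
    $re = '/^.*\b' . preg_quote($term, '/') . '\b[^()]*\([^()]+\)/m';
    preg_match_all($re, $text, $matches);

    return array_map(function (string $entry): string {
        // Collapse runs of whitespace (tabs, newlines) into single spaces.
        $entry = preg_replace('/\s+/', ' ', $entry);
        // Remove whitespace inside the parenthesized group.
        return preg_replace_callback('/\([^()]*\)/', function (array $m): string {
            return preg_replace('/\s+/', '', $m[0]);
        }, $entry);
    }, $matches[0]);
}

// Sample data modeled on the question: one Rt entry and one multi-line Dr entry.
$dos = "#Ch. No. 209488 #Rt. Date 12-09-1997 #Bank: Citibank (R:2379;L:28)\n\n"
     . "#Ch. No. 884273 #Dr. Date 10-09-1997 #Ch. Dep. 14-09-1997\n"
     . "\t#Bank: Citibank (R:2432;\n    L:28\n)";

print_r(findEntries($dos, 'Dr'));
```

With this sample, only the second entry contains Dr, so the array should hold a single flattened line ending in (R:2432;L:28).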

Answer 2

Based on @CBroe's comment, I came up with this:

/(#[^\)\n]*(?:#Dr).*\)\n*)/gsU

#[^\)\n]* -> starts at a # and matches any characters other than ) and \n (newline), so the search for the term cannot run past the end of an entry or onto another line.
(?:#Dr) -> the search string, in a non-capturing group.
.*\)\n* -> continues up to the next ), followed by any trailing newlines.
gsU -> the flags used: g: global search, s: the dot also matches newlines, U: ungreedy quantifiers. Note that g is a flag of the regex tester; PHP has no g modifier, and preg_match_all already returns every match.
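
To try this pattern in PHP, a minimal sketch might look like the following. It is an adaptation rather than the answerer's own code: the g flag is dropped because preg_match_all already performs a global search, the sample string $dos is abbreviated from the question's data, and the whitespace cleanup is borrowed from Answer 1.

```php
<?php
// Sample data modeled on the question: one Rt entry and one multi-line Dr entry.
$dos = "#Ch. No. 209488 #Rt. Date 12-09-1997 #Bank: Citibank (R:2379;L:28)\n\n"
     . "#Ch. No. 884273 #Dr. Date 10-09-1997 #Ch. Dep. 14-09-1997\n"
     . "#Bank: Citibank (R:2432;\nL:28\n)";

// s: the dot also matches newlines; U: quantifiers are lazy by default.
// No g flag: preg_match_all already returns every match.
$re = '/(#[^\)\n]*(?:#Dr).*\)\n*)/sU';
preg_match_all($re, $dos, $matches);

// Flatten each multi-line entry into a single line.
$entries = array_map(function (string $entry): string {
    return preg_replace('/\s+/', ' ', trim($entry));
}, $matches[1]);

print_r($entries);
```

Because the pattern anchors on # rather than on the start of a line, it relies on every entry beginning with a # field, as in the sample data.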
