PHP: Parse a log entry into multiple parts using a regular expression

Question

I need some help, since I'm not much of a PHP regex expert. I have this line of text, which always has the same format (only the message at the end varies):

2021-12-08T18:18:38+00:00 INFO Produktbestand erfolgreich von Collmex abgerufen | "STOCK_AVAILABLE;23;1;363;PCE;-1\r\nMESSAGE;S;204020;Daten?bertragung erfolgreich. Es wurden 1 Datens?tze verarbeitet.\r\n"

I have 3 functions which should return parts of the log entry:

public function get_log_file_entry_time( string $entry ): string {
    
}

public function get_log_file_entry_level( string $entry ): string {

}

public function get_log_file_entry_message( string $entry ): string {

}

I first tried explode() with a space as the delimiter. That works, but it is not the best approach, since the log message can be very long in some cases.

I'm no regex expert, but I found the following pattern, which matches the first two pieces: ([^\s]+) ([A-Z]+)

This gives me the timestamp and the level. Now I'm struggling to get the message after the second group; maybe my nesting is not right at all. Any advice would make me happy!

Notice

The message starts after the first whitespace following the logging level. For example:

Produktbestand erfolgreich von Collmex abgerufen | "STOCK_AVAILABLE;23;1;363;PCE;-1\r\nMESSAGE;S;204020;Daten?bertragung erfolgreich. Es wurden 1 Datens?tze verarbeitet.\r\n"

Answer 1

You can use 3 capture groups, where the 3rd group contains the rest of the line, followed by all lines that do not start with a date-time-like pattern.

You can make the pattern for group 1 a bit more specific. To match the following lines that do not start with the group 1 pattern, you can recurse the first subpattern using (?1):

^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\+\d{2}:\d{2})\h+([A-Z]+)\h+(.*(?:\R(?!(?1)).*)*)

In parts, the pattern matches:

^ Start of string (or start of a line when the m flag is set)
(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\+\d{2}:\d{2}) Capture group 1, match a date-time-like pattern
\h+ Match 1+ horizontal whitespace chars
([A-Z]+) Capture group 2, match 1+ uppercase chars A-Z
\h+ Match 1+ horizontal whitespace chars
( Capture group 3
- .* Match the rest of the line
- (?:\R(?!(?1)).*)* Optionally repeat matching a newline and the rest of the line, asserting that what is directly to the right of the current position does not match subpattern 1 (the pattern of group 1)
) Close group 3

See a regex demo and a PHP demo.

For example, with 2 entries that both start with the same pattern:

$re = '/^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\+\d{2}:\d{2})\h+([A-Z]+)\h+(.*(?:\R(?!(?1)).*)*)/m';
$str = '2021-12-08T18:18:38+00:00 INFO Produktbestand erfolgreich von Collmex abgerufen | "STOCK_AVAILABLE;23;1;363;PCE;-1
MESSAGE;S;204020;Daten?bertragung erfolgreich. Es wurden 1 Datens?tze verarbeitet.
"
2021-12-08T18:18:38+00:00 INFO Produktbestand erfolgreich von Collmex abgerufen | "STOCK_AVAILABLE;23;1;363;PCE;-1
MESSAGE;S;204020;Daten?bertragung erfolgreich. Es wurden 1 Datens?tze verarbeitet.
"';

preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);

foreach ($matches as $match) {
    print_r($match);
}

Output

Array
(
    [0] => 2021-12-08T18:18:38+00:00 INFO Produktbestand erfolgreich von Collmex abgerufen | "STOCK_AVAILABLE;23;1;363;PCE;-1
MESSAGE;S;204020;Daten?bertragung erfolgreich. Es wurden 1 Datens?tze verarbeitet.
"
    [1] => 2021-12-08T18:18:38+00:00
    [2] => INFO
    [3] => Produktbestand erfolgreich von Collmex abgerufen | "STOCK_AVAILABLE;23;1;363;PCE;-1
MESSAGE;S;204020;Daten?bertragung erfolgreich. Es wurden 1 Datens?tze verarbeitet.
"
)
Array
(
    [0] => 2021-12-08T18:18:38+00:00 INFO Produktbestand erfolgreich von Collmex abgerufen | "STOCK_AVAILABLE;23;1;363;PCE;-1
MESSAGE;S;204020;Daten?bertragung erfolgreich. Es wurden 1 Datens?tze verarbeitet.
"
    [1] => 2021-12-08T18:18:38+00:00
    [2] => INFO
    [3] => Produktbestand erfolgreich von Collmex abgerufen | "STOCK_AVAILABLE;23;1;363;PCE;-1
MESSAGE;S;204020;Daten?bertragung erfolgreich. Es wurden 1 Datens?tze verarbeitet.
"
)
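To connect the pattern back to the three functions from the question, here is a minimal sketch. It assumes the functions can share one helper; the names parse_log_entry() and LOG_ENTRY_PATTERN are hypothetical and not part of the question. The functions are written as plain functions rather than class methods to keep the example self-contained, and returning empty strings for a non-matching entry is an arbitrary choice.

```php
<?php
// Same pattern as above, without the m flag: each call parses one entry.
// LOG_ENTRY_PATTERN and parse_log_entry() are hypothetical names.
const LOG_ENTRY_PATTERN = '/^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\+\d{2}:\d{2})\h+([A-Z]+)\h+(.*(?:\R(?!(?1)).*)*)/';

function parse_log_entry(string $entry): array {
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    if (preg_match(LOG_ENTRY_PATTERN, $entry, $m) !== 1) {
        // Assumption: a malformed entry yields empty parts instead of an exception.
        return ['time' => '', 'level' => '', 'message' => ''];
    }
    return ['time' => $m[1], 'level' => $m[2], 'message' => $m[3]];
}

function get_log_file_entry_time(string $entry): string {
    return parse_log_entry($entry)['time'];
}

function get_log_file_entry_level(string $entry): string {
    return parse_log_entry($entry)['level'];
}

function get_log_file_entry_message(string $entry): string {
    return parse_log_entry($entry)['message'];
}

$entry = "2021-12-08T18:18:38+00:00 INFO first line\nsecond line";

echo get_log_file_entry_time($entry), PHP_EOL;  // 2021-12-08T18:18:38+00:00
echo get_log_file_entry_level($entry), PHP_EOL; // INFO
echo get_log_file_entry_message($entry), PHP_EOL;
```

Each function runs the regex again, which is fine for occasional calls. If all three parts are needed for many entries, calling the helper once per entry and reusing the returned array avoids the repeated matching.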

Answer 2

Here's a simple method using explode() and its limit parameter.

list($date, $severity, $message) = explode(' ', $str, 3);

var_dump($date, $severity, $message);
/*
string(25) "2021-12-08T18:18:38+00:00"
string(4) "INFO"
string(170) "Produktbestand erfolgreich von Collmex abgerufen | "STOCK_AVAILABLE;23;1;363;PCE;-1 MESSAGE;S;204020;Daten?bertragung erfolgreich. Es wurden 1 Datens?tze verarbeitet.""
*/

As long as the whitespace before the message is always a single space, and none of the parts leading up to the message can contain spaces, this will work. If any part before the message sometimes contains spaces, it will not work consistently.
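If the amount of whitespace between the parts can vary, one possible variation is preg_split() with the same limit of 3, splitting on runs of horizontal whitespace. This is only a sketch: the function name split_log_entry() is hypothetical, and it still assumes that the timestamp and the level themselves never contain whitespace.

```php
<?php
// Hypothetical helper: split an entry into date, severity and message.
function split_log_entry(string $entry): array {
    // A limit of 3 stops after two splits, so the message keeps its own spaces.
    $parts = preg_split('/\h+/', $entry, 3);

    // Pad with empty strings so destructuring works for short or malformed lines.
    return array_pad($parts, 3, '');
}

[$date, $severity, $message] = split_log_entry(
    '2021-12-08T18:18:38+00:00   INFO  Produktbestand erfolgreich von Collmex abgerufen'
);

var_dump($date, $severity, $message);
```

Unlike the regex in the first answer, this does not validate the timestamp format, so it treats any line with at least two whitespace runs as a log entry.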
