
Regex for numbers on multiple lines in PHP

I have a file that looks like this (yes, the line breaks are right):

39                                              9
30 30 30 31 34 30 30 32 33 32 36 30 31 38 0D 0A 00014002326018..
39 30 30 30 31 34 30 30 32 33 32 36 30 35 34 0D 900014002326054.
0A                                              .
39 30 30 30 31 34 30 30 32 33 32 36 30 39 31 0D 900014002326091.
0A                                              .
39 30 30 30 31 34 30 30 32 33 32 36 31 36 33 0D 900014002326163.
0A                                              .
39                                              9
30 30 30 31 34 30 30 32 33                      000140023
32 36 32 30 30 0D 0A                            26200..
39                                              9
30 30 30 31 34 30 30 32 33 32 36 32 30 30 0D 0A 00014002326200..
39 30 30 30 31 34 30 30 32 33 32 36 31 32 32 0D 900014002326122.
0A                                              .
39                                              9
30 30 30 31 34 30 30 32 33                      000140023
32 36 31 35 34 0D 0A                            26154..
39 30 30 30 31 34 30 30 32 33                   9000140023
32 36 31 33 31 0D 0A                            26131..
39                                              9
30 30 30 31 34 30 30 32 33                      000140023
32 36 31 30 34 0D 0A                            26104..
39 30 30 30 31 34 30 30 32 33 32 36 30 39 30 0D 900014002326090.
0A                                              .
39 30 30 30 31 34 30 30 32 33 32 36 31 39 37 0D 900014002326197.
0A                                              .
39                                              9
30 30 30 31 34 30 30 32 33 32 36 32 30 38 0D 0A 00014002326208..
39 30 30 30 31 34 30 30 32 33                   9000140023
32 36 31 31 35 0D 0A                            26115..
39                                              9
30 30 30 31 34 30 30 32 33                      000140023
32 36 31 36 34 0D 0A                            26164..
39                                              9
30 30 30 31 34 30 30 32 33                      000140023
32 36 30 31 36 0D 0A 39 30 30 30 31 34 30 30 32 26016..900014002
33                                              3
32 36 32 34 36 0D 0A                            26246..
39                                              9
30 30 30 31 34 30 30 32 33                      000140023
32 36 32 34 36 0D 0A                            26246..
39                                              9
30 30 30 31 34 30 30 32 33                      000140023
32 36 30 37 39 0D 0A                            26079..
39                                              9
30 30 30 31 34 30 30 32 33                      000140023
32 36 31 32 30 0D 0A                            26120..
39                                              9
30 30 30 31 34 30 30 32 33 32 36 32 32 38 0D 0A 00014002326228..
39 30 30 30 31 34 30 30 32 33                   9000140023
32 36 31 38 36 0D 0A                            26186..

I have this code that grabs the EID tags (the numbers that start with 9000), but I can't figure out how to make it work across multiple lines.

$data = file_get_contents('tags.txt');

$pattern = "/(\d{15})/i";

preg_match_all($pattern, $data, $tags);
$count = 0;
foreach ( $tags[0] as $tag ){

    echo $tag . '<br />';
    $count++;
}

echo "<br />" . $count . " total head scanned";

For example, the first and second lines should yield 900014002326018 instead of being ignored.

I am not good with regular expressions, so if you could explain how it works, so that I learn and no longer need someone to help me with simple regexes, that would be awesome.

EDIT: The whole number is 15 digits, starting with 9000.

You can do this:

$result = preg_replace('~\R?(?:[0-9A-F]{2}\h+)+~', '', $data);
$result = explode('..', rtrim($result, '.'));

Pattern details:

\R?            # optional newline character
(?:            # open a non-capturing group
  [0-9A-F]{2}  # two hexadecimal characters
  \h+          # horizontal white characters (spaces or tabs)
)+             # repeat the non-capturing group one or more times

After this replacement, the only content left to remove is the pairs of dots. After stripping the trailing dots, you can use the remaining pairs as delimiters to explode the string into an array.
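To see the two steps end to end, here is a minimal sketch run on the first four lines of the dump. The $rows array and the loop that rebuilds the 48-character layout are made up for illustration (PHP 7.1+ for the array destructuring); only the preg_replace() and explode() calls come from the answer.

```php
<?php
// Hypothetical sample rows: [hex column, text column], copied from the top of the dump.
$rows = [
    ['39', '9'],
    ['30 30 30 31 34 30 30 32 33 32 36 30 31 38 0D 0A', '00014002326018..'],
    ['39 30 30 30 31 34 30 30 32 33 32 36 30 35 34 0D', '900014002326054.'],
    ['0A', '.'],
];

// Rebuild the dump layout: hex column padded to 48 characters, then the text column.
$lines = [];
foreach ($rows as [$hex, $text]) {
    $lines[] = str_pad($hex, 48) . $text;
}
$data = implode("\n", $lines);

// Strip the hex column (and the line break in front of it) from every line.
$joined = preg_replace('~\R?(?:[0-9A-F]{2}\h+)+~', '', $data);
// $joined is now "900014002326018..900014002326054.."

// Drop the trailing dots, then split on the ".." left behind by each CRLF.
$result = explode('..', rtrim($joined, '.'));

print_r($result); // two tags: 900014002326018 and 900014002326054
```

Note that the pattern relies on at least one space or tab following the last hex pair of each line, which is the case in the dump shown in the question.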

Another way

Since you know that there are always 48 characters before the text column (the digits and dots), you can use this pattern too:

$result = preg_replace('~(?:^|\R).{48}~', '', $data);
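This replacement returns one long string, so it still has to be split on the dot pairs, exactly as in the first snippet. A minimal sketch under the assumption that every line really has a 48-character prefix; the $rows sample and the CRLF line endings are chosen for illustration, to show that \R also consumes a \r\n pair:

```php
<?php
// Hypothetical sample rows: [hex column, text column], copied from the dump.
$rows = [
    ['39', '9'],
    ['30 30 30 31 34 30 30 32 33', '000140023'],
    ['32 36 31 35 34 0D 0A', '26154..'],
    ['39 30 30 30 31 34 30 30 32 33', '9000140023'],
    ['32 36 31 33 31 0D 0A', '26131..'],
];

// Pad the hex column to 48 characters and join the lines with Windows line endings.
$lines = [];
foreach ($rows as [$hex, $text]) {
    $lines[] = str_pad($hex, 48) . $text;
}
$data = implode("\r\n", $lines);

// Remove the first 48 characters of each line together with the preceding line break.
$joined = preg_replace('~(?:^|\R).{48}~', '', $data);

// Same post-processing as before: trim the trailing dots, split on "..".
$result = explode('..', rtrim($joined, '.'));

print_r($result); // two tags: 900014002326154 and 900014002326131
```

A line shorter than 48 characters would not match the pattern and would be left in place, so this variant is only safe when the dump layout is strictly fixed-width.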

Another way, without regex

The idea is to read the file line by line and, since the length before the content is always the same (i.e. 16*3 characters -> 48 characters), extract the substring containing the digits and append it to the temporary variable $data.

ini_set("auto_detect_line_endings", true);
$data = '';
$handle = @fopen("tags.txt", "r");
if ($handle) {
    while (($buffer = fgets($handle, 128)) !== false) {
        $data .= substr($buffer, 48, -1);
    }
    if (!feof($handle)) {
        echo "Error: fgets() has failed\n";
    }
    fclose($handle);
} else {
    echo "Error opening the file\n";
}

$result = explode ('..', rtrim($data, '.'));

Note: if the file uses Windows line endings (\r\n), you must change the third parameter of the substr() function to -2. If you are interested in how to detect the newline type, you can take a look at this post.
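One way to avoid choosing between -1 and -2 is to strip the line ending with rtrim() before taking the substring. This is a variation on the code above, not the answer's own version; the temporary file and the $rows sample are made up so that the sketch runs on its own. It also leaves out the auto_detect_line_endings setting, which is deprecated as of PHP 8.1.

```php
<?php
// Hypothetical sample rows: [hex column, text column], copied from the dump.
$rows = [
    ['39', '9'],
    ['30 30 30 31 34 30 30 32 33 32 36 30 31 38 0D 0A', '00014002326018..'],
    ['39 30 30 30 31 34 30 30 32 33 32 36 30 35 34 0D', '900014002326054.'],
    ['0A', '.'],
];

// Write the sample to a temporary file with Windows line endings.
$path = tempnam(sys_get_temp_dir(), 'tags');
$lines = [];
foreach ($rows as [$hex, $text]) {
    $lines[] = str_pad($hex, 48) . $text;
}
file_put_contents($path, implode("\r\n", $lines) . "\r\n");

$data = '';
$handle = fopen($path, 'r');
if ($handle === false) {
    exit("Error opening the file\n");
}
while (($buffer = fgets($handle)) !== false) {
    // Remove "\n" or "\r\n" first, then keep everything after the 48-character hex column.
    $data .= substr(rtrim($buffer, "\r\n"), 48);
}
fclose($handle);
unlink($path);

$result = explode('..', rtrim($data, '.'));

print_r($result); // two tags: 900014002326018 and 900014002326054
```

Because the line ending is trimmed explicitly, the same code handles \n and \r\n files, and a last line without a trailing newline does not lose its final character.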

I don't think it's even possible to do this with a single regex, and in any case your code will be far more legible and maintainable if you approach this one step at a time.

This works, and it shouldn't be too hard to figure out how:

$eid_tag_src = <<<END_EID_TAGS
39                                              9
30 30 30 31 34 30 30 32 33 32 36 30 31 38 0D 0A 00014002326018..
39 30 30 30 31 34 30 30 32 33 32 36 30 35 34 0D 900014002326054.
  :
 etc.
  :
39 30 30 30 31 34 30 30 32 33                   9000140023
32 36 31 38 36 0D 0A                            26186..
END_EID_TAGS;

/* Remove hex data from first 48 characters of each line */
$eid_tag_src = preg_replace('/^.{48}/m','',$eid_tag_src);

/* Remove all white space */
$eid_tag_src = preg_replace('/\s+/','',$eid_tag_src);

/* Replace dots (CRLF) with spaces */
$eid_tag_src = str_replace('..',' ',$eid_tag_src);

/* Convert to array of EID tags */
$eid_tags = explode(' ',trim($eid_tag_src));

print_r($eid_tags);

Here's the output:

Array
(
    [0] => 900014002326018
    [1] => 900014002326054
    [2] => 900014002326091
    [3] => 900014002326163
    [4] => 900014002326200
    [5] => 900014002326200
    [6] => 900014002326122
    [7] => 900014002326154
    [8] => 900014002326131
    [9] => 900014002326104
    [10] => 900014002326090
    [11] => 900014002326197
    [12] => 900014002326208
    [13] => 900014002326115
    [14] => 900014002326164
    [15] => 900014002326016
    [16] => 900014002326246
    [17] => 900014002326246
    [18] => 900014002326079
    [19] => 900014002326120
    [20] => 900014002326228
    [21] => 900014002326186
)
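Since the question states that every tag is 15 digits starting with 9000, the extracted array can be sanity-checked against that format. A minimal sketch; the sample array (with two deliberately malformed entries) and the names $valid and $invalid are made up for illustration:

```php
<?php
// Hypothetical input: two well-formed tags, one that is too short, one with a wrong prefix.
$eid_tags = [
    '900014002326018',
    '900014002326054',
    '90001400232',
    '800014002326091',
];

// Keep only strings that are exactly "9000" followed by 11 more digits.
// \z anchors at the very end of the string, so a trailing newline is rejected too.
$valid = array_values(preg_grep('/^9000\d{11}\z/', $eid_tags));

// Everything that did not match the expected format.
$invalid = array_values(array_diff($eid_tags, $valid));

print_r($valid);   // 900014002326018, 900014002326054
print_r($invalid); // 90001400232, 800014002326091
```

An empty $invalid array is a cheap signal that the line joining worked and no tag was cut in half.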

Here's an approach that grabs the digits directly (without replacing):

RegEx: /(?:^.{48}|\.)([0-9]+\.?)/m (explained demo)

Which means (in plain English): grab a run of digits, optionally followed by a dot, IF it is preceded either by the first 48 characters of the line OR by a dot (the special case).

And your code could look like this:

$pattern = '/(?:^.{48}|\.)([0-9]+\.?)/m'; 

preg_match_all($pattern, $data, $tags);

//join all the bits belonging to the number
$data=implode("", $tags[1]); 

//count the dots to have a correct count of the numbers grabbed
//since each number was grabbed with an ending dot initially
$count=substr_count($data, ".");

//replace the dots with a html <br> tag (avoiding a split and a foreach loop)
$tags=str_replace('.', "<br>", $data); 

print $tags . "<br>" . $count . " total scanned";

See the code live at http://3v4l.org/Z4EhI
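If an array of tags is more convenient than an HTML string, the captured pieces can be joined and then split on the single dot that ends each number. A minimal sketch using the same pattern; the $rows sample (which includes the "26016..900014002" line that exercises the dot branch) and the name $matches are chosen for illustration:

```php
<?php
// Hypothetical sample rows: [hex column, text column], copied from the dump.
$rows = [
    ['39', '9'],
    ['30 30 30 31 34 30 30 32 33', '000140023'],
    ['32 36 30 31 36 0D 0A 39 30 30 30 31 34 30 30 32', '26016..900014002'],
    ['33', '3'],
    ['32 36 32 34 36 0D 0A', '26246..'],
];

// Rebuild the dump layout: hex column padded to 48 characters, then the text column.
$lines = [];
foreach ($rows as [$hex, $text]) {
    $lines[] = str_pad($hex, 48) . $text;
}
$data = implode("\n", $lines);

// Capture each run of digits (plus one optional dot) that follows either
// the 48-character hex column or a dot in the middle of a line.
$pattern = '/(?:^.{48}|\.)([0-9]+\.?)/m';
preg_match_all($pattern, $data, $matches);

// Join the fragments; every complete number now ends with exactly one dot.
$joined = implode('', $matches[1]);

// Trim the last dot and split on the remaining ones.
$tags = explode('.', rtrim($joined, '.'));

print_r($tags); // two tags: 900014002326016 and 900014002326246
```

The count is then simply count($tags), which gives the same number as counting the dots in the joined string.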
