Does this regex in PHP actually work?

I am hoping the regular expression experts can tell me why this is going wrong:

This regex:

$pattern = '/(?<percent>[0-9]{1,3}\.[0-9]{1,2})% of (?<filesize>.+) at/';

Should match this sort of string:

[download] 87.1% of 4.40M at 107.90k/s ETA 00:05 
[download] 89.0% of 4.40M at 107.88k/s ETA 00:04 
[download] 91.4% of 4.40M at 106.09k/s ETA 00:03 
[download] 92.9% of 4.40M at 105.55k/s ETA 00:03

Correct? Is there anything wrong with that regex that would stop it from matching the above input? Full usage here:

while(!feof($handle))
{
    $progress = fread($handle, 8192);
    $pattern = '/(?<percent>[0-9]{1,3}\.[0-9]{1,2})% of (?<filesize>.+) at/';
    if (preg_match_all($pattern, $progress, $matches)) {
        // matched
    }
}

Could the amount of data being read by fread be affecting whether the regex works correctly?

I really need confirmation, as I am trying to identify why it isn't working on a new server. This question is related to "Change in Server Permits script not to work". Can this be due to PHP.ini being different?

Thanks all

Update 2

I have made a test script to test the regex, but even on its own it doesn't work?

<?php 

error_reporting(E_ALL);

echo 'Start';

$progress = "[download]75.1% of 4.40M at 115.10k/s ETA 00:09 [download] 77.2% of 4.40M at 112.36k/s ETA 00:09 [download] 78.6% of 4.40M at 111.41k/s ETA 00:08 [download] 80.3% of 4.40M at 110.80k/s ETA 00:07 [download] 82.3% of 4.40M at 110.30k/s ETA 00:07 [download] 84.3% of 4.40M at 108.33k/s ETA 00:06 [download] 85.7% of 4.40M at 107.62k/s ETA 00:05 [download] 87.5% of 4.40M at 107.21k/s ETA 00:05 [download] 89.5% of 4.40M at 105.10k/s ETA 00:04 [download] 90.7% of 4.40M at 106.45k/s ETA 00:03 [download] 93.2% of 4.40M at 104.92k/s ETA 00:02 [download] 94.8% of 4.40M at 104.40k/s ETA 00:02 [download] 96.5% of 4.40M at 102.47k/s ETA 00:01 [download] 97.7% of 4.40M at 103.48k/s ETA 00:01 [download] 100.0% of 4.40M at 103.15k/s ETA 00:00 [download] 100.0% of 4.40M at 103.16k/s ETA 00:00
";

$pattern = '/(?<percent>\d{1,3}\.\d{1,2})%\s+of\s+(?<filesize>[\d.]+[kBM]) at/';

if(preg_match_all($pattern, $progress, $matches)){
    echo 'match';
}

echo '<br>Done<br>';    

?>

I am not that familiar with named capture, but I think in PHP it should be:

$pattern = '/(?P<percent>[0-9]{1,3}\.[0-9]{1,2})% of (?P<filesize>.+) at/';

Notice the P after the question mark.
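A quick way to check whether the named-group syntax is the culprit on a given server is to run the pattern against one known-good line and look at what preg_match() returns. The sketch below is only an illustration: the sample line is copied from the question, and the remark about versions is from memory of the PHP manual (which, as far as I recall, says the (?<name>...) spelling was added in PHP 5.2.2, while (?P<name>...) also works on older builds), so treat it as an assumption to verify.

```php
<?php
// Sample line taken from the question.
$line = '[download] 87.1% of 4.40M at 107.90k/s ETA 00:05';

// Same pattern as in the question, written with the (?P<name>...) spelling.
$pattern = '/(?P<percent>[0-9]{1,3}\.[0-9]{1,2})% of (?P<filesize>.+) at/';

$matched = preg_match($pattern, $line, $m);

if ($matched === false) {
    // preg_match() returns false when the pattern fails to compile or run.
    echo "regex error\n";
} elseif ($matched === 1) {
    echo $m['percent'], ' of ', $m['filesize'], "\n";
} else {
    echo "no match\n";
}
```

If this prints "regex error" with the (?<name>...) form but works with (?P<name>...), the PCRE version on that server is the likely cause.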

The regex seems okay to me.

However, there are some things I would improve:

  • match whitespace with "\s+" instead of " "
  • match digits with "\d" rather than "[0-9]" (same thing, it's just shorter)
  • match the filesize with something more specific than ".+"

This would be my version:

(?<percent>\d{1,3}\.\d{1,2})%\s+of\s+(?<filesize>[\d.]+[kBM])

Depending on how likely you think malformed numbers are (I would guess: not very), you can shorten it to:

(?<percent>[\d.]+)%\s+of\s+(?<filesize>[\d.]+[kBM])
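For reference, here is a minimal sketch of the stricter version in use. The "/" delimiters and the two-line sample (copied from the question and assumed to be newline-separated) are additions for the sake of a runnable example.

```php
<?php
// Two sample lines from the question, assumed to be separated by newlines.
$sample = "[download] 87.1% of 4.40M at 107.90k/s ETA 00:05\n"
        . "[download] 89.0% of 4.40M at 107.88k/s ETA 00:04\n";

// The stricter pattern from above, wrapped in "/" delimiters.
$pattern = '/(?<percent>\d{1,3}\.\d{1,2})%\s+of\s+(?<filesize>[\d.]+[kBM])/';

// PREG_SET_ORDER groups the captures per match instead of per group.
$count = preg_match_all($pattern, $sample, $matches, PREG_SET_ORDER);

foreach ($matches as $match) {
    printf("%s%% of %s\n", $match['percent'], $match['filesize']);
}
```

Because the filesize group only accepts digits, dots and a unit letter, it cannot run on into the rest of the line the way ".+" can.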

If your stream actually delivers more than 8 KB of data in one read, you'll probably truncate the last line, which will prevent it from being matched. Try reading the stream one line at a time using fgets() instead.

I would use fgets() to read line by line, since I assume you want to match per line. If you match per line, you don't need preg_match_all; preg_match is enough.
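A line-based loop could look roughly like the following. This is a sketch under assumptions: an in-memory stream stands in for the real process handle, the input is assumed to contain real newlines, and the pattern is the stricter one suggested in another answer.

```php
<?php
// Stand-in for the real handle: an in-memory stream holding two lines.
$handle = fopen('php://memory', 'w+');
fwrite($handle, "[download] 87.1% of 4.40M at 107.90k/s ETA 00:05\n");
fwrite($handle, "[download] 89.0% of 4.40M at 107.88k/s ETA 00:04\n");
rewind($handle);

$pattern = '/(?P<percent>\d{1,3}\.\d{1,2})%\s+of\s+(?P<filesize>[\d.]+[kBM])/';
$seen = [];

// fgets() returns one line per call and false at end of input.
while (($line = fgets($handle)) !== false) {
    // One progress record per line, so preg_match() is enough.
    if (preg_match($pattern, $line, $m)) {
        $seen[] = [$m['percent'], $m['filesize']];
        echo $m['percent'], '% of ', $m['filesize'], "\n";
    }
}
fclose($handle);
```

Testing the return value of fgets() rather than feof() also avoids processing an empty read at the end of the stream.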

Your percentages only seem to have 1 decimal place, but you match 1 to 2 digits?

Is there anything that could go wrong with that regex that will not get it to match with the above input?

Not that I can see, but there's something that does go wrong to make it match far too much: if you really don't have newlines, then this:

(?P<filesize>.+) at

can match greedily from the start all the way to the last “ at” in the input. So if I match against the whole example input you posted, I get a <percent> of:

75.1

(good) and a filesize of:

4.40M at 115.10k/s ETA 00:09 [download] 77.2% of 4.40M at 112.36k/s ETA 00:09 [download] 78.6% of 4.40M at 111.41k/s ETA 00:08 [download] 80.3% of 4.40M at 110.80k/s ETA 00:07 [download] 82.3% of 4.40M at 110.30k/s ETA 00:07 [download] 84.3% of 4.40M at 108.33k/s ETA 00:06 [download] 85.7% of 4.40M at 107.62k/s ETA 00:05 [download] 87.5% of 4.40M at 107.21k/s ETA 00:05 [download] 89.5% of 4.40M at 105.10k/s ETA 00:04 [download] 90.7% of 4.40M at 106.45k/s ETA 00:03 [download] 93.2% of 4.40M at 104.92k/s ETA 00:02 [download] 94.8% of 4.40M at 104.40k/s ETA 00:02 [download] 96.5% of 4.40M at 102.47k/s ETA 00:01 [download] 97.7% of 4.40M at 103.48k/s ETA 00:01 [download] 100.0% of 4.40M at 103.15k/s ETA 00:00 [download] 100.0% of 4.40M

(not quite so good). To avoid this, use the non-greedy match “.+?”, or a more specific expression like “[^ ]+” or Tomalak's version.
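The difference is easy to see on a shortened version of the test input. In this sketch the two records are cut down from the question's test script and joined on a single line; the variable names are just for illustration.

```php
<?php
// Two progress records on a single line, as in the test script.
$input = '[download] 75.1% of 4.40M at 115.10k/s ETA 00:09 '
       . '[download] 77.2% of 4.40M at 112.36k/s ETA 00:09';

// Identical patterns except for ".+" versus ".+?" in the filesize group.
$greedy = '/(?P<percent>[0-9]{1,3}\.[0-9]{1,2})% of (?P<filesize>.+) at/';
$lazy   = '/(?P<percent>[0-9]{1,3}\.[0-9]{1,2})% of (?P<filesize>.+?) at/';

$greedyCount = preg_match_all($greedy, $input, $g);
$lazyCount   = preg_match_all($lazy, $input, $l);

// The greedy pattern swallows both records in one match.
echo "greedy: $greedyCount match(es), filesize: ", $g['filesize'][0], "\n";
// The lazy pattern stops at the first " at" and finds each record.
echo "lazy:   $lazyCount match(es), sizes: ", implode(', ', $l['filesize']), "\n";
```

The greedy version reports a single match whose filesize runs up to the last “ at”, while the lazy version reports one match per record.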

Could how much that is being read by fread be effecting the regex to work correctly?

Yes. Reading in chunks is quite unreliable: if a '[download]' line is split over a chunk boundary, it will not match and will be lost. You can either:

  • not care, or
  • read the whole input at once, or
  • use line-based reading if there really are newlines in the input (there usually are), or
  • manage the buffer manually by retaining the unmatched tail of the input (everything after the end of the final match found) and appending the new incoming input to it.
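The last option, manual buffering, can be sketched as follows. consumeChunk() is a hypothetical helper written for this example, not an existing PHP function, and the two hard-coded chunks stand in for successive fread() results that split a record in the middle.

```php
<?php
$pattern = '/(?P<percent>\d{1,3}\.\d{1,2})%\s+of\s+(?P<filesize>[\d.]+[kBM])\s+at/';

// Hypothetical helper: append a chunk to the buffer, return the completed
// matches, and leave only the unmatched tail in $buffer.
function consumeChunk($pattern, &$buffer, $chunk)
{
    $buffer .= $chunk;
    $found = [];
    $flags = PREG_SET_ORDER | PREG_OFFSET_CAPTURE;
    if (preg_match_all($pattern, $buffer, $matches, $flags)) {
        foreach ($matches as $match) {
            // With PREG_OFFSET_CAPTURE each capture is [text, offset].
            $found[] = [$match['percent'][0], $match['filesize'][0]];
        }
        // Offset plus length of the last full match is where it ends.
        $last = end($matches);
        $buffer = substr($buffer, $last[0][1] + strlen($last[0][0]));
    }
    return $found;
}

$buffer = '';
// A record split across two chunks, as a chunked read could produce.
$chunks = [
    '[download] 87.1% of 4.40M at 107.90k/s ETA 00:05 [download] 89.',
    '0% of 4.40M at 107.88k/s ETA 00:04',
];
$all = [];
foreach ($chunks as $chunk) {
    $all = array_merge($all, consumeChunk($pattern, $buffer, $chunk));
}

foreach ($all as $record) {
    echo $record[0], '% of ', $record[1], "\n";
}
```

Note that the buffer keeps growing while nothing matches, so a real implementation would probably want to cap its length.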

As for server differences, the only thing I can think of is that if one of the servers is Windows and the other a *nix, they will have different ideas of what a newline is, which might cause the “are there newlines or not?” confusion.
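If newline conventions are the suspect, one way to take them out of the picture is to split the input on any of the common line endings before matching. The sample below mixes "\r\n", "\r" and "\n" on purpose; whether the real stream does so is an assumption.

```php
<?php
// The same kind of records with Windows (\r\n), bare \r and \n endings mixed.
$raw = "[download] 87.1% of 4.40M at 107.90k/s ETA 00:05\r\n"
     . "[download] 89.0% of 4.40M at 107.88k/s ETA 00:04\r"
     . "[download] 91.4% of 4.40M at 106.09k/s ETA 00:03\n";

// Split on \r\n, \r or \n and drop empty pieces.
$lines = preg_split('/\r\n|\r|\n/', $raw, -1, PREG_SPLIT_NO_EMPTY);

foreach ($lines as $line) {
    echo $line, "\n";
}
```

Each element of $lines can then be passed to preg_match() on its own, regardless of which platform produced the output.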
