Does this regex in PHP actually work?
I am hoping the regular expression experts can tell me why this is going wrong:
This regex:
$pattern = '/(?<percent>[0-9]{1,3}\.[0-9]{1,2})% of (?<filesize>.+) at/';
Should match this sort of string:
[download] 87.1% of 4.40M at 107.90k/s ETA 00:05
[download] 89.0% of 4.40M at 107.88k/s ETA 00:04
[download] 91.4% of 4.40M at 106.09k/s ETA 00:03
[download] 92.9% of 4.40M at 105.55k/s ETA 00:03
Correct? Is there anything that could go wrong with that regex that would stop it from matching the above input? Full usage here:
while(!feof($handle))
{
    $progress = fread($handle, 8192);
    $pattern = '/(?<percent>[0-9]{1,3}\.[0-9]{1,2})% of (?<filesize>.+) at/';
    if(preg_match_all($pattern, $progress, $matches)){
        //matched
    }
}
Could the amount that fread reads be affecting whether the regex works correctly?
I really need confirmation as I am trying to identify why it isn't working on a new server. This question is related to "Change in Server Permits script not to work". Can this be due to php.ini being different?
Thanks all
I have made a test script to test the regex, but even on its own it doesn't work?
<?php
error_reporting(E_ALL);
echo 'Start';
$progress = "[download]75.1% of 4.40M at 115.10k/s ETA 00:09 [download] 77.2% of 4.40M at 112.36k/s ETA 00:09 [download] 78.6% of 4.40M at 111.41k/s ETA 00:08 [download] 80.3% of 4.40M at 110.80k/s ETA 00:07 [download] 82.3% of 4.40M at 110.30k/s ETA 00:07 [download] 84.3% of 4.40M at 108.33k/s ETA 00:06 [download] 85.7% of 4.40M at 107.62k/s ETA 00:05 [download] 87.5% of 4.40M at 107.21k/s ETA 00:05 [download] 89.5% of 4.40M at 105.10k/s ETA 00:04 [download] 90.7% of 4.40M at 106.45k/s ETA 00:03 [download] 93.2% of 4.40M at 104.92k/s ETA 00:02 [download] 94.8% of 4.40M at 104.40k/s ETA 00:02 [download] 96.5% of 4.40M at 102.47k/s ETA 00:01 [download] 97.7% of 4.40M at 103.48k/s ETA 00:01 [download] 100.0% of 4.40M at 103.15k/s ETA 00:00 [download] 100.0% of 4.40M at 103.16k/s ETA 00:00
";
$pattern = '/(?<percent>\d{1,3}\.\d{1,2})%\s+of\s+(?<filesize>[\d.]+[kBM]) at/';
if(preg_match_all($pattern, $progress, $matches)){
echo 'match';
}
echo '<br>Done<br>';
?>
I am not that familiar with named capture, but I think in PHP it should be:
$pattern = '/(?P<percent>[0-9]{1,3}\.[0-9]{1,2})% of (?P<filesize>.+) at/';
Notice the P after the question mark.
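To check which spelling a given server accepts, a small script can try both. This is only a sketch: the sample line is copied from the question, and the $patterns and $results names are made up for illustration. As far as I know, PHP builds older than 5.2.2 accept only the (?P<name>...) form, so an old PHP on the new server would be one plausible explanation, but that is an assumption about the setup, not something confirmed in the question.

```php
<?php
// Sample line taken from the question's input.
$line = '[download] 87.1% of 4.40M at 107.90k/s ETA 00:05';

// The same pattern written with both named-group spellings.
// array() is used instead of [] so the script also parses on old PHP.
$patterns = array(
    'P-form'     => '/(?P<percent>[0-9]{1,3}\.[0-9]{1,2})% of (?P<filesize>.+) at/',
    'short form' => '/(?<percent>[0-9]{1,3}\.[0-9]{1,2})% of (?<filesize>.+) at/',
);

$results = array();
foreach ($patterns as $label => $pattern) {
    // @ hides the compile warning so the failure can be reported explicitly.
    $ok = @preg_match($pattern, $line, $m);
    if ($ok === false) {
        $results[$label] = 'pattern did not compile';
    } elseif ($ok === 1) {
        $results[$label] = $m['percent'] . ' / ' . $m['filesize'];
    } else {
        $results[$label] = 'no match';
    }
    echo $label, ': ', $results[$label], "\n";
}
```

If the short form reports that the pattern did not compile on one server only, switching to the P-form everywhere is the low-risk fix, since current PHP versions accept it as well.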
The regex seems okay to me.
However, there are some things I would improve:
- match whitespace with "\s+", instead of " "
- match digits with "\d", not with "[0-9]" (same thing, it's just shorter)
- match the file size not with ".+", but with something more specific
This would be my version:
(?<percent>\d{1,3}\.\d{1,2})%\s+of\s+(?<filesize>[\d.]+[kBM])
Depending on how likely you are to get malformed numbers (I would guess: not very likely), you can shorten it to:
(?<percent>[\d.]+)%\s+of\s+(?<filesize>[\d.]+[kBM])
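As a quick check of the first version above, here is a sketch that runs it over three progress updates from the question joined into one string without newlines. The delimiters and the PREG_SET_ORDER flag are my additions, and $percents is just an illustrative name.

```php
<?php
// Three progress updates from the question, joined by single spaces (no newlines).
$progress = '[download] 87.1% of 4.40M at 107.90k/s ETA 00:05 '
          . '[download] 89.0% of 4.40M at 107.88k/s ETA 00:04 '
          . '[download] 91.4% of 4.40M at 106.09k/s ETA 00:03';

$pattern = '/(?<percent>\d{1,3}\.\d{1,2})%\s+of\s+(?<filesize>[\d.]+[kBM])/';

// PREG_SET_ORDER groups the captures per match instead of per group.
$count = preg_match_all($pattern, $progress, $matches, PREG_SET_ORDER);

$percents = array();
foreach ($matches as $m) {
    $percents[] = $m['percent'];
    echo $m['percent'], '% of ', $m['filesize'], "\n";
}
```

Because the file size part is restricted to digits, dots and one unit letter, each update is matched separately even when the input has no line breaks.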
I would use fgets() to read line by line, since I assume you want to match per line. If you match per line, you do not need preg_match_all, only preg_match.
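A minimal sketch of the line-based approach, assuming the downloader terminates each update with "\n". The php://memory stream only stands in for the real $handle so the example runs on its own, and $seen is an illustrative name.

```php
<?php
// Stand-in for the real process handle: an in-memory stream with
// newline-terminated lines (an assumption about the downloader's output).
$handle = fopen('php://memory', 'w+');
fwrite($handle, "[download] 87.1% of 4.40M at 107.90k/s ETA 00:05\n");
fwrite($handle, "[download] 89.0% of 4.40M at 107.88k/s ETA 00:04\n");
rewind($handle);

$pattern = '/(?<percent>[\d.]+)%\s+of\s+(?<filesize>[\d.]+[kBM])/';
$seen = array();

// fgets() returns one line per call and false at end of input.
while (($line = fgets($handle)) !== false) {
    if (preg_match($pattern, $line, $m)) {
        $seen[] = $m['percent'] . ' of ' . $m['filesize'];
    }
}
fclose($handle);

print_r($seen);
```

If the tool redraws its progress with carriage returns instead of newlines, fgets() will not split the updates, and this sketch would need a different way of finding line ends.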
You only seem to have one decimal digit in your percentages, but you match one or two ({1,2})?
Is there anything that could go wrong with that regex that would stop it from matching the above input?
Not that I can see, but there is something that goes wrong and makes it match far too much: if you really don't have newlines, then this:
(?P<filesize>.+) at
can match greedily from the start to the last “ at” in the input. So if I match against the whole example input you posted, I get a <percent> of:
75.1
(good) and a filesize of:
4.40M at 115.10k/s ETA 00:09 [download] 77.2% of 4.40M at 112.36k/s ETA 00:09 [download] 78.6% of 4.40M at 111.41k/s ETA 00:08 [download] 80.3% of 4.40M at 110.80k/s ETA 00:07 [download] 82.3% of 4.40M at 110.30k/s ETA 00:07 [download] 84.3% of 4.40M at 108.33k/s ETA 00:06 [download] 85.7% of 4.40M at 107.62k/s ETA 00:05 [download] 87.5% of 4.40M at 107.21k/s ETA 00:05 [download] 89.5% of 4.40M at 105.10k/s ETA 00:04 [download] 90.7% of 4.40M at 106.45k/s ETA 00:03 [download] 93.2% of 4.40M at 104.92k/s ETA 00:02 [download] 94.8% of 4.40M at 104.40k/s ETA 00:02 [download] 96.5% of 4.40M at 102.47k/s ETA 00:01 [download] 97.7% of 4.40M at 103.48k/s ETA 00:01 [download] 100.0% of 4.40M at 103.15k/s ETA 00:00 [download] 100.0% of 4.40M
(not quite so good). To avoid this, use the non-greedy match “.+?”, or a more specific expression like “[^ ]+” or Tomalak's version.
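The difference is easy to see side by side. This sketch uses two updates from the question joined without a newline; $greedy, $lazy, $g and $l are illustrative names.

```php
<?php
// Two progress updates from the question, joined without a newline.
$input = '[download] 87.1% of 4.40M at 107.90k/s ETA 00:05 '
       . '[download] 89.0% of 4.40M at 107.88k/s ETA 00:04';

$greedy = '/(?P<percent>[0-9]{1,3}\.[0-9]{1,2})% of (?P<filesize>.+) at/';
$lazy   = '/(?P<percent>[0-9]{1,3}\.[0-9]{1,2})% of (?P<filesize>.+?) at/';

preg_match_all($greedy, $input, $g);
preg_match_all($lazy, $input, $l);

// The greedy version swallows everything up to the last " at".
echo count($g['filesize']), " greedy match(es): ", $g['filesize'][0], "\n";
// The lazy version stops at the first " at" after each percentage.
echo count($l['filesize']), " lazy match(es): ", implode(', ', $l['filesize']), "\n";
```

The greedy pattern still reports a match, so a plain "did it match?" test will not reveal the problem; only the captured values do.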
Could the amount that fread reads be affecting whether the regex works correctly?
Yes. Reading in chunks is quite unreliable: if a '[download]' line is split over a chunk boundary, it will not match and will be lost. You can either:
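One possible way to keep chunked reads without losing split lines is to carry the incomplete tail of each chunk over to the next read. This is a sketch under assumptions, not necessarily one of the options meant above: the php://memory stream and the 16-byte chunk size are stand-ins chosen so that lines really do get split, the updates are assumed to end in "\r" or "\n", and $buffer and $percents are illustrative names.

```php
<?php
// Stand-in for the real handle; the tiny chunk size below forces lines
// to be split across reads, which is the failure being described.
$handle = fopen('php://memory', 'w+');
fwrite($handle, "[download] 87.1% of 4.40M at 107.90k/s ETA 00:05\r"
              . "[download] 89.0% of 4.40M at 107.88k/s ETA 00:04\r");
rewind($handle);

$pattern = '/(?P<percent>\d{1,3}\.\d{1,2})%\s+of\s+(?P<filesize>[\d.]+[kBM])/';
$buffer = '';
$percents = array();

while (!feof($handle)) {
    $chunk = fread($handle, 16);
    if ($chunk === false) {
        break;
    }
    $buffer .= $chunk;
    // Split on \r or \n; the last piece may be an incomplete line,
    // so it goes back into the buffer for the next read.
    $lines = preg_split('/[\r\n]+/', $buffer);
    $buffer = array_pop($lines);
    foreach ($lines as $line) {
        if (preg_match($pattern, $line, $m)) {
            $percents[] = $m['percent'];
        }
    }
}
// Whatever is left after EOF is the final, unterminated line (if any).
if ($buffer !== '' && preg_match($pattern, $buffer, $m)) {
    $percents[] = $m['percent'];
}
fclose($handle);

print_r($percents);
```

Each update is only matched once its terminator has arrived, so a line cut in half by a chunk boundary is completed by the next read instead of being dropped.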
As for server differences, the only thing I can think of is that if one of the servers is Windows and the other a *nix, they will have different ideas of what a newline is, which might cause the “are there newlines or not?” confusion.