Search for consecutive 'User-agent' directives in robots.txt with PHP
With PHP, I want to check (true/false) whether a robots.txt file contains consecutive 'User-agent' directives.
With this regexp,
preg_match('~User-agent:\\h*(?:\\R|$)~i', $string)
I found all the 'User-agent:' lines, but I haven't found how to detect consecutive ones.
User-agent: # 'User-agent:'
\h* # horizontal whitespace (0 or more times)
(?: # group, but do not capture:
\R # '\R' (any Unicode newline sequence)
| # OR
$ # before an optional \n, and the end of the string
) # end of grouping
For example:
User-agent: 008
user-agent: Accoona
User-Agent: Googlebot
User-Agent: aipbot*
disallow: /
Result: True
User-Agent: Googlebot
Crawl-delay: 60
User-agent: aipbot*
disallow: /
Result: False
User-agent: 008
Crawl-delay: 2
user-agent: Accoona
User-Agent: Googlebot
User-Agent: aipbot*
disallow: /
Result: True
This may seem a derpy answer, but why not just repeat your regex? Surely
User-agent:\\h*(?:[a-zA-Z0-9\\*]*\\R|$)User-agent:\\h*(?:[a-zA-Z0-9\\*]*\\R|$)
only matches if there are two consecutive user agents?
https://regex101.com/r/ximRMo/1
Add a non-user-agent line between the consecutive ones and you get 0 matches. Two consecutive lines cause a match.
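As a rough PHP sketch of the same idea, the pattern below anchors at line starts with the m flag instead of repeating a character class for the agent name. The function name hasConsecutiveUserAgentRegex is hypothetical, and the sketch assumes "consecutive" means directly adjacent lines, with no blank or comment lines in between.

```php
<?php
// Hypothetical helper (not from the original thread): returns true when two
// 'User-agent:' lines sit directly next to each other.
function hasConsecutiveUserAgentRegex(string $robots): bool
{
    // ^        start of a line (m flag)
    // \h*      optional leading horizontal whitespace
    // [^\r\n]* rest of the first User-agent line
    // \R       any newline sequence (\n, \r\n, ...)
    // i flag   case-insensitive, so 'user-agent' and 'User-Agent' both match
    $pattern = '~^\h*User-agent:[^\r\n]*\R\h*User-agent:~mi';

    return preg_match($pattern, $robots) === 1;
}

// Two adjacent User-agent lines.
var_dump(hasConsecutiveUserAgentRegex("User-agent: 008\nuser-agent: Accoona\ndisallow: /\n")); // bool(true)

// A Crawl-delay line separates the two User-agent lines.
var_dump(hasConsecutiveUserAgentRegex("User-Agent: Googlebot\nCrawl-delay: 60\nUser-agent: aipbot*\ndisallow: /\n")); // bool(false)
```

Because the whole file has to be in memory as one string, this suits small files such as a typical robots.txt.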
Without regex:
$filePath = 'robots.txt';

try {
    if ( false === $fh = fopen($filePath, 'rb') )
        throw new Exception('Could not open the file!');
} catch (Exception $e) {
    echo 'Error (File: ' . $e->getFile() . ', line ' . $e->getLine() . '): ' . $e->getMessage();
    exit(1); // stop here: $fh is not a valid handle
}

var_dump(hasSuccessiveUA($fh));
fclose($fh);

// Returns true as soon as two adjacent lines both start with "user-agent:".
function hasSuccessiveUA($fh) {
    $previous = false;
    while ( false !== $line = fgets($fh) ) {
        // case-insensitive match at position 0 of the line
        $current = ( stripos($line, 'user-agent:') === 0 );
        if ( $previous && $current ) return true;
        $previous = $current;
    }
    return false;
}
Advantage: when the answer is true, you don't have to read the file to the end.
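To try the line-by-line approach on the examples from the question without creating a file, here is a small sketch that feeds a string through an in-memory stream. The name hasSuccessiveUAInString is hypothetical, and the ltrim call (to tolerate indented directives) is an assumption that goes beyond the code above.

```php
<?php
// Hypothetical wrapper: runs a line-by-line scan over a string by writing it
// to PHP's built-in php://memory stream first.
function hasSuccessiveUAInString(string $robots): bool
{
    $fh = fopen('php://memory', 'w+b');
    fwrite($fh, $robots);
    rewind($fh);

    $previous = false;
    $found = false;
    while (false !== $line = fgets($fh)) {
        // Assumption: leading whitespace before the directive is allowed.
        $current = (stripos(ltrim($line), 'user-agent:') === 0);
        if ($previous && $current) {
            $found = true;
            break; // early exit: the remaining lines are never read
        }
        $previous = $current;
    }

    fclose($fh);
    return $found;
}

var_dump(hasSuccessiveUAInString("User-agent: 008\nuser-agent: Accoona\ndisallow: /\n")); // bool(true)
var_dump(hasSuccessiveUAInString("User-Agent: Googlebot\nCrawl-delay: 60\nUser-agent: aipbot*\n")); // bool(false)
```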