Search for consecutive 'User-agent' directives in robots.txt with PHP

With PHP, I want to check (true/false) whether a robots.txt file contains consecutive 'User-agent' directives.

With this regexp, preg_match('~User-agent:\h*(?:\R|$)~i', $string), I can find the 'User-agent:' lines, but I haven't worked out how to detect consecutive ones.

User-agent:    # 'User-agent:'
\h*            # horizontal whitespace (0 or more times)
(?:            # group, but do not capture:
  \R           #   '\R' (any Unicode newline sequence) 
 |             #  OR
  $            #   before an optional \n, and the end of the string
)              # end of grouping

For example:

User-agent: 008
user-agent: Accoona
User-Agent: Googlebot
User-Agent: aipbot*
disallow: /

Result: True

User-Agent: Googlebot
Crawl-delay: 60
User-agent: aipbot*
disallow: /

Result: False

User-agent: 008
Crawl-delay: 2
user-agent: Accoona
User-Agent: Googlebot
User-Agent: aipbot*
disallow: /

Result: True

This may seem a derpy answer, but why not just repeat your regex? Surely User-agent:\h*(?:[a-zA-Z0-9\*]*\R|$)User-agent:\h*(?:[a-zA-Z0-9\*]*\R|$) only matches if there are two consecutive user agents?

https://regex101.com/r/ximRMo/1

Put a non-User-agent line between the consecutive ones and you get 0 matches. Two consecutive User-agent lines cause a match.
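A minimal sketch of the repeated-pattern idea in PHP, checked against the three examples from the question. The function name hasConsecutiveUserAgents is hypothetical, and the pattern is a variation rather than the exact regex above: it assumes the agent value may contain any non-newline characters (hyphens, slashes, spaces) and anchors each directive at the start of a line with the m flag.

```php
<?php
// Hypothetical helper (not from the original posts): true when two
// User-agent lines are directly adjacent anywhere in $robots.
function hasConsecutiveUserAgents(string $robots): bool
{
    // ^ with the m flag anchors at each line start; [^\r\n]* consumes the
    // rest of the first User-agent line, \R its line break, and the second
    // User-agent: must begin the very next line. The i flag ignores case.
    $pattern = '~^\h*User-agent:[^\r\n]*\R\h*User-agent:~mi';

    return preg_match($pattern, $robots) === 1;
}

// The three sample files from the question, in order.
$examples = [
    "User-agent: 008\nuser-agent: Accoona\nUser-Agent: Googlebot\nUser-Agent: aipbot*\ndisallow: /",
    "User-Agent: Googlebot\nCrawl-delay: 60\nUser-agent: aipbot*\ndisallow: /",
    "User-agent: 008\nCrawl-delay: 2\nuser-agent: Accoona\nUser-Agent: Googlebot\nUser-Agent: aipbot*\ndisallow: /",
];

foreach ($examples as $robots) {
    var_dump(hasConsecutiveUserAgents($robots)); // bool(true), bool(false), bool(true)
}
```

Comparing the return value with 1 also treats a preg_match() failure (false) as "no match", which may or may not be what you want.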

Without regex:

$filePath = 'robots.txt';

try {
    // fopen() returns false on failure
    if ( false === $fh = fopen($filePath, 'rb') )
        throw new Exception('Could not open the file!');

} catch (Exception $e) {
    echo 'Error (File: ' . $e->getFile() . ', line ' . $e->getLine() . '): ' . $e->getMessage();
    exit(1); // stop here: $fh is not a valid handle
}

var_dump(hasSuccessiveUA($fh));

fclose($fh);

// Returns true as soon as two adjacent lines both start with "User-agent:".
function hasSuccessiveUA($fh) {
    $previous = false;

    while ( false !== $line = fgets($fh) ) {
        // case-insensitive check that the line begins with the directive
        $current = ( stripos($line, 'user-agent:') === 0 );
        if ( $previous && $current ) return true;
        $previous = $current;
    }

    return false;
}

Advantage: when the answer is true, you don't have to read the file to the end.
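If the robots.txt content is already in a string (fetched over HTTP, say), the same line-by-line scan can be reused through an in-memory stream. This is only a sketch: hasSuccessiveUAInString is a hypothetical wrapper that is not part of the answer, and the scan function is repeated so the block runs on its own.

```php
<?php
// Line-by-line scan from the answer, repeated so this sketch is self-contained.
function hasSuccessiveUA($fh): bool
{
    $previous = false;

    while (false !== $line = fgets($fh)) {
        $current = (stripos($line, 'user-agent:') === 0);
        if ($previous && $current) {
            return true; // stop early, the rest is never read
        }
        $previous = $current;
    }

    return false;
}

// Hypothetical wrapper: copies the string into a php://memory stream
// so that fgets() can walk it one line at a time.
function hasSuccessiveUAInString(string $robots): bool
{
    $fh = fopen('php://memory', 'w+b');
    fwrite($fh, $robots);
    rewind($fh);

    try {
        return hasSuccessiveUA($fh);
    } finally {
        fclose($fh); // always release the stream
    }
}

var_dump(hasSuccessiveUAInString("User-Agent: Googlebot\nCrawl-delay: 60\nUser-agent: aipbot*\ndisallow: /"));
var_dump(hasSuccessiveUAInString("User-agent: 008\nuser-agent: Accoona\ndisallow: /"));
```

Note that the scan only recognises a directive at the very start of a line, so an indented User-agent line would not count unless the line is trimmed first.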
