
Capturing Drupal 7 DOM content before page load for comparison

We run a multisite installation of Drupal 7 at work and are trying to temporarily hold back the swarm of bots we receive until we have had a chance to load our content. I wrote a quick-and-dirty script that sends 503 headers if a certain element is found via XPath (this can also be done with strpos/preg_match if the DOM is not well-formed).

To get the ball rolling, though, I need to figure out how to either:

A) Hijack the Drupal 7 bootstrap and pull all content through the filter below, or

B) ob_flush the content through the filter before the content is loaded

Worth mentioning: we use a module called Domain Access, which I believe is what led me on this crazy chase in the first place. I know for a fact that it meddles with quite a few files...

The issue I am having is figuring out exactly where I can catch the content. It should be possible to push the stream into a variable, strpos it, then release it, correct? I thought index.php in Drupal 7 would be the place, but I'm a little confused as to where or how I should capture the contents. Here's the script; hopefully someone can point me in the right direction.

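//error_reporting(-1);

/* start query */

// $html is a placeholder for the rendered page markup; where to capture it
// is the open question. $_SERVER['PHP_SELF'] is only the script path, so
// calling load() on it never sees the generated output.
$dom = new DOMDocument;
$dom->preserveWhiteSpace = false;
@$dom->loadHTML($html); // @ silences warnings about imperfect HTML

$xpath = new DOMXPath($dom);

// if this exists we aren't ready to be read by bots
$query = $xpath->query(".//*[@id='block-views-about-this-site-block']/div/div/div");
// or, if the markup is not a usable DOM: strpos($html, 'klat-badge') !== false

/* end query */

// query() returns a DOMNodeList (or false on error), so test its length
if ($query !== false && $query->length > 0) {

    // require banlist (expected to define $list)
    require('botlist.php');

    // the /i flag already makes the match case-insensitive
    $str = '/'.implode('|', array_unique($list)).'/i';
    if (preg_match($str, $_SERVER['HTTP_USER_AGENT'])) {
        // so tell bots we're broken
        header('HTTP/1.1 503 Service Temporarily Unavailable');
        header('Status: 503 Service Temporarily Unavailable');
        exit;
    }
}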
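
The "push the stream into a variable, strpos it, then release it" idea can be sketched with PHP's output buffering functions. This is only a rough illustration, assuming PHP 7 or later: capture_output() and page_not_ready() are hypothetical helper names, and the echoed markup stands in for whatever actually renders the page (in a stock Drupal 7 index.php that call is menu_execute_active_handler()).

```php
<?php
// Sketch only: capture_output() and page_not_ready() are hypothetical helpers.

/**
 * Run $render while buffering everything it echoes, then return that output.
 */
function capture_output(callable $render): string {
    ob_start();
    $render();
    return (string) ob_get_clean();
}

/**
 * TRUE when the placeholder marker is still present in the markup.
 */
function page_not_ready(string $html, string $marker): bool {
    return strpos($html, $marker) !== false;
}

// Buffer the page instead of letting it stream to the client.
$html = capture_output(function () {
    // Stand-in for the real page render.
    echo '<div class="klat-badge">placeholder</div>';
});

// This flag would gate the user-agent check and the 503 headers.
$blockBots = page_not_ready($html, 'klat-badge');

// Release the buffered page for normal visitors.
echo $html;
```

Because nothing has been sent while the buffer is open, header() calls made after the inspection still take effect.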

It would be a lot easier to just define a constant in a module and check that instead. You could then use hook_init() to decide whether the page is ready before the content is even built:

define('IN_DEVELOPMENT', TRUE);

function mymodule_init() {
  if (IN_DEVELOPMENT) {
    //require banlist
    require('botlist.php'); 

    // the /i flag already makes the match case-insensitive
    $str = '/'.implode('|', array_unique($list)).'/i';
    if(preg_match($str, $_SERVER['HTTP_USER_AGENT'])) {
      //so tell bots we're broken
      header('HTTP/1.1 503 Service Temporarily Unavailable');
      header('Status: 503 Service Temporarily Unavailable');
      exit;
    }
  }
}
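
The user-agent test inside mymodule_init() could be pulled out into a small function, which makes it easier to check on its own. The following is a sketch under assumptions, not the answer's code: is_listed_bot() is a hypothetical name, the sample $list only imitates what botlist.php is assumed to define, and the entries are treated as literal names (escaped with preg_quote()) rather than as regular expressions. It assumes PHP 7 or later.

```php
<?php
// Sketch only: is_listed_bot() is a hypothetical helper, and this $list
// imitates what botlist.php is assumed to provide.

/**
 * TRUE when $userAgent contains any of the names in $list (case-insensitive).
 */
function is_listed_bot(string $userAgent, array $list): bool {
    // Drop duplicates and anything that is not a non-empty string.
    $names = array_filter(array_unique($list), function ($name) {
        return is_string($name) && $name !== '';
    });
    if ($userAgent === '' || !$names) {
        return false;
    }
    // Escape each name so characters like "." or "/" are matched literally.
    $quoted = array_map(function ($name) {
        return preg_quote($name, '/');
    }, $names);
    return preg_match('/' . implode('|', $quoted) . '/i', $userAgent) === 1;
}

$list = ['Googlebot', 'bingbot', 'Yahoo! Slurp'];
$agent = 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)';
var_dump(is_listed_bot($agent, $list));
```

Inside the hook, the first argument would be `isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : ''`, since some clients send no User-Agent header at all.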

There might be a way to do what you want by loading the whole page content into a DOMDocument, but it won't be easy in Drupal (as I'm sure you've already discovered!) and it certainly won't be efficient.
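
For completeness, the DOMDocument route might look roughly like this once the page markup is available as a string. It is a sketch under assumptions: block_present() is a hypothetical helper, $html stands in for output captured elsewhere, the block id is assumed to contain no single quote, and PHP 7 or later is assumed. Parsing every page this way is the efficiency cost mentioned above.

```php
<?php
// Sketch only: block_present() is a hypothetical helper; the sample markup
// stands in for page output captured elsewhere.

/**
 * TRUE when the block with the given id has the nested div structure.
 */
function block_present(string $html, string $blockId): bool {
    if ($html === '') {
        return false; // loadHTML() rejects an empty string
    }
    // Collect parser warnings internally; real-world HTML is rarely valid.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $loaded = $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return false;
    }
    $xpath = new DOMXPath($dom);
    // Assumes $blockId contains no single quote.
    $nodes = $xpath->query("//*[@id='" . $blockId . "']/div/div/div");
    return $nodes !== false && $nodes->length > 0;
}

$notReady = '<html><body><div id="block-views-about-this-site-block">'
    . '<div><div><div>About</div></div></div></div></body></html>';
$ready = '<html><body><p>Real content</p></body></html>';

var_dump(block_present($notReady, 'block-views-about-this-site-block'));
var_dump(block_present($ready, 'block-views-about-this-site-block'));
```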

Hope that helps.


 