How to trick a crawler/scanner into not detecting the web technology?

I have a PHP website with URL rewriting in place, so the .php extensions are hidden. Even so, web crawlers and security/vulnerability scanners are somehow able to tell that my site was developed in PHP.

How do I avoid that, or how do I trick them into thinking this website was not developed in PHP?

As mentioned in a comment, I develop a security scanner that is probably similar to the one you're trying to hide this information from.

One reason this is difficult to achieve is that security scanners usually don't look at just one thing. The one I work on uses a very large database of fingerprints to determine whether specific files or behaviors indicate a certain plugin, framework, or site builder, or even whether the website's HTML is structured like that of another site known to be generated by a specific tool.

Once we discover one technology, we can use those fingerprints to relate it to another website that doesn't expose all of the same information, or that perhaps even intentionally changes it to something misleading.

A great example of this is when people change their X-Powered-By header to something that does not reflect what they actually use.

Say you ran a PHP-driven website but your X-Powered-By header was "Microsoft ASP.NET" or anything else. We could assume the information is false or otherwise questionable if all of your extensions end in .php or are hidden. Other technologies also have behavioral nuances of their own: ASP.NET, for example, leaves structural fingerprints such as the __VIEWSTATE strings, so a page that claims ASP.NET but shows none of them looks suspicious.
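The cross-check described above could be sketched roughly like this. It is only an illustration, not how any particular scanner works: the function name poweredByLooksSpoofed and both heuristics are assumptions made up for the example.

```php
<?php
// Hypothetical sketch: flag an X-Powered-By claim that the page body contradicts.
// The function name and the heuristics are illustrative assumptions only.

function poweredByLooksSpoofed(array $headers, string $body): bool
{
    // Look up the header case-insensitively.
    $claimed = '';
    foreach ($headers as $name => $value) {
        if (strcasecmp((string) $name, 'X-Powered-By') === 0) {
            $claimed = $value;
        }
    }

    $claimsAspNet = stripos($claimed, 'ASP.NET') !== false;
    // ASP.NET Web Forms pages carry a hidden __VIEWSTATE field.
    $hasViewState = stripos($body, '__VIEWSTATE') !== false;
    // Links ending in .php point the other way.
    $hasPhpLinks = preg_match('#href="[^"]*\.php["?\#]#i', $body) === 1;

    // Header says ASP.NET, body shows no ASP.NET marker but does show PHP URLs.
    return $claimsAspNet && !$hasViewState && $hasPhpLinks;
}

$headers = ['X-Powered-By' => 'Microsoft ASP.NET'];
$body = '<a href="/contact.php">Contact</a>';
var_dump(poweredByLooksSpoofed($headers, $body));
```

A real scanner would weigh many more signals than these two, but the idea is the same: a header is just one claim, and it is checked against everything else the site exposes.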

Additionally, you need to keep in mind things like URL formats, POST/PUT behavior, and even what other software you run. If you run WordPress, for example, it's very probable that you're using PHP.

This is only a small example. There are thousands of rules per technology, and each match adds confidence that our guess is right. We have a database of around 10,000 identified products, each with unique or overlapping fingerprints.
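As a rough sketch of how matching rules can add up to a confidence score, consider the following. The rules, weights, and the function name scoreTechnologies are invented for illustration and are not taken from any real scanner's database. The only facts relied on are that PHP's default session cookie is named PHPSESSID and that WordPress serves assets from /wp-content/.

```php
<?php
// Hypothetical sketch of rule-based fingerprinting: each matching rule adds
// its weight to a technology's score. Rules and weights are made up.

function scoreTechnologies(array $rules, array $headers, string $body): array
{
    $scores = [];
    foreach ($rules as $rule) {
        // Pick what this rule's pattern is matched against.
        $haystack = $rule['where'] === 'headers' ? implode("\n", $headers) : $body;
        if (preg_match($rule['pattern'], $haystack) === 1) {
            $scores[$rule['tech']] = ($scores[$rule['tech']] ?? 0) + $rule['weight'];
        }
    }
    arsort($scores); // highest score first
    return $scores;
}

$rules = [
    ['tech' => 'PHP',       'where' => 'headers', 'pattern' => '/^X-Powered-By:\s*PHP/mi',      'weight' => 5],
    ['tech' => 'PHP',       'where' => 'headers', 'pattern' => '/^Set-Cookie:\s*PHPSESSID=/mi', 'weight' => 3],
    ['tech' => 'WordPress', 'where' => 'body',    'pattern' => '#/wp-content/#i',               'weight' => 4],
    ['tech' => 'PHP',       'where' => 'body',    'pattern' => '#/wp-content/#i',               'weight' => 2], // WordPress implies PHP
    ['tech' => 'ASP.NET',   'where' => 'body',    'pattern' => '/__VIEWSTATE/',                 'weight' => 4],
];

// A site that has already removed its X-Powered-By header.
$headers = ['Server: nginx', 'Set-Cookie: PHPSESSID=abc123; path=/'];
$body = '<link rel="stylesheet" href="/wp-content/themes/example/style.css">';

print_r(scoreTechnologies($rules, $headers, $body));
```

With these made-up inputs PHP still comes out on top even though no X-Powered-By header is present, because the session cookie name and the WordPress path both point to it.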

All of this information is collected and analyzed. If the scanner determines that a website is not representing itself correctly, it flags the website and a list of the pages in question for human review. At that point an analyst will manually plug away at the website, determine its technologies by hand, and work out new fingerprints for it.

Here is one legitimate way of doing it.


Well, most web vulnerability scanners and crawlers use your website's response headers to find this out. Say you do this:

<?php
var_dump(headers_list());

You will get:

array(1) {
  [0]=>
  string(23) "X-Powered-By: PHP/5.4.3"
}

With that information a crawler can easily work out that your site was developed with PHP.
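To show how little work that takes on the crawler's side, here is a minimal sketch that reads the technology and version out of such a header line. The function name parsePoweredBy and the exact pattern are assumptions for illustration only.

```php
<?php
// Hypothetical sketch: extract a technology name and optional version from
// an X-Powered-By header line. Name and pattern are illustrative assumptions.

function parsePoweredBy(string $headerLine): ?array
{
    // Matches e.g. "X-Powered-By: PHP/5.4.3"; the "/version" part is optional.
    if (preg_match('#^X-Powered-By:\s*([^/\s]+)(?:/(\S+))?#i', $headerLine, $m) !== 1) {
        return null; // not an X-Powered-By header
    }
    return ['technology' => $m[1], 'version' => $m[2] ?? null];
}

var_dump(parsePoweredBy('X-Powered-By: PHP/5.4.3'));
```

For the header shown above this yields the technology PHP and the version 5.4.3, which is also enough for a vulnerability scanner to look up known issues for that release.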

How to avoid this?

You could use header_remove() in PHP for that.

As you can see from the code:

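<?php
// Read and modify headers before printing anything: once output has been
// sent, the headers can no longer be changed.
$before = headers_list();
header_remove(); // no argument: removes every header set so far
$after = headers_list();

echo "<pre>";
var_dump($before);
var_dump($after);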

Output:

array(1) {
  [0]=>
  string(23) "X-Powered-By: PHP/5.4.3"
}

array(0) {
}

The header list is now empty.
