
Python | Check if website exists by IP address

I have a few million IPv4 addresses in a .txt file, like so:

xyzw

xyzw

xyzw

...

My goal is to check, for each address, whether there's a real website behind it or the address is fake.

The posts I've seen deal only with URLs (not addresses). I did try the approach they describe: first reverse-DNS the IP address to get a hostname, then use that to determine whether the website exists. However, it takes about 2 seconds per address, which means a few months for all of them, and of course I don't have that much time.

What's the best, fastest way to do it?

I strongly prefer Python, but could using C speed things up significantly?

Thanks.

Unless these websites are virtually hosted, IP addresses are not any different from hostnames. But in the case of virtual hosting, reverse DNS won't help you, as many sites could be hosted on the same IP address, and the one you query might be down at the moment. Also, not all websites are registered in reverse DNS records, so you might miss some.
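To illustrate the virtual hosting point, here is a minimal sketch that sends a `HEAD` request straight to an IP address, optionally with an explicit `Host` header. The function name `http_status`, its parameters, and the choice of a `HEAD` request to `/` are assumptions made for illustration, not something from the question. A virtually hosted server may answer differently depending on the `Host` value, which is why a bare IP alone can be misleading.

```python
import http.client


def http_status(ip, host=None, port=80, timeout=2.0):
    """Send a HEAD request to ip:port and return the HTTP status code.

    Returns None if the connection or the request fails.
    `host` (hypothetical parameter) overrides the Host header, which is
    what a virtually hosted server uses to pick the site to serve.
    """
    headers = {"Host": host} if host else {}
    conn = http.client.HTTPConnection(ip, port, timeout=timeout)
    try:
        conn.request("HEAD", "/", headers=headers)
        return conn.getresponse().status
    except (OSError, http.client.HTTPException):
        return None
    finally:
        conn.close()
```

Any status code at all means some HTTP server answered at that address; `None` means nothing usable responded within the timeout.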

I don't know what method you are using to determine whether a website is hosted at an address, but whatever it is, it is probably IO-bound and not CPU-bound. That means that using C will probably yield an insignificant improvement in performance, as the program will spend most of its time waiting for responses from the websites anyway.

What you can do to improve performance is:

  1. Decrease timeouts. If you are using the default timeouts for network operations, you might find yourself waiting for responses longer than you want.

  2. Parallelize tasks. Try using the threading or asyncio modules. They are built to let tasks run concurrently, and asyncio is specifically meant for IO-bound programs.
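The two suggestions above can be combined in a rough asyncio sketch. It assumes that "a website exists" can be approximated by "a TCP connection to port 80 succeeds", which is a simplification. The names `has_open_port` and `scan`, the 1-second timeout, and the limit of 500 concurrent connections are illustrative assumptions, not tuned values.

```python
import asyncio


async def has_open_port(ip, port=80, timeout=1.0):
    """Return True if a TCP connection to ip:port succeeds within `timeout` seconds."""
    try:
        reader, writer = await asyncio.wait_for(
            asyncio.open_connection(ip, port), timeout=timeout
        )
    except (OSError, asyncio.TimeoutError):
        # Refused, unreachable, or too slow: treat as "nothing there".
        return False
    writer.close()
    try:
        await writer.wait_closed()
    except OSError:
        pass
    return True


async def scan(ips, port=80, timeout=1.0, max_concurrent=500):
    """Check many addresses concurrently; returns a dict mapping ip -> bool."""
    # The semaphore caps how many connections are open at the same time
    # (500 is an assumed value; stay below your open-file limit).
    sem = asyncio.Semaphore(max_concurrent)

    async def check(ip):
        async with sem:
            return ip, await has_open_port(ip, port, timeout)

    return dict(await asyncio.gather(*(check(ip) for ip in ips)))
```

Usage would look like `results = asyncio.run(scan(ips))`, where `ips` is a list of address strings read from the file. With a few million addresses, it is probably better to pass them to `scan` in batches (say, tens of thousands at a time) instead of creating every task at once.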

Also, consider using tools that already have these features implemented, such as nmap.
