Using Python to append a list of names to a single URL during a crawl
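I'm looking for a script that crawls my URL and appends a name to the end of the URL on each request, e.g. 192.168.1.100/map/foo
I then want to parse the response: if the status code is 200 and the content length is 83, I want to write the URL to a text file. If the two conditions are not both met, I skip the URL.
Here's my idea. I'm looking for some general direction.
I'll start with the URL and read the names from an array or a list, depending on how many there are. Then I'll parse the response and check the conditions. If they hold, I write the URL to a text document; otherwise the loop continues.
Any ideas? I'm not asking you to write my code, just to point me in the right direction.
Thanks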
This is the latest version, based on @German Petrov's code.
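import requests

# The scheme (http://) is required; requests rejects a URL without one.
url = "http://192.168.1.2/map/ShowPage.ashx?="
names = ["admin", "backup", "contact", "index", "logs", "news", "reboot",
         "register", "test", "users"]

with open("/root/Desktop/urls.txt", 'a') as urls:
    for name in names:
        newUrl = url + name  # plain concatenation keeps the full URL intact
        r = requests.get(newUrl)
        c = r.text  # decoded body, so the substring test works on Python 2 and 3
        # Keep only pages that load and do not contain the "Unknown" marker.
        if r.status_code == 200 and "Unknown" not in c:
            urls.write(newUrl + '\n')  # write the URL that was requested, not the base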
Wow!!! My super dumb mistake. It helps if you add http:// to the url... There is also no need for urlparse.urljoin, because it cuts off my full URL.
That's why I love my super simple C# IDE :-D
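To illustrate why urljoin "cuts off" the URL here, the following minimal sketch compares it with plain string concatenation. It uses the base URL from this thread; the variable names `joined` and `concatenated` are only illustrative. urljoin resolves the name as a relative reference, so it replaces the last path segment (ShowPage.ashx) and drops the query string.

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2, as used in this thread

base = "http://192.168.1.2/map/ShowPage.ashx?="
name = "admin"

# Relative resolution: the last path segment and the query are replaced.
joined = urljoin(base, name)

# Plain concatenation: the name is appended after "?=".
concatenated = base + name

print(joined)        # http://192.168.1.2/map/admin
print(concatenated)  # http://192.168.1.2/map/ShowPage.ashx?=admin
```

So urljoin fits a target like 192.168.1.100/map/foo, while concatenation is the one to use when the name belongs after the "?=".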
For the following you need to install requests: http://docs.python-requests.org/en/latest/user/install/#install
First:
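import requests
import urlparse  # Python 2 module; on Python 3 use urllib.parse instead

baseurl = "192.168.1.2/map/ShowPage.ashx?="
names = ["admin", "backup", "contact", "index", "logs", "news", "reboot",
         "register", "test", "users"]

for name in names:
    # urljoin replaces the last path segment of baseurl with name
    print(urlparse.urljoin(baseurl, name))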
This gives the following output:
192.168.1.2/map/admin
192.168.1.2/map/backup
192.168.1.2/map/contact
192.168.1.2/map/index
192.168.1.2/map/logs
192.168.1.2/map/news
192.168.1.2/map/reboot
192.168.1.2/map/register
192.168.1.2/map/test
192.168.1.2/map/users
Then you can update the code with the get call:
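import requests
import urlparse  # Python 2 module; on Python 3 use urllib.parse instead

# requests needs the scheme, so the base URL starts with http://
baseurl = "http://192.168.1.2/map/ShowPage.ashx?="
names = ["admin", "backup", "contact", "index", "logs", "news", "reboot",
         "register", "test", "users"]

with open("C:\\urls.txt", 'a') as urls:
    for name in names:
        url = urlparse.urljoin(baseurl, name)
        r = requests.get(url)
        # Content-Length may be absent (e.g. chunked responses); treat that as 0.
        if r.status_code == 200 and int(r.headers.get('content-length', 0)) > 73:
            urls.write(url + '\n')  # write it to the file, one URL per line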
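The loop above can also be split into a small filter and a crawl function, which makes the check easy to try without a live server. This is only a sketch under some assumptions: the names `looks_interesting`, `fetch_with_requests`, `crawl` and `fake_fetch` are hypothetical, the 10-second timeout is an arbitrary choice, and the body length is measured with `len()` on the downloaded content instead of reading the Content-Length header (the two can differ, for example when the response is compressed).

```python
def looks_interesting(status_code, body, min_length=73):
    """Return True for a 200 response whose body is longer than min_length bytes."""
    return status_code == 200 and len(body) > min_length


def fetch_with_requests(url):
    """Fetch url and return (status_code, body_bytes)."""
    import requests  # third-party; imported here so the filter works without it
    r = requests.get(url, timeout=10)
    return r.status_code, r.content


def crawl(base_url, names, fetch=fetch_with_requests):
    """Return the candidate URLs whose responses pass looks_interesting()."""
    hits = []
    for name in names:
        candidate = base_url + name  # plain concatenation keeps the full URL
        status_code, body = fetch(candidate)
        if looks_interesting(status_code, body):
            hits.append(candidate)
    return hits


def fake_fetch(url):
    """Stand-in for a real server: only the "admin" page is long enough."""
    if url.endswith("admin"):
        return 200, b"x" * 100
    return 200, b"Unknown"


print(crawl("http://192.168.1.2/map/ShowPage.ashx?=", ["admin", "test"], fetch=fake_fetch))
# ['http://192.168.1.2/map/ShowPage.ashx?=admin']
```

Against a real host, leave out the `fetch` argument so the requests-based fetcher is used, then write the returned list to the text file.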
Thanks to German for the help; it got me to this solution.
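import requests

# The scheme (http://) is required; requests rejects a URL without one.
url = "http://192.168.1.2/map/ShowPage.ashx?="
names = ["admin", "backup", "contact", "index", "logs", "news", "reboot",
         "register", "test", "users"]

with open("/root/Desktop/urls.txt", 'a') as urls:
    for name in names:
        newUrl = url + name  # plain concatenation keeps the full URL intact
        r = requests.get(newUrl)
        c = r.text  # decoded body, so the substring test works on Python 2 and 3
        # Keep only pages that load and do not contain the "Unknown" marker.
        if r.status_code == 200 and "Unknown" not in c:
            urls.write(newUrl + '\n')  # write the URL that was requested, not the base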