
Using Python to append a list of names to a single URL during a crawl of a URL

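I'm looking for a script that crawls my URL and appends a name to the end of the URL on each request, for example 192.168.1.100/map/foo.

I then want to parse the response. If the status code is 200 and the content length is 83, I want to write the URL to a text file. If the two conditions are not both met, I skip the URL.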

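Here is my idea. I'm looking for some general direction.

I would start with the URL and read the names from an array or a list, depending on how many there are. Then I would parse the response and check the conditions. If they hold, I write the URL to a text document; otherwise the loop continues.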
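The plan above can be sketched roughly as follows. This is a minimal sketch, not a finished crawler: the helper names (`build_urls`, `is_hit`, `crawl`, `fetch_with_requests`) are made up for illustration, the 10-second timeout is an assumption, and "content length" is taken here to mean the size of the response body in bytes.

```python
def build_urls(base_url, names):
    """Append each name to the base URL by plain string concatenation."""
    return [base_url + name for name in names]


def is_hit(status_code, body, expected_length=83):
    """True only when both conditions from the question hold."""
    return status_code == 200 and len(body) == expected_length


def crawl(base_url, names, fetch, out_path):
    """Fetch every candidate URL and append the matching ones to out_path.

    `fetch` is any callable that returns (status_code, body_bytes) for a URL,
    so the network layer can be swapped out or faked while testing.
    """
    hits = []
    with open(out_path, "a") as out:
        for url in build_urls(base_url, names):
            status_code, body = fetch(url)
            if is_hit(status_code, body):
                out.write(url + "\n")
                hits.append(url)
    return hits


def fetch_with_requests(url):
    """Real fetcher; needs the third-party requests package."""
    import requests  # imported lazily so the helpers above work without it
    response = requests.get(url, timeout=10)  # timeout value is an assumption
    return response.status_code, response.content
```

Something like `crawl("http://192.168.1.100/map/", ["foo"], fetch_with_requests, "urls.txt")` would then run it against the real server. Passing the fetcher in as an argument is just one way to keep the matching logic easy to try out without a network.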

Any thoughts? I'm not asking you to write my code, just to point me in a general direction.

Thanks

Here is the latest version, based on @German Petrov's code.

import requests

url = "http://192.168.1.2/map/ShowPage.ashx?="
names = ["admin", "backup", "contact", "index", "logs", "news", "reboot",
         "register", "test", "users"]

with open("/root/Desktop/urls.txt", 'a') as urls:
    for name in names:
        newUrl = url + name  # plain concatenation keeps "ShowPage.ashx?="
        r = requests.get(newUrl)
        # Keep only pages that load and do not contain the "Unknown" marker
        if r.status_code == 200 and "Unknown" not in r.text:
            urls.write(newUrl + '\n')  # write the full URL that matched

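Wow!!! A super silly mistake on my part. It helps if you add http:// to the URL. There is also no need for urlparse.urljoin, because it cuts off part of my full URL.

This is why I love my super simple C# IDE :-D

For the following you must install requests: http://docs.python-requests.org/en/latest/user/install/#install

First: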
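To illustrate both points, here is a small sketch using only the standard library (Python 3 module names; in Python 2 the same functions live in `urlparse`). The variable names are just for illustration. It shows that `urljoin` resolves the name relative to the directory of the base URL, which drops `ShowPage.ashx?=`, while plain concatenation keeps it, and that the address without `http://` has no scheme for an HTTP client to work with.

```python
from urllib.parse import urljoin, urlsplit

base = "http://192.168.1.2/map/ShowPage.ashx?="

# urljoin treats "admin" as a relative reference, so it replaces the last
# path segment and discards the base URL's query string.
joined = urljoin(base, "admin")

# String concatenation leaves the page name and the query prefix intact.
concatenated = base + "admin"

print(joined)        # http://192.168.1.2/map/admin
print(concatenated)  # http://192.168.1.2/map/ShowPage.ashx?=admin

# Without "http://" the parser finds no scheme at all.
scheme_without_prefix = urlsplit("192.168.1.2/map/ShowPage.ashx?=").scheme
print(repr(scheme_without_prefix))  # ''
```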

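from urllib.parse import urljoin  # Python 2: from urlparse import urljoin

baseurl = "192.168.1.2/map/ShowPage.ashx?="
names = ["admin", "backup", "contact", "index", "logs", "news", "reboot",
         "register", "test", "users"]

for name in names:
    # urljoin replaces the last path segment (and the query) with the name
    print(urljoin(baseurl, name))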

This gives the following output:

192.168.1.2/map/admin
192.168.1.2/map/backup
192.168.1.2/map/contact
192.168.1.2/map/index
192.168.1.2/map/logs
192.168.1.2/map/news
192.168.1.2/map/reboot
192.168.1.2/map/register
192.168.1.2/map/test
192.168.1.2/map/users

Then you can update the code with the get call:

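import requests
from urllib.parse import urljoin  # Python 2: from urlparse import urljoin

# requests needs an explicit scheme, so the base URL starts with http://
baseurl = "http://192.168.1.2/map/ShowPage.ashx?="
names = ["admin", "backup", "contact", "index", "logs", "news", "reboot",
         "register", "test", "users"]

with open("C:\\urls.txt", 'a') as urls:
    for name in names:
        url = urljoin(baseurl, name)
        r = requests.get(url)
        # Content-Length can be absent (e.g. chunked responses), so default to 0
        length = int(r.headers.get('content-length', 0))
        if r.status_code == 200 and length > 73:
            urls.write(url + '\n')  # one matching URL per line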
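One caveat about reading the length from the header: servers do not always send Content-Length, and when the response is compressed the header reports the size on the wire rather than the decoded body. A possible workaround, sketched here with a hypothetical helper name (`response_length`), is to use the header when it is present and numeric and fall back to the body size otherwise.

```python
def response_length(headers, body):
    """Prefer the Content-Length header; fall back to the size of the body.

    `headers` is any mapping with a .get() method (a plain dict here; the
    headers object on a requests response also works), `body` is the raw
    bytes of the response.
    """
    value = headers.get("Content-Length")
    if value is not None and value.isdigit():
        return int(value)
    return len(body)


print(response_length({"Content-Length": "83"}, b""))  # 83
print(response_length({}, b"abc"))                     # 3
```

With requests this could be called as `response_length(r.headers, r.content)`. Whether the header or the decoded body is the right thing to compare against depends on which number the server's responses were measured with in the first place.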

Thanks for the help, German. It pointed me to this.

import requests

url = "http://192.168.1.2/map/ShowPage.ashx?="
names = ["admin", "backup", "contact", "index", "logs", "news", "reboot",
         "register", "test", "users"]

with open("/root/Desktop/urls.txt", 'a') as urls:
    for name in names:
        newUrl = url + name  # plain concatenation keeps "ShowPage.ashx?="
        r = requests.get(newUrl)
        # Keep only pages that load and do not contain the "Unknown" marker
        if r.status_code == 200 and "Unknown" not in r.text:
            urls.write(newUrl + '\n')  # write the full URL that matched
