Python: using a proxy IP with Scrapy

Question

I want to scrape the web with Scrapy through a proxy IP. To use the proxy, I set the http_proxy environment variable as described in the documentation.

$ export http_proxy=http://proxy:port
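One quick way to confirm that a Python process started from the same shell actually sees this variable is to ask the standard library which proxies it detects. The sketch below assumes Python 3; the helper name visible_http_proxy is made up for illustration. Scrapy's HttpProxyMiddleware is understood to pick up environment proxies through the same standard-library mechanism, so an empty result here would suggest the variable never reached the process.

```python
import os
from urllib.request import getproxies


def visible_http_proxy():
    """Return the HTTP proxy URL that Python detects, or None if there is none."""
    # getproxies() maps URL schemes ("http", "https", ...) to proxy URLs,
    # reading variables such as http_proxy from the environment.
    return getproxies().get("http")


if __name__ == "__main__":
    print("http_proxy env var:", os.environ.get("http_proxy"))
    print("proxy seen by urllib:", visible_http_proxy())
```

If the second line prints None while the export was done in another terminal or session, the spider is most likely running without the variable set.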

To test whether the IP change took effect, I created a new spider named test:

import scrapy


class TestSpider(scrapy.Spider):
    name = "test"
    allowed_domains = ["whatismyip.com"]
    start_urls = ["http://whatismyip.com"]

    def parse(self, response):
        # Print the page and save it so the reported IP can be inspected.
        print(response.body)
        with open("check_ip.html", "wb") as f:
            f.write(response.body)

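However, when I run this spider, check_ip.html does not show the IP specified in the environment variable; it shows the original IP I had before crawling.

What is the problem? Is there another way to check whether I am using the proxy IP, or another way to use a proxy IP?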

Answer 1

Edit settings.py in your current project and make sure HttpProxyMiddleware is enabled:

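DOWNLOADER_MIDDLEWARES = {
    # You need this line in order to scrape through a proxy or proxy list.
    # In Scrapy 1.0 and later the path is
    # 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware'.
    'scrapy.contrib.downloadermiddleware.httpproxy.HttpProxyMiddleware': 110,
}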
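As for another way to use a proxy: Scrapy also honors a proxy URL placed in request.meta["proxy"], so a small custom downloader middleware can set it explicitly instead of relying on the environment alone. The following is only a sketch under assumptions: the class name EnvProxyMiddleware is hypothetical, and it falls back to the http_proxy variable from the question when no URL is passed in.

```python
import os


class EnvProxyMiddleware:
    """Hypothetical downloader middleware that attaches a proxy to each request."""

    def __init__(self, proxy_url=None):
        # Fall back to the http_proxy environment variable used in the question.
        self.proxy_url = proxy_url or os.environ.get("http_proxy")

    def process_request(self, request, spider):
        # Scrapy sends a request through the URL stored in request.meta["proxy"].
        # A proxy already chosen for this request is left untouched.
        if self.proxy_url and "proxy" not in request.meta:
            request.meta["proxy"] = self.proxy_url
        return None  # returning None lets Scrapy keep processing the request
```

To try it, the class could be registered in DOWNLOADER_MIDDLEWARES under a made-up path such as 'myproject.middlewares.EnvProxyMiddleware' with an order lower than the one given to HttpProxyMiddleware (for instance 100), so the proxy is already set when that middleware runs.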

Solution 1
2, accepted, 2014-05-13 11:59:16
