
Turning an AWS server into a proxy server for crawling with Scrapy

Does anyone know how I could configure an Amazon Web Services (AWS) server so that a Scrapy crawler can use it as a proxy server? I don't want to get blacklisted by the websites I am crawling, so I need to use proxy servers. I'm just not sure how to turn the AWS server into a proxy server. Thank you!!

The easiest way to proxy your HTTP traffic through an EC2 instance is to use tinyproxy, although it is not as safe as using Tor or an anonymous VPN. You can find a walkthrough here.
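A minimal sketch of the Scrapy side, assuming tinyproxy is already running on the instance on its default port 8888, and that both the instance's security group and tinyproxy's `Allow` rules admit the machine running the crawler. The IP address `203.0.113.10`, the class name `Ec2ProxyMiddleware` and the project path `myproject.middlewares` are hypothetical placeholders. Scrapy's built-in `HttpProxyMiddleware` sends a request through whatever URL is stored in `request.meta["proxy"]`, so the custom middleware only has to set that key.

```python
# middlewares.py: a sketch of a Scrapy downloader middleware that routes every
# request through a tinyproxy instance on an EC2 server.
# ASSUMPTION (hypothetical): the address below; replace it with your
# instance's public IP. 8888 is tinyproxy's default port.
EC2_PROXY_URL = "http://203.0.113.10:8888"


class Ec2ProxyMiddleware:
    """Attach the EC2 proxy to each outgoing request.

    Scrapy's HttpProxyMiddleware reads request.meta["proxy"], so this
    middleware only needs to set that key before HttpProxyMiddleware runs.
    """

    def __init__(self, proxy_url=EC2_PROXY_URL):
        self.proxy_url = proxy_url

    def process_request(self, request, spider):
        # Keep any proxy that a spider already set on an individual request.
        request.meta.setdefault("proxy", self.proxy_url)
        return None  # let Scrapy continue processing the request


# settings.py (hypothetical project name "myproject"):
# DOWNLOADER_MIDDLEWARES = {
#     # A lower number than HttpProxyMiddleware (750), so this runs first.
#     "myproject.middlewares.Ec2ProxyMiddleware": 350,
# }
```

For a one-off request you can skip the middleware and pass `meta={"proxy": "http://203.0.113.10:8888"}` directly to `scrapy.Request`.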

Note that scraping a website in a way that violates its terms of use, or that otherwise impairs the site's functionality, can expose you to legal liability if you violate those terms intentionally, for example under the doctrine of trespass to chattels.

Keep in mind that you pay for the traffic, and that once the same IP has sent too many repeated requests, that IP will be banned.
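Because a single EC2 IP will eventually get banned, one option (not part of the original answer) is to run tinyproxy on several instances, rotate between them, and slow the crawl down. The sketch below assumes three hypothetical instance addresses; `RotatingProxyMiddleware`, `PROXY_POOL` and `THROTTLE_SETTINGS` are illustrative names, while the keys inside `THROTTLE_SETTINGS` are real Scrapy settings with example values. Rotation only spreads the requests out: every instance still bills you for its traffic, and a site can still ban each IP individually.

```python
import itertools

# Hypothetical addresses of several EC2 instances, each running tinyproxy on 8888.
PROXY_POOL = (
    "http://203.0.113.10:8888",
    "http://203.0.113.11:8888",
    "http://203.0.113.12:8888",
)


class RotatingProxyMiddleware:
    """Hand out the proxies in PROXY_POOL round-robin (illustrative sketch)."""

    def __init__(self, proxies=PROXY_POOL):
        if not proxies:
            raise ValueError("at least one proxy is required")
        self._cycle = itertools.cycle(proxies)

    def process_request(self, request, spider):
        # Only assign a proxy when the request does not already carry one,
        # so requests with their own proxy do not advance the rotation.
        if "proxy" not in request.meta:
            request.meta["proxy"] = next(self._cycle)
        return None  # let Scrapy continue processing the request


# Real Scrapy setting names; the values are only examples.
THROTTLE_SETTINGS = {
    "DOWNLOAD_DELAY": 2.0,  # seconds to wait between pages from the same site
    "RANDOMIZE_DOWNLOAD_DELAY": True,  # wait 0.5x to 1.5x DOWNLOAD_DELAY
    "CONCURRENT_REQUESTS_PER_DOMAIN": 2,  # fewer parallel hits on one site
    "AUTOTHROTTLE_ENABLED": True,  # adapt the delay to server response times
}
```

To use it, copy the dictionary entries into `settings.py` and register the middleware in `DOWNLOADER_MIDDLEWARES` with an order below 750, so that it runs before `HttpProxyMiddleware`.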

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please credit the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 
© 2020-2024 STACKOOM.COM