简体   繁体   中英

Can VM / Machine IP be used instead of Proxy Server for Scrapy

I have a Scrapy crawler and I want to rotate the IP so my application will not be blocked. I am setting IP in scrapy using request.meta['proxy'] = 'http://51.161.82.60:80' but this is a VM's IP. My question is can VM or Machine's IP be used for scrapy or I need a proxy server?

Currently I am doing this. This does not throw any error but when I get response from http://checkip.dyndns.org it is my own IP not updated IP which I set in meta. That is why I want to know if I do need proxy server.

Definitely you need a proxy server. meta data is only a field in the http request. the server side still knows the public ip that really connecting from the tcp connection layer.

The reason you are getting your own IP is because your VM is 'transparent'. You will need to intercept your request at the VM, remove tracking headers such as X-Forwarded-For, and your server has to know who to respond to when it receives the response from the website you are crawling.

The simplest solution though, is to install a proxy service on your VM, for example Squid , then set forwarded_for off to make it an anonymous proxy server. There may be other request options to tweak to make it truly anonymous. Remember to secure the whitelisted IP addresses with http_access allow specialIP and acl specialIP src xxxx in /etc/squid/squid.conf . The default port of Squid is 3128.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM