I am using a proxy to scrape data from some sites. The problem is that a proxy URL sometimes returns nothing, and the program stops after a few tries. I need some logic so that, when an IP does not respond, the program renews the IP and tries to open the page again. I am using TOR as a proxy in Python.
Here is my website opening code:
mainPage = requests.get("http://proxy_IP/?link=http://example.com/")
mainTree = html.fromstring(mainPage.text)
You can put your code in a while loop with a condition that becomes true only when the page has opened properly.
import requests
from lxml import html

mainPage = requests.get("http://proxy_IP/?link=http://example.com/")
mainTree = html.fromstring(mainPage.text)
# Re-request until the XPath check confirms the expected content is present
while not mainTree.xpath('boolean(some_xpath_to_be_true)'):
    mainPage = requests.get("http://proxy_IP/?link=http://example.com/")
    mainTree = html.fromstring(mainPage.text)
Once the loop exits, mainTree contains the correctly loaded page source. Note that the loop never ends if the condition is never met.
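Because a loop like this can spin forever when the proxy stays dead, it may help to cap the number of attempts and pause between them. Below is a minimal sketch of that idea, not a definitive implementation. The names fetch_with_retry, fetch, is_valid and renew are hypothetical: fetch is whatever callable opens the page, is_valid is the check that the page loaded, and renew is an optional hook where an IP renewal (for example a TOR identity change) would go.

```python
import time


def fetch_with_retry(fetch, is_valid, renew=None, max_attempts=5, delay=1.0):
    """Call fetch() until is_valid(result) is true or attempts run out.

    All parameter names are illustrative, not part of any library.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            result = fetch()
            if is_valid(result):
                return result
        except Exception as exc:  # e.g. a timeout or an empty-document parse error
            last_error = exc
        if attempt < max_attempts:
            if renew is not None:
                renew()  # hook for switching to a new IP before the next try
            time.sleep(delay)
    raise RuntimeError(f"no valid page after {max_attempts} attempts") from last_error


# Hypothetical usage with the code from the question:
# mainTree = fetch_with_retry(
#     fetch=lambda: html.fromstring(
#         requests.get("http://proxy_IP/?link=http://example.com/", timeout=10).text
#     ),
#     is_valid=lambda tree: tree.xpath('boolean(some_xpath_to_be_true)'),
# )
```

Passing a timeout to requests.get keeps a silent proxy from blocking the program indefinitely, and the RuntimeError makes the failure explicit instead of leaving the loop hanging.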