
Converting a proxy scraper from Python 2.x to Python 3.x

I was trying to convert a very simple Python 2 function to Python 3. It scrapes a web page and returns a list of proxies that I could use on a Twitter robot:

#!/usr/bin/env python
#python25 on windows7
#####################################
# GPL v2
# Author: Arjun Sreedharan
# Email: arjun024@gmail.com
#####################################

import urllib2
import re
import os
import time
import random

def main():
    request = urllib2.Request("http://www.ip-adress.com/proxy_list/")
    # request.add_header("User-Agent", "Mozilla/5.0 (Windows; U; Windows NT 5.1; es-ES; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5")
    #Without Referer header ip-adress.com gives 403 Forbidden
    request.add_header("Referer","https://www.google.co.in/")
    f = urllib2.urlopen(request)

    #outfile = open('outfile.htm','w')
    str1 = f.read()
    #outfile.write(str1)

    # Normally DOT matches any character EXCEPT newline. re.DOTALL makes dot
    # include newline.
    pattern = re.compile('.*<td>(.*)</td>.*<td>Elite</td>.*', re.DOTALL)
    matched = re.search(pattern,str1)
    print(matched.group(1))
    """
    ip = matched.group(1)
    os.system('echo "http_proxy=http://'+ip+'" > ~/.wgetrc')
    if random.randint(1,2)==1:
        os.system('wget --proxy=on -t 1 --timeout=14 --header="User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; es-ES; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5" http://funnytweets.in -O /dev/null')
    else:
        os.system('wget --proxy=on -t 1 --timeout=14 --header="User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.2.149.29 Safari/525.13" http://funnytweets.in -O /dev/null')
    """
if __name__ == '__main__':
    while True:
        main()
        time.sleep(2)
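The comment about `re.DOTALL` in the script can be checked in isolation. A minimal Python 3 sketch, using a made-up HTML fragment (the real layout of the ip-adress.com page is not shown here) in which the proxy cell and the "Elite" cell sit on separate lines:

```python
import re

# Hypothetical fragment: the proxy cell and the "Elite" cell are on separate lines.
html = "<tr>\n<td>5.6.7.8:8080</td>\n<td>Elite</td>\n</tr>"
pattern = r'.*<td>(.*)</td>.*<td>Elite</td>.*'

# Without re.DOTALL, "." cannot cross "\n", so the two cells can't be bridged.
without_dotall = re.search(pattern, html)
# With re.DOTALL, ".*" spans the newlines and the proxy cell is captured.
with_dotall = re.search(pattern, html, re.DOTALL)

print(without_dotall)        # None
print(with_dotall.group(1))  # 5.6.7.8:8080
```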

OK, I already know that urllib2 is different in Python 3, but I could not make it work :( Can anyone help? Thanks!

In Python 3, Request and urlopen are located in the urllib.request module, so you have to change your imports accordingly.

from urllib.request import Request, urlopen
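Beyond the imports, one more difference matters for this script: in Python 3, `read()` on the response returns `bytes`, and applying a `str` pattern like the one in the question to `bytes` raises `TypeError`. A minimal Python 3 sketch of `main()` under that assumption; `build_request` and `extract_proxy` are hypothetical helper names, and falling back to UTF-8 when the server declares no charset is an assumption about the page:

```python
import re
from urllib.request import Request, urlopen

URL = "http://www.ip-adress.com/proxy_list/"

# Same pattern as the original script; re.DOTALL lets "." match newlines too.
PATTERN = re.compile(r'.*<td>(.*)</td>.*<td>Elite</td>.*', re.DOTALL)


def build_request(url=URL):
    """Build the request with the Referer header the original script relied on."""
    request = Request(url)
    # Per the original comment, ip-adress.com answers 403 without a Referer.
    request.add_header("Referer", "https://www.google.co.in/")
    return request


def extract_proxy(html):
    """Return the captured cell as text, or None if the pattern does not match."""
    matched = PATTERN.search(html)
    return matched.group(1) if matched else None


def main():
    with urlopen(build_request()) as f:
        # read() returns bytes in Python 3; decode before using a str pattern.
        # Fall back to UTF-8 if the response declares no charset (assumption).
        charset = f.headers.get_content_charset() or "utf-8"
        html = f.read().decode(charset, errors="replace")
    print(extract_proxy(html))
```

As in the original script, `main()` would then be called in a `while True` loop with `time.sleep(2)`.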

You can make your code compatible with both Python 2 and Python 3 by catching the ImportError raised when importing from urllib2 fails:

try:
    from urllib2 import Request, urlopen
except ImportError:
    from urllib.request import Request, urlopen
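With the try/except import in place, the fetching code itself can be shared between versions, as long as the body is decoded before matching. A minimal sketch; `fetch_html` is a hypothetical helper, and the UTF-8 default is an assumption about the page's encoding:

```python
try:
    # Python 2
    from urllib2 import Request, urlopen
except ImportError:
    # Python 3
    from urllib.request import Request, urlopen


def fetch_html(url, encoding="utf-8"):
    """Fetch a page with the Referer header and return its body as text."""
    request = Request(url)
    request.add_header("Referer", "https://www.google.co.in/")
    f = urlopen(request)
    try:
        # read() returns a byte string on both versions (str on Python 2,
        # bytes on Python 3); decoding yields text on both, so the same
        # str regex pattern can be applied afterwards.
        return f.read().decode(encoding, "replace")
    finally:
        f.close()
```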

Also keep in mind that in Python 3, URLError and HTTPError are located in urllib.error (in Python 2 they are in urllib2), if you need them.
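The script's comment notes that ip-adress.com answers 403 Forbidden without a Referer header; in Python 3 that surfaces as an `HTTPError` raised by `urlopen`. A minimal sketch of catching both exceptions; `fetch_or_none` is a hypothetical helper, and the 14-second timeout is borrowed from the script's `wget --timeout=14` call:

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def fetch_or_none(request, timeout=14):
    """Return the body as bytes, or None on failure.

    request may be a URL string or a Request object (e.g. one with a Referer).
    """
    try:
        with urlopen(request, timeout=timeout) as f:
            return f.read()
    except HTTPError as e:
        # HTTPError is a subclass of URLError, so it has to be caught first.
        # e.code holds the status, e.g. 403 when the Referer header is missing.
        print("HTTP error %d: %s" % (e.code, e.reason))
    except URLError as e:
        # Failures below the HTTP level, e.g. DNS errors, refused connections,
        # or an unsupported URL scheme.
        print("URL error: %s" % e.reason)
    return None
```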
