I was trying to convert a very simple function from Python 2 to Python 3. It scrapes a web page and returns a list of proxies that I could use in a Twitter bot:
#!/usr/bin/env python
#python25 on windows7
#####################################
# GPL v2
# Author: Arjun Sreedharan
# Email: arjun024@gmail.com
#####################################
import urllib2
import re
import os
import time
import random

def main():
    request = urllib2.Request("http://www.ip-adress.com/proxy_list/")
    # request.add_header("User-Agent", "Mozilla/5.0 (Windows; U; Windows NT 5.1; es-ES; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5")
    #Without Referer header ip-adress.com gives 403 Forbidden
    request.add_header("Referer","https://www.google.co.in/")
    f = urllib2.urlopen(request)
    #outfile = open('outfile.htm','w')
    str1 = f.read()
    #outfile.write(str1)
    # normally DOT matches any character EXCEPT newline. re.DOTALL makes dot include newline
    pattern = re.compile('.*<td>(.*)</td>.*<td>Elite</td>.*', re.DOTALL)
    matched = re.search(pattern,str1)
    print(matched.group(1))
    """
    ip = matched.group(1)
    os.system('echo "http_proxy=http://'+ip+'" > ~/.wgetrc')
    if random.randint(1,2)==1:
        os.system('wget --proxy=on -t 1 --timeout=14 --header="User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; es-ES; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5" http://funnytweets.in -O /dev/null')
    else:
        os.system('wget --proxy=on -t 1 --timeout=14 --header="User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.2.149.29 Safari/525.13" http://funnytweets.in -O /dev/null')
    """

if __name__ == '__main__':
    while True:
        main()
        time.sleep(2)
OK, I already know that urllib2 is different in Python 3, but I could not make it work. Can anyone help? Thanks!
In Python 3, Request and urlopen are located in the urllib.request module, so you have to change your imports accordingly.
from urllib.request import Request, urlopen
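With that import in place, the question's function could be ported roughly as in the sketch below. One further difference worth keeping in mind: in Python 3, read() returns bytes rather than str, so the body has to be decoded before a str regular expression can be applied to it. The helper names build_request and first_elite_proxy are made up for this illustration, and UTF-8 is only an assumption about the page's encoding.

```python
import re
from urllib.request import Request, urlopen

PROXY_LIST_URL = "http://www.ip-adress.com/proxy_list/"


def build_request(url=PROXY_LIST_URL):
    """Build the request with the Referer header used in the question."""
    request = Request(url)
    # The question notes that the site answers 403 Forbidden without a Referer.
    request.add_header("Referer", "https://www.google.co.in/")
    return request


def first_elite_proxy(html):
    """Return the text captured by the question's regex, or None if no match."""
    # re.DOTALL lets '.' match newlines as well.
    pattern = re.compile(r'.*<td>(.*)</td>.*<td>Elite</td>.*', re.DOTALL)
    matched = pattern.search(html)
    return matched.group(1) if matched else None


def main():
    with urlopen(build_request()) as f:
        # read() returns bytes in Python 3; UTF-8 is an assumed encoding.
        html = f.read().decode("utf-8", errors="replace")
    print(first_elite_proxy(html))
```

Calling main() performs the actual network request; the two helpers can be tried without network access.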
You could make your code compatible with both Python 2 and Python 3 by catching the ImportError exception when importing from urllib2.
try:
    from urllib2 import Request, urlopen
except ImportError:
    from urllib.request import Request, urlopen
Also keep in mind that URLError and HTTPError are located in urllib.error, if you need them.
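As a rough sketch of how those exceptions might be used, the snippet below imports them with the same ImportError fallback and wraps the request in a try/except. The fetch function, its opener parameter (there only so the function can be exercised without network access), and the 10-second timeout are assumptions for illustration, not part of the original code.

```python
try:
    # Python 2
    from urllib2 import Request, urlopen, URLError, HTTPError
except ImportError:
    # Python 3
    from urllib.request import Request, urlopen
    from urllib.error import URLError, HTTPError


def fetch(url, opener=urlopen, timeout=10):
    """Return the response body, or None if the request fails."""
    request = Request(url)
    request.add_header("Referer", "https://www.google.co.in/")
    try:
        response = opener(request, timeout=timeout)
        try:
            return response.read()
        finally:
            response.close()
    except HTTPError as err:
        # HTTPError is a subclass of URLError, so it must be caught first.
        print("Server returned HTTP %s for %s" % (err.code, url))
    except URLError as err:
        print("Could not reach %s: %s" % (url, err.reason))
    return None
```

Returning None on failure lets a polling loop like the one in the question skip a failed attempt instead of crashing.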