
I'm trying to get proxies out of a web page using regex in Python

import urllib.request
import re
page = urllib.request.urlopen("http://www.samair.ru/proxy/ip-address-01.htm").read()
re.findall('\d+\.\d+\.\d+\.\d+', page)

I don't understand why it says:

File "C:\Python33\lib\re.py", line 201, in findall
    return _compile(pattern, flags).findall(string)
TypeError: can't use a string pattern on a bytes-like object

import urllib
import re
page = urllib.urlopen("http://www.samair.ru/proxy/ip-address-01.htm").read()
print re.findall('\d+\.\d+\.\d+\.\d+', page)

This worked and gave me the result:

['056.249.66.50', '100.44.124.8', '103.31.250.115', ...

Edit

  • This works for Python 2.7.

The result of reading the file-like object returned by urllib.request.urlopen is a bytes object, and the re module does not allow a str pattern to be matched against bytes. You can either decode it into a unicode string and use a unicode regex:

>>> re.findall(r'\d+\.\d+\.\d+\.\d+', page.decode('utf-8'))
['056.249.66.50', '100.44.124.8', '103.31.250.115', '105.236.180.243', '105.236.21.213', '108.171.162.172', '109.207.61.143', '109.207.61.197', '109.207.61.202', '109.226.199.129', '109.232.112.109', '109.236.220.98', '110.196.42.33', '110.74.197.141', '110.77.183.64', '110.77.199.111', '110.77.200.248', '110.77.219.154', '110.77.219.2', '110.77.221.208']

... or use a bytes regex:

>>> re.findall(rb'\d+\.\d+\.\d+\.\d+', page)
[b'056.249.66.50', b'100.44.124.8', b'103.31.250.115', b'105.236.180.243', b'105.236.21.213', b'108.171.162.172', b'109.207.61.143', b'109.207.61.197', b'109.207.61.202', b'109.226.199.129', b'109.232.112.109', b'109.236.220.98', b'110.196.42.33', b'110.74.197.141', b'110.77.183.64', b'110.77.199.111', b'110.77.200.248', b'110.77.219.154', b'110.77.219.2', b'110.77.221.208']

Choose whichever fits the datatype you prefer to work with.
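As a minimal sketch of the decode-first approach, the snippet below wraps it in a small helper. The name extract_ips, the sample bytes, and the default 'utf-8' encoding are illustrative assumptions and are not part of the original posts. The pattern only finds dotted-quad candidates and does not check that each octet is in the 0-255 range.

```python
import re

# Raw-string pattern for dotted-quad candidates; octet ranges are not validated.
IP_PATTERN = re.compile(r'\d+\.\d+\.\d+\.\d+')


def extract_ips(page, encoding='utf-8'):
    """Return dotted-quad strings found in page (bytes or str).

    The helper name and the default encoding are assumptions made
    for illustration only.
    """
    if isinstance(page, bytes):
        # Decode so the str pattern can be applied; errors='replace'
        # avoids an exception if the page is not valid in that encoding.
        page = page.decode(encoding, errors='replace')
    return IP_PATTERN.findall(page)


# Hypothetical sample standing in for the bytes returned by urlopen(...).read()
sample = b'<td>056.249.66.50:8080</td><td>100.44.124.8:3128</td>'
print(extract_ips(sample))  # ['056.249.66.50', '100.44.124.8']
```

With a real page, you would pass the result of urllib.request.urlopen(url).read() straight to the helper. If the server declares a different charset, pass it as the encoding argument instead of relying on the default.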

