Reading a Text File From a Web Page With Python 3
import re
import urllib.request
hand=urllib.request.urlopen("http://www.pythonlearn.com/code/mbox-short.txt")
qq=hand.read().decode('utf-8')
numlist=[]
for line in qq:
    line.rstrip()
    stuff=re.findall("^X-DSPAM-Confidence: ([0-9.]+)",line)
    if len(stuff)!=1:
        continue
    num=float(stuff[0])
    numlist.append(num)
print('Maximum:',max(numlist))
The variable qq contains the full text of the file. However, the for loop doesn't work and numlist is still empty.
When I download the text file and read it as a local file, everything works.
You are iterating over a string, so the loop goes character by character, not line by line, and findall is called on single characters. Apply the regex to the whole of qq with the multiline flag re.M instead:
In [18]: re.findall("^X-DSPAM-Confidence: ([0-9.]+)",qq, re.M)
Out[18]: ['0.8475', '0.6178', '0.6961', '0.7565', '0.7626', '0.7556', '0.7002', '0.7615', '0.7601', '0.7605', '0.6959', '0.7606', '0.7559', '0.7605', '0.6932', '0.7558', '0.6526', '0.6948', '0.6528', '0.7002', '0.7554', '0.6956', '0.6959', '0.7556', '0.9846', '0.8509', '0.9907']
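To see what re.M changes, here is a minimal sketch on a small in-memory string. The sample text is a hypothetical stand-in for the downloaded file, not its real contents. Without the flag, ^ matches only at the very start of the string; with it, ^ also matches right after each newline.

```python
import re

# Hypothetical sample standing in for the downloaded text.
sample = (
    "From: someone@example.com\n"
    "X-DSPAM-Confidence: 0.8475\n"
    "Subject: hello\n"
    "X-DSPAM-Confidence: 0.6178\n"
)
pattern = r"^X-DSPAM-Confidence: ([0-9.]+)"

# Without re.M, ^ anchors only at the start of the whole string.
without_flag = re.findall(pattern, sample)
# With re.M, ^ also anchors at the start of every line.
with_flag = re.findall(pattern, sample, re.M)

print(without_flag)
print(with_flag)
```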
What you are doing is equivalent to:
In [13]: s = "foo\nbar"
In [14]: for c in s:
   ....:     stuff=re.findall("^X-DSPAM-Confidence: ([0-9.]+)",c)
   ....:     print(c)
   ....:
f
o
o
b
a
r
If you want floats, you can convert the matches with map:
list(map(float,re.findall("^X-DSPAM-Confidence: ([0-9.]+)",qq, re.M)))
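As a small illustration of that conversion, the sketch below applies it to three of the strings from the findall output above and compares it with an equivalent list comprehension. The variable names are only for illustration.

```python
# A few of the matched strings shown earlier.
matches = ['0.8475', '0.6178', '0.6961']

# map applies float to every element; list() materialises the result.
via_map = list(map(float, matches))
# An equivalent list comprehension, if you find it easier to read.
via_comprehension = [float(m) for m in matches]

print(via_map)
```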
But if you just want the maximum, you can pass a key to max:
In [22]: max(re.findall("^X-DSPAM-Confidence: ([0-9.]+)",qq, re.M),key=float)
Out[22]: '0.9907'
So all you need is three lines:
In [28]: hand=urllib.request.urlopen("http://www.pythonlearn.com/code/mbox-short.txt")
In [29]: qq = hand.read().decode('utf-8')
In [30]: max(re.findall("^X-DSPAM-Confidence: ([0-9.]+)",qq, re.M),key=float)
Out[30]: '0.9907'
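Two details are worth noting. max with key=float compares the matches numerically but still returns the original string ('0.9907'), and max raises ValueError when there are no matches at all. A minimal sketch of a wrapper that handles both follows; the name max_confidence and the sample text are hypothetical, introduced only for illustration.

```python
import re

PATTERN = r"^X-DSPAM-Confidence: ([0-9.]+)"

def max_confidence(text):
    """Return the largest confidence value in text as a float, or None if none is found."""
    # default=None avoids the ValueError that max raises on an empty list.
    best = max(re.findall(PATTERN, text, re.M), key=float, default=None)
    return None if best is None else float(best)

# Hypothetical sample standing in for the downloaded text.
sample = "X-DSPAM-Confidence: 0.7565\nSubject: test\nX-DSPAM-Confidence: 0.9907\n"
print(max_confidence(sample))
print(max_confidence("no matching headers here"))

# Why key=float matters: plain string comparison is lexicographic.
print(max(['9.5', '10.25']), max(['9.5', '10.25'], key=float))
```

With the real file, you would pass qq to the function instead of sample.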
If you want to go line by line, iterate directly over hand:
import re
import urllib.request

hand = urllib.request.urlopen("http://www.pythonlearn.com/code/mbox-short.txt")
numlist = []
# iterate over each line like a file object
for line in hand:
    # each line is bytes, so decode it before matching
    stuff = re.search("^X-DSPAM-Confidence: ([0-9.]+)", line.decode("utf-8"))
    if stuff:
        numlist.append(float(stuff.group(1)))
print('Maximum:', max(numlist))
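If you would rather keep the original loop over the decoded string, one further option (not part of the answer above) is to split the text into lines first with str.splitlines(). A minimal sketch, using a hypothetical in-memory string in place of the downloaded text:

```python
import re

# Hypothetical stand-in for hand.read().decode('utf-8').
qq = "X-DSPAM-Confidence: 0.8475\nSubject: hi\nX-DSPAM-Confidence: 0.9907\n"

numlist = []
# splitlines() yields whole lines, whereas iterating over qq yields characters.
for line in qq.splitlines():
    stuff = re.findall(r"^X-DSPAM-Confidence: ([0-9.]+)", line)
    if len(stuff) != 1:
        continue
    numlist.append(float(stuff[0]))

print('Maximum:', max(numlist))
```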