Reading a text file from a web page with Python 3

Question

import re
import urllib
hand=urllib.request.urlopen("http://www.pythonlearn.com/code/mbox-short.txt")
qq=hand.read().decode('utf-8') 
numlist=[]
for line in qq:
    line.rstrip()
    stuff=re.findall("^X-DSPAM-Confidence: ([0-9.]+)",line)
    if len(stuff)!=1:
        continue
    num=float(stuff[0])
    numlist.append(num)
print('Maximum:',max(numlist))

The variable qq contains all the text from the file. However, the for loop doesn't work and numlist is still empty.

When I download the text file, save it locally, and read it from disk, everything works.

Answer 1

You are iterating over a string, so the loop goes character by character rather than line by line, and findall is called on single characters. Apply the regex to the whole of qq with the multiline flag re.M instead:

In [18]: re.findall("^X-DSPAM-Confidence: ([0-9.]+)",qq, re.M)
Out[18]: ['0.8475', '0.6178', '0.6961', '0.7565', '0.7626', '0.7556', '0.7002', '0.7615', '0.7601', '0.7605', '0.6959', '0.7606', '0.7559', '0.7605', '0.6932', '0.7558', '0.6526', '0.6948', '0.6528', '0.7002', '0.7554', '0.6956', '0.6959', '0.7556', '0.9846', '0.8509', '0.9907']
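
To see what re.M changes, here is a minimal offline sketch. The sample string is made up for illustration and stands in for the downloaded text; only the header name comes from the question.

```python
import re

# Hypothetical sample standing in for the downloaded file contents.
sample = (
    "From: someone@example.com\n"
    "X-DSPAM-Confidence: 0.8475\n"
    "\n"
    "X-DSPAM-Confidence: 0.6178\n"
)

pattern = r"^X-DSPAM-Confidence: ([0-9.]+)"

# Without re.M, ^ matches only at the very start of the whole string.
without_flag = re.findall(pattern, sample)

# With re.M, ^ also matches right after every newline.
with_flag = re.findall(pattern, sample, re.M)

print(without_flag)  # []
print(with_flag)     # ['0.8475', '0.6178']
```

Because the sample starts with a From: line, the unflagged search finds nothing, while the flagged one finds both values.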

What you are doing is equivalent to:

In [13]: s = "foo\nbar"

In [14]: for c in s:
   ....:     stuff = re.findall("^X-DSPAM-Confidence: ([0-9.]+)", c)
   ....:     print(c)
   ....:
f
o
o


b
a
r
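
If you would rather keep the shape of the original loop, one option (not part of the question's code) is to split the decoded string into lines first with str.splitlines(). A minimal sketch, where the value assigned to qq is a made-up sample rather than the real file:

```python
import re

# Hypothetical sample standing in for qq, the decoded file contents.
qq = "X-DSPAM-Confidence: 0.8475\nSubject: hello\nX-DSPAM-Confidence: 0.9907\n"

numlist = []
# splitlines() yields whole lines, so ^ anchors at the start of each one.
for line in qq.splitlines():
    stuff = re.findall(r"^X-DSPAM-Confidence: ([0-9.]+)", line)
    if len(stuff) != 1:
        continue
    numlist.append(float(stuff[0]))

print('Maximum:', max(numlist))
```

Note that line.rstrip() in the question had no effect anyway, since strings are immutable and the result was discarded; splitlines() already drops the line endings.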

If you want floats, you can convert the matches with map:

list(map(float,re.findall("^X-DSPAM-Confidence: ([0-9.]+)",qq, re.M)))

But if you just want the maximum, you can pass a key function to max:

In [22]: max(re.findall("^X-DSPAM-Confidence: ([0-9.]+)",qq, re.M),key=float)
Out[22]: '0.9907'
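
The key matters because findall returns strings, and strings compare character by character rather than numerically. A small sketch with made-up values (not taken from the file) shows the difference; note that max still returns the original string, so wrap the result in float() if you need a number.

```python
# Hypothetical values chosen so that text order and numeric order disagree.
values = ['9.5', '10.25', '0.7']

# Plain max compares the strings lexicographically: '9' sorts after '1'.
lexical = max(values)

# key=float compares by numeric value but returns the original string.
numeric = max(values, key=float)

# Convert explicitly when a float is needed.
as_float = float(numeric)

print(lexical, numeric, as_float)
```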

So all you need is three lines:

In [28]: hand=urllib.request.urlopen("http://www.pythonlearn.com/code/mbox-short.txt")

In [29]: qq = hand.read().decode('utf-8')

In [30]: max(re.findall("^X-DSPAM-Confidence: ([0-9.]+)",qq, re.M),key=float)
Out[30]: '0.9907'

If you want to go line by line, iterate directly over hand:

import re
import urllib.request

hand = urllib.request.urlopen("http://www.pythonlearn.com/code/mbox-short.txt")
numlist = []
# iterate over each line like a file object
for line in hand:
    stuff = re.search("^X-DSPAM-Confidence: ([0-9.]+)", line.decode("utf-8"))
    if stuff:
        numlist.append(float(stuff.group(1)))
print('Maximum:', max(numlist))
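
The loop works because the response object yields lines of bytes, much like a file opened in binary mode. To try the same logic without network access, here is a sketch that uses io.BytesIO as a stand-in for hand. The sample bytes and the helper name max_confidence are made up for illustration.

```python
import io
import re


def max_confidence(lines):
    """Return the largest X-DSPAM-Confidence value in an iterable of bytes lines."""
    numlist = []
    for line in lines:
        # Each line arrives as bytes, so decode it before matching.
        match = re.search(r"^X-DSPAM-Confidence: ([0-9.]+)", line.decode("utf-8"))
        if match:
            numlist.append(float(match.group(1)))
    # max() raises ValueError if no line matched.
    return max(numlist)


# Hypothetical stand-in for the object returned by urlopen.
fake_response = io.BytesIO(
    b"From: a@example.com\n"
    b"X-DSPAM-Confidence: 0.6178\n"
    b"X-DSPAM-Confidence: 0.9846\n"
)
print('Maximum:', max_confidence(fake_response))
```

Passing the real response from urllib.request.urlopen in place of fake_response should behave the same way, since both are iterated line by line.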

Answer 1 (accepted): 2016-01-31 16:41:04
