Reading a file, extracting URLs and rewriting them - Python

Question

I am reading a text file, a.txt, in the following format:

http://www.example.com/forum/showthread.php?t=779689/images/webcard.jpg 121.10.208.31

Then I only need to take the www.example.com part together with /images/webcard.jpg 121.10.208.31 and write it to the same file or to a separate one. In this case I'll write it to b.txt.

from urlparse import urlparse 
f = open('a.txt','r')
fo = open('b','w')


for line in f:
    fo.write(urlparse(line).netloc+ ' ' + line.split(' ')[1] + ' ' + line.split(' ')[2] + '\n')

The code above gives the following error. How can I do this?

Traceback (most recent call last):
  File "prittyprint.py", line 17, in <module>
    fo.write(urlparse(line).netloc+ ' ' + line.split(' ')[1] + ' ' + line.split(' ')[2] + '\n')
IndexError: list index out of range
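The traceback can be reproduced in a few lines. This Python 3 sketch uses the sample line from the question plus a hypothetical blank line (for example a trailing newline at the end of a.txt). It shows that split(' ') returns fewer than three items for both, so [2] is out of range, and that the last field keeps its trailing newline.

```python
# Python 3. `sample` is the line from the question; `blank` is a
# hypothetical empty line, e.g. a trailing newline at the end of a.txt.
sample = 'http://www.example.com/forum/showthread.php?t=779689/images/webcard.jpg 121.10.208.31\n'
blank = '\n'

fields = sample.split(' ')
print(len(fields))              # 2: there is only one space in the line
print(blank.split(' '))         # ['\n']: a single item
print(repr(fields[-1]))         # the last field keeps its '\n'

try:
    fields[2]
except IndexError as exc:
    print('IndexError:', exc)   # the same error as in the traceback
```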

Answer 1

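Your a.txt probably contains lines that don't match this format. Any line with fewer than three space-separated fields (a blank line at the end of the file, for example) makes line.split(' ') return a shorter list, and indexing [2] then raises IndexError. You can try this: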

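try:
    from urlparse import urlparse  # Python 2
except ImportError:
    from urllib.parse import urlparse  # Python 3

# "with" closes both files even if an error occurs
with open('a.txt') as f, open('b.txt', 'w') as fo:
    for line in f:
        # split() with no argument also strips the trailing newline,
        # so the output does not get an extra blank line
        split_line = line.split()
        if len(split_line) >= 3:
            fo.write(urlparse(split_line[0]).netloc + ' ' + split_line[1] + ' ' + split_line[2] + '\n')
        else:
            # report the malformed line and continue with the next one
            print("ERROR: some other line: %r" % line)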
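To check which lines will take the error branch before processing the whole file, the per-line logic can be pulled out into a function and tried on a few strings. This is a Python 3 sketch; `rewrite_line` is a hypothetical helper name, and the first sample is a hypothetical three-field line that assumes a space between the thread URL and /images/webcard.jpg, which is the layout the code above expects.

```python
from urllib.parse import urlparse  # Python 3 location of urlparse


def rewrite_line(line):
    """Return 'netloc field2 field3' for a line with at least three
    whitespace-separated fields, or None for anything else."""
    fields = line.split()  # no argument: also drops the trailing newline
    if len(fields) < 3:
        return None  # blank or malformed line; the caller decides what to do
    return ' '.join([urlparse(fields[0]).netloc, fields[1], fields[2]])


if __name__ == '__main__':
    samples = [
        # hypothetical three-field line (space before /images/webcard.jpg)
        'http://www.example.com/forum/showthread.php?t=779689 /images/webcard.jpg 121.10.208.31\n',
        # the line exactly as printed in the question: only two fields
        'http://www.example.com/forum/showthread.php?t=779689/images/webcard.jpg 121.10.208.31\n',
        '\n',  # blank line
    ]
    for s in samples:
        print(repr(rewrite_line(s)))
```

Only the first sample produces output ('www.example.com /images/webcard.jpg 121.10.208.31'); the other two return None. If every line of the real a.txt looks like the second sample, the whole file will take the error branch, so the field layout is worth confirming first.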

Answer 1: 3, 2013-06-16 06:22:39
