A better method than readlines?
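Using Python 2.5, I am reading an HTML file to get three different pieces of information. I find each one by matching a regular expression* and then counting a fixed number of lines down from the matched line to reach the value I actually want. The problem is that I have to re-open the site three times (once for each piece of information). I think this is inefficient, and I would like to look up all three things while opening the site only once. Does anyone have a better method or suggestion?
* I will learn a better approach, such as BeautifulSoup, but for now I need a quick fix.
Code: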
def scrubdividata(ticker):
    try:
        f = urllib2.urlopen('http://dividata.com/stock/%s'%(ticker))
        lines = f.readlines()
        for i in range(0,len(lines)):
            line = lines[i]
            if "Annual Dividend:" in line:
                s = str(lines[i+1])
                start = '>\$'
                end = '</td>'
                AnnualDiv = re.search('%s(.*)%s' % (start, end), s).group(1)
        f = urllib2.urlopen('http://dividata.com/stock/%s'%(ticker))
        lines = f.readlines()
        for i in range(0,len(lines)):
            line = lines[i]
            if "Last Dividend:" in line:
                s = str(lines[i+1])
                start = '>\$'
                end = '</td>'
                LastDiv = re.search('%s(.*)%s' % (start, end), s).group(1)
        f = urllib2.urlopen('http://dividata.com/stock/%s'%(ticker))
        lines = f.readlines()
        for i in range(0,len(lines)):
            line = lines[i]
            if "Last Ex-Dividend Date:" in line:
                s = str(lines[i+1])
                start = '>'
                end = '</td>'
                LastExDivDate = re.search('%s(.*)%s' % (start, end), s).group(1)
        divlist.append((ticker,LastDiv,AnnualDiv,LastExDivDate))
    except:
        if ticker not in errorlist:
            errorlist.append(ticker)
        else:
            pass
        pass
Thanks,
B
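I found a workable solution! I removed the two extraneous urlopen and readlines calls, leaving just one pair for all the loops (earlier I had removed only the urlopen calls but kept the readlines calls). Here is my corrected code: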
def scrubdividata(ticker):
    try:
        f = urllib2.urlopen('http://dividata.com/stock/%s'%(ticker))
        lines = f.readlines()
        for i in range(0,len(lines)):
            line = lines[i]
            if "Annual Dividend:" in line:
                s = str(lines[i+1])
                start = '>\$'
                end = '</td>'
                AnnualDiv = re.search('%s(.*)%s' % (start, end), s).group(1)
        #f = urllib2.urlopen('http://dividata.com/stock/%s'%(ticker))
        #lines = f.readlines()
        for i in range(0,len(lines)):
            line = lines[i]
            if "Last Dividend:" in line:
                s = str(lines[i+1])
                start = '>\$'
                end = '</td>'
                LastDiv = re.search('%s(.*)%s' % (start, end), s).group(1)
        #f = urllib2.urlopen('http://dividata.com/stock/%s'%(ticker))
        #lines = f.readlines()
        for i in range(0,len(lines)):
            line = lines[i]
            if "Last Ex-Dividend Date:" in line:
                s = str(lines[i+1])
                start = '>'
                end = '</td>'
                LastExDivDate = re.search('%s(.*)%s' % (start, end), s).group(1)
        divlist.append((ticker,LastDiv,AnnualDiv,LastExDivDate))
        print '@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@'
        print ticker,LastDiv,AnnualDiv,LastExDivDate
        print '@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@'
    except:
        if ticker not in errorlist:
            errorlist.append(ticker)
        else:
            pass
        pass
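The earlier attempt (dropping the extra urlopen calls but keeping the extra readlines calls) most likely failed because a response object is a one-shot stream: the first readlines() consumes it, and any later call returns an empty list. The sketch below illustrates that behaviour. It uses io.BytesIO as a stand-in for the urlopen response, which is an assumption made so the example runs without network access; the io module needs Python 2.6 or later.

```python
import io

# io.BytesIO stands in for the file-like object returned by urlopen
# (an assumption for illustration; no network access is needed).
response = io.BytesIO(b"first line\nsecond line\n")

first_read = response.readlines()   # consumes the whole stream
second_read = response.readlines()  # nothing is left to read

print(len(first_read))   # 2
print(len(second_read))  # 0
```

This is why reading once into lines and reusing that list for every search works.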
A BeautifulSoup example for reference (written for Python 2 from memory; I have only used it with Python 3, so some of the syntax may be slightly off):
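from BeautifulSoup import BeautifulSoup
from urllib2 import urlopen

yoursite = "http://...."
# urllib2 responses are not context managers in Python 2, so no "with" here.
f = urlopen(yoursite)
soup = BeautifulSoup(f)
f.close()
for node in soup.findAll('td', attrs={'class':'descrip'}):
    print node.text
    # BeautifulSoup 3 spells this nextSibling (next_sibling is the bs4 name).
    print node.nextSibling.nextSibling.text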
Output (for the sample input "GOOG"):
Last Close:
$910.68
Annual Dividend:
N/A
Pay Date:
N/A
Dividend Yield:
N/A
Ex-Dividend Date:
N/A
Years Paying:
N/A
52 Week Dividend:
$0.00
etc.
BeautifulSoup is easy to use on sites with a predictable pattern.
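If installing BeautifulSoup is not an option, a similar label/value pairing can be sketched with the standard library's HTML parser instead. This is a different technique from the BeautifulSoup answer, not a replacement for it. The names CellCollector and label_value_pairs are hypothetical, and the sketch assumes the table cells strictly alternate label, value, label, value, which the real page may not do. It is written for Python 3; in Python 2 the module is named HTMLParser.

```python
from html.parser import HTMLParser  # Python 3 module name


class CellCollector(HTMLParser):
    """Collect the text of every <td> cell, in document order."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.cells = []
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append to the current cell.
        if self._in_td:
            self.cells[-1] += data


def label_value_pairs(html):
    """Return {label: value}, assuming cells alternate label, value."""
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    cells = [cell.strip() for cell in parser.cells]
    return dict(zip(cells[0::2], cells[1::2]))


# Made-up markup that imitates the label-then-value layout.
sample_html = (
    '<table><tr><td class="descrip">Last Close:</td>\n<td>$910.68</td></tr>'
    '<tr><td class="descrip">Annual Dividend:</td>\n<td>N/A</td></tr></table>'
)
pairs = label_value_pairs(sample_html)
```

With the sample markup, pairs maps "Last Close:" to "$910.68" and "Annual Dividend:" to "N/A".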
def scrubdividata(ticker):
    try:
        end = '</td>'
        f = urllib2.urlopen('http://dividata.com/stock/%s'%(ticker))
        lines = f.readlines()
        for i in range(0,len(lines)):
            line = lines[i]
            if "Annual Dividend:" in line:
                s = str(lines[i+1])
                start = '>\$'
                AnnualDiv = re.search('%s(.*)%s' % (start, end), s).group(1)
            if "Last Dividend:" in line:
                s = str(lines[i+1])
                start = '>\$'
                LastDiv = re.search('%s(.*)%s' % (start, end), s).group(1)
            if "Last Ex-Dividend Date:" in line:
                s = str(lines[i+1])
                start = '>'
                LastExDivDate = re.search('%s(.*)%s' % (start, end), s).group(1)
        divlist.append((ticker,LastDiv,AnnualDiv,LastExDivDate))
    except:
        if ticker not in errorlist:
            errorlist.append(ticker)
        else:
            pass
        pass
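The three if blocks above differ only in the label and the start pattern, so they could also be driven by a small table. The following is a minimal sketch under assumptions: FIELDS and extract_fields are hypothetical names, and the sample lines are made up to imitate the label-then-value layout the code above relies on. It also skips a field when re.search finds nothing (for example a value of N/A with no dollar sign) instead of calling .group(1) on None.

```python
import re

# Hypothetical table: label text -> regex applied to the line after the label.
FIELDS = {
    "Annual Dividend:": r">\$(.*)</td>",
    "Last Dividend:": r">\$(.*)</td>",
    "Last Ex-Dividend Date:": r">(.*)</td>",
}


def extract_fields(lines, fields=FIELDS):
    """Scan lines once; for each label found, parse the following line."""
    found = {}
    # Stop one short of the end so lines[i + 1] always exists.
    for i, line in enumerate(lines[:-1]):
        for label, pattern in fields.items():
            if label in line:
                match = re.search(pattern, lines[i + 1])
                if match:  # leave the field out if the value does not match
                    found[label] = match.group(1)
    return found


# Made-up sample lines, not real page content.
sample = [
    '<td class="descrip">Annual Dividend:</td>\n',
    '<td>N/A</td>\n',
    '<td class="descrip">Last Dividend:</td>\n',
    '<td>$0.30</td>\n',
    '<td class="descrip">Last Ex-Dividend Date:</td>\n',
    '<td>05/01/2013</td>\n',
]
result = extract_fields(sample)
```

Here result holds "0.30" for "Last Dividend:" and "05/01/2013" for "Last Ex-Dividend Date:", while "Annual Dividend:" is absent because N/A does not match the dollar pattern. The caller can then decide whether a missing field is an error.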
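Note that lines already contains every line you need, so there is no need to call f.readlines() more than once. Just reuse lines.
Small tip: you can loop over lines directly (for line in lines) rather than over range(0, len(lines)):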
def scrubdividata(ticker):
    try:
        f = urllib2.urlopen('http://dividata.com/stock/%s'%(ticker))
        lines = f.readlines()
        # enumerate keeps the index i, which lines[i+1] below still needs.
        for i, line in enumerate(lines):
            if "Annual Dividend:" in line:
                s = str(lines[i+1])
                start = '>\$'
                end = '</td>'
                AnnualDiv = re.search('%s(.*)%s' % (start, end), s).group(1)
        for i, line in enumerate(lines):
            if "Last Dividend:" in line:
                s = str(lines[i+1])
                start = '>\$'
                end = '</td>'
                LastDiv = re.search('%s(.*)%s' % (start, end), s).group(1)
        for i, line in enumerate(lines):
            if "Last Ex-Dividend Date:" in line:
                s = str(lines[i+1])
                start = '>'
                end = '</td>'
                LastExDivDate = re.search('%s(.*)%s' % (start, end), s).group(1)
        divlist.append((ticker,LastDiv,AnnualDiv,LastExDivDate))
    except:
        if ticker not in errorlist:
            errorlist.append(ticker)
        else:
            pass
        pass
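Because each value sits on the line after its label, iterating over lines directly still needs some way to reach the following line. One option, sketched here, is to pair every line with its successor using zip, which avoids index arithmetic altogether. The function name values_after and the sample lines are hypothetical; the regex step from the code above is left out to keep the pairing idea visible.

```python
def values_after(lines, labels):
    """Map each label to the stripped text of the line that follows it."""
    found = {}
    # zip stops at the shorter sequence, so the final line is only ever
    # used as a "next line", never as a label line.
    for line, next_line in zip(lines, lines[1:]):
        for label in labels:
            if label in line:
                found[label] = next_line.strip()
    return found


# Made-up sample lines, not real page content.
sample = [
    "Last Dividend:\n",
    "$0.30\n",
    "Last Ex-Dividend Date:\n",
    "05/01/2013\n",
]
result = values_after(sample, ["Last Dividend:", "Last Ex-Dividend Date:"])
```

For the sample, result maps "Last Dividend:" to "$0.30" and "Last Ex-Dividend Date:" to "05/01/2013". Note that lines[1:] copies the list, which is fine for a single page but worth knowing for very large inputs.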