How to parse an HTML table with Python and BeautifulSoup and write it to CSV

Question

I am trying to parse an HTML page, fetch the currency values, and write them to CSV. I have the following code:

#!/usr/bin/env python

import urllib2
from BeautifulSoup import BeautifulSoup

contenturl = "http://www.bank.gov.ua/control/en/curmetal/detail/currency?period=daily"
soup = BeautifulSoup(urllib2.urlopen(contenturl).read())

table = soup.find('div', attrs={'class': 'content'})

rows = table.findAll('tr')
for tr in rows:
    cols = tr.findAll('td')
    for td in cols:
        text = td.find(text=True) + ';'
        print text,
    print

The problem is that I do not know how to retrieve only the currency values. I tried a regexp like '^[0-9]{3}' (start with 3 digits), but it doesn't work.

Answer 1

You'd be much better off picking out specific cells in the table. The td cells with the cell_c class contain the data you are interested in, and the last one is always the currency exchange rate:

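rows = table.findAll('tr')
for tr in rows:
    cols = tr.findAll('td')
    # Skip rows without td cells (e.g. header rows) before indexing cols[0]
    if cols and 'cell_c' in cols[0].get('class', ''):
        # currency row: five cells, the last one is the exchange rate
        digital_code, letter_code, units, name, rate = [c.text for c in cols]
        print digital_code, letter_code, units, name, rate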

With the data in separate variables, you can now convert the text to decimal numbers, store them in a database, or do whatever else you need.
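
The answer stops at printing, so here is a minimal sketch of the remaining step from the title: turning the rate into a decimal number and writing the rows to CSV. It uses only the standard library and Python 3 syntax, unlike the Python 2 code above. The names parse_rate and write_rates_csv, the header names, the ';' delimiter (borrowed from the question's print loop), the sample row, and the handling of a decimal comma are all assumptions for illustration, not something the page confirms.

```python
import csv
import io
from decimal import Decimal


def parse_rate(text):
    """Convert a cell's text to Decimal.

    Assumption: the value has no thousands separator and may use
    either a dot or a comma as the decimal mark.
    """
    cleaned = text.strip().replace(',', '.')
    return Decimal(cleaned)


def write_rates_csv(rows, out):
    """Write (digital_code, letter_code, units, name, rate) tuples to a file-like object."""
    writer = csv.writer(out, delimiter=';')
    writer.writerow(['digital_code', 'letter_code', 'units', 'name', 'rate'])
    for digital_code, letter_code, units, name, rate in rows:
        writer.writerow([digital_code.strip(), letter_code.strip(),
                         int(units), name.strip(), parse_rate(rate)])


if __name__ == '__main__':
    # Made-up sample row standing in for the values unpacked in the loop above.
    sample = [('036', 'AUD', '100', 'Australian Dollar', ' 818.7300 ')]
    buffer = io.StringIO()
    write_rates_csv(sample, buffer)
    print(buffer.getvalue())
```

To write to a real file instead of an in-memory buffer, open it with open('rates.csv', 'w', newline='') and pass the file object as out; the file name here is only an example.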

Answer 1: accepted, score 9, 2013-03-06 14:59:18
