Trouble parsing an HTML page with Python

Question

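I am trying to save the data under the columns with code "SVENYXX", where "XX" is the number that follows (e.g. 01, 02, etc.), from the site http://www.federalreserve.gov/econresdata/researchdata/feds200628_1.html using Python. I am currently using the method described at http://docs.python-guide.org/en/latest/scenarios/scrape/. However, I don't know how to determine the divs for this page, so I can't proceed, and I was hoping to get some help with this.

This is what I have so far: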

from lxml import html
import requests
page = requests.get('http://www.federalreserve.gov/econresdata/researchdata/feds200628_1.html')
tree = html.fromstring(page.text)

Thanks

Answer 1

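Have you tried BeautifulSoup? I'm a big fan. With it you can easily iterate through all the information you need and search by tag.

I threw this together; it prints out the values in each of the columns you're looking at. I'm not sure what you want to do with the data, but hopefully it helps.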

from bs4 import BeautifulSoup
from urllib import request

page = request.urlopen('http://www.federalreserve.gov/econresdata/researchdata/feds200628_1.html').read()
# Name the parser explicitly so bs4 does not warn or pick a different one per machine
soup = BeautifulSoup(page, 'html.parser')

# This answer assumes the data lives in the third <table> on the page
desired_table = soup.find_all('table')[2]

# Find the columns you want data from
headers = desired_table.find_all('th')
desired_columns = []
for index, th in enumerate(headers):
    # get_text() also works when the <th> wraps nested tags (th.string would be None)
    if 'SVENY' in th.get_text():
        desired_columns.append(index)

# Iterate through each row grabbing the data from the desired columns
rows = desired_table.find_all('tr')

for row in rows[1:]:
    cells = row.find_all('td')
    for column in desired_columns:
        print(cells[column].text)
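
Since the question started from lxml, here is a rough sketch of the same column lookup with XPath instead of BeautifulSoup. It runs against a made-up miniature table rather than the live page, so treat the markup as an assumption: the names SAMPLE and sveny_columns are hypothetical, the "OTHER" column and its values are placeholders, and only the two SVENY01 values come from the output shown in this answer. It also assumes the header names sit in the first row of the table and that the table is not nested inside another one.

```python
from lxml import html

# Hypothetical miniature of the page's table; the real markup may differ.
# The OTHER column and its values are placeholders, not real data.
SAMPLE = """
<html><body>
<table>
  <tr><th>Date</th><th>SVENY01</th><th>OTHER</th></tr>
  <tr><th>2015-06-05</th><td>0.3487</td><td>x</td></tr>
  <tr><th>2015-06-04</th><td>0.3124</td><td>y</td></tr>
</table>
</body></html>
"""


def sveny_columns(tree):
    """Return {header: [(date, value), ...]} for every SVENY column."""
    # Pick the first table that has a header cell starting with "SVENY",
    # instead of relying on the table's position in the page.
    table = tree.xpath('//table[.//th[starts-with(normalize-space(.), "SVENY")]]')[0]
    rows = table.xpath('.//tr')

    # Header names, in document order, taken from the first row
    names = [cell.text_content().strip() for cell in rows[0].xpath('./th | ./td')]

    result = {}
    for row in rows[1:]:
        cells = row.xpath('./th | ./td')
        date = cells[0].text_content().strip()
        # Pair each cell with the header name at the same position
        for name, cell in zip(names, cells):
            if name.startswith('SVENY'):
                result.setdefault(name, []).append((date, cell.text_content().strip()))
    return result


result = sveny_columns(html.fromstring(SAMPLE))
for date, value in result['SVENY01']:
    print(date, value)
```

To try it on the real page, you would pass the tree from the question (html.fromstring(page.text)) to the function; whether the XPath matches there depends on the page's actual markup.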

In response to your second request:

from bs4 import BeautifulSoup
from urllib import request

page = request.urlopen('http://www.federalreserve.gov/econresdata/researchdata/feds200628_1.html').read()
# Name the parser explicitly so bs4 does not warn or pick a different one per machine
soup = BeautifulSoup(page, 'html.parser')

# This answer assumes the data lives in the third <table> on the page
desired_table = soup.find_all('table')[2]
data = {}

# Find the columns you want data from
headers = desired_table.find_all('th')
for index, th in enumerate(headers):
    # get_text() also works when the <th> wraps nested tags (th.string would be None)
    name = th.get_text(strip=True)
    if 'SVENY' in name:
        data[name] = {'column': index, 'data': []}

# Iterate through each row grabbing the data from the desired columns
rows = desired_table.find_all('tr')

for row in rows[1:]:
    # The date is stored in the row's own <th> cell
    date = row.find_all('th')[0].text
    cells = row.find_all('td')

    for header, info in data.items():
        column_number = info['column']
        cell_data = [date, cells[column_number].text]
        info['data'].append(cell_data)

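This builds a dictionary in which each key is a column header and each value is another dictionary holding 1) the column's position on the site, and 2) a list with the actual data you want.

As an example: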

for year_number in data['SVENY01']['data']:
    print(year_number)

['2015-06-05', '0.3487']
['2015-06-04', '0.3124']
['2015-06-03', '0.3238']
['2015-06-02', '0.3040']
['2015-06-01', '0.3009']
['2015-05-29', '0.2957']
etc.

You can fiddle with this to get the format and information you want, but hopefully this helps.
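
Since the question was about saving the data, one possible next step is to write the dictionary out as CSV. This is only a sketch using the standard csv module: write_sveny_csv is a hypothetical helper name, and the sample dictionary below just mimics the shape built above (its 'column' value is made up; the two readings are taken from the output shown earlier).

```python
import csv
import io


def write_sveny_csv(data, out):
    """Write one row per date and one column per SVENY header to `out`."""
    names = sorted(data)

    # Regroup the per-column lists into one record per date.
    # Dicts keep insertion order, so dates stay in the order they were scraped.
    by_date = {}
    for name in names:
        for date, value in data[name]['data']:
            by_date.setdefault(date, {})[name] = value

    writer = csv.writer(out)
    writer.writerow(['Date'] + names)
    for date, values in by_date.items():
        # Leave a cell empty if a column has no value for this date
        writer.writerow([date] + [values.get(name, '') for name in names])


# Sample in the same shape as `data` above; the 'column' index is a placeholder
sample = {
    'SVENY01': {
        'column': 1,
        'data': [['2015-06-05', '0.3487'], ['2015-06-04', '0.3124']],
    },
}

buffer = io.StringIO()
write_sveny_csv(sample, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write a real file instead of an in-memory buffer, open it with newline='' (as the csv module documentation recommends) and pass the file object as `out`.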

Solution 1
1 Accepted 2015-06-09 18:34:23
