How to scrape only the first item in a row using Beautiful Soup

Question

I am currently running the following Python script:

import requests
from bs4 import BeautifulSoup

origin= ["USD","GBP","EUR"]
i=0
while i < len(origin):
    page = requests.get("https://www.x-rates.com/table/?from="+origin[i]+"&amount=1")
    soup = BeautifulSoup(page.content, "html.parser")

    tables = soup.findChildren('table')
    my_table = tables[0]

    rows = my_table.findChildren(['td'])

    i = i +1


for rows in rows:
    cells = rows.findChildren('a')
    for cell in cells:
        value = cell.string
        print(value)

to scrape data from this HTML:

https://i.stack.imgur.com/DkX83.png

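The problem I'm having is that I'm struggling to scrape only the first column without also scraping the second, because both values sit inside <a> tags in the same table row. The href is the only thing that distinguishes the two tags; I tried filtering on it, but that doesn't seem to work and returns a blank value. Also, when I try to sort the data manually, the output is rearranged vertically rather than horizontally. I'm new to coding, so any help would be appreciated :)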

Answer 1

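It is easier to understand what is happening if you print each item you get, starting from the top, in this case from the table element. The idea is to go step by step so that you can follow along.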

import requests
from bs4 import BeautifulSoup

origin= ["USD","GBP","EUR"]
i=0
while i < len(origin):
    page = requests.get("https://www.x-rates.com/table/?from="+origin[i]+"&amount=1")
    soup = BeautifulSoup(page.content, "html.parser")
    tables = soup.findChildren('table')
    my_table = tables[0]

    i = i +1

    rows = my_table.findChildren('tr')
    for row in rows:
        cells = row.findAll('td',class_='rtRates')
        if len(cells) > 0:
            first_item = cells[0].find('a')
            value = first_item.string
            print(value)
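
To follow the same steps without hitting the site, here is a minimal offline sketch. The markup in SAMPLE_HTML is a hypothetical sample modeled on the rows described in the question (a name cell followed by two rtRates cells), and first_rates is an illustrative helper name, not part of Beautiful Soup. The real page may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup (an assumption, not copied from the live page):
# each body row has a name cell followed by two "rtRates" cells.
SAMPLE_HTML = """
<table class="ratesTable">
  <thead><tr><th>US Dollar</th><th>1.00 USD</th><th>inv. 1.00 USD</th></tr></thead>
  <tbody>
    <tr>
      <td>Euro</td>
      <td class="rtRates"><a href="/graph/?from=USD&amp;to=EUR">0.86</a></td>
      <td class="rtRates"><a href="/graph/?from=EUR&amp;to=USD">1.16</a></td>
    </tr>
    <tr>
      <td>British Pound</td>
      <td class="rtRates"><a href="/graph/?from=USD&amp;to=GBP">0.75</a></td>
      <td class="rtRates"><a href="/graph/?from=GBP&amp;to=USD">1.33</a></td>
    </tr>
  </tbody>
</table>
"""


def first_rates(html):
    """Return the text of the first rtRates link in every table row."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    values = []
    for row in table.find_all("tr"):
        cells = row.find_all("td", class_="rtRates")
        if cells:  # the header row has no rtRates cells, so it is skipped
            link = cells[0].find("a")
            if link is not None:  # guard against a cell without a link
                values.append(link.get_text(strip=True))
    return values


print(first_rates(SAMPLE_HTML))
```

Working row by row is what keeps the two rate columns apart: cells[0] is always the first rate in its own row, whereas collecting every td in the table mixes both columns into one flat list.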

Answer 2

You may also want to try another approach to achieve the same result:

import requests
from bs4 import BeautifulSoup

keywords = ["USD","GBP","EUR"]

for keyword in keywords:
    page = requests.get("https://www.x-rates.com/table/?from={}&amount=1".format(keyword))
    soup = BeautifulSoup(page.content, "html.parser")
    for items in soup.select_one(".ratesTable tbody").find_all("tr"):
        data = [item.text for item in items.find_all("td")[1:2]]
        print(data)
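
The slice [1:2] above keeps only the second td of each row, which is the first rate column. As a rough sketch, the same cell can also be reached with a positional CSS selector, or by filtering on the href that the question mentions. The markup below is a hypothetical sample, and rates_by_position and rates_by_href are illustrative names. The href pattern "from=<currency>" is an assumption about how the links are built.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup (an assumption, not copied from the live page).
SAMPLE_HTML = """
<table class="ratesTable">
  <thead><tr><th>US Dollar</th><th>1.00 USD</th><th>inv. 1.00 USD</th></tr></thead>
  <tbody>
    <tr>
      <td>Euro</td>
      <td class="rtRates"><a href="/graph/?from=USD&amp;to=EUR">0.86</a></td>
      <td class="rtRates"><a href="/graph/?from=EUR&amp;to=USD">1.16</a></td>
    </tr>
    <tr>
      <td>British Pound</td>
      <td class="rtRates"><a href="/graph/?from=USD&amp;to=GBP">0.75</a></td>
      <td class="rtRates"><a href="/graph/?from=GBP&amp;to=USD">1.33</a></td>
    </tr>
  </tbody>
</table>
"""


def rates_by_position(html):
    """Select the second <td> of every body row, like find_all("td")[1:2]."""
    soup = BeautifulSoup(html, "html.parser")
    cells = soup.select(".ratesTable tbody tr td:nth-of-type(2)")
    return [cell.get_text(strip=True) for cell in cells]


def rates_by_href(html, origin):
    """Select links whose href contains "from=<origin>" (assumed URL pattern)."""
    soup = BeautifulSoup(html, "html.parser")
    selector = 'a[href*="from={}"]'.format(origin)
    return [link.get_text(strip=True) for link in soup.select(selector)]


print(rates_by_position(SAMPLE_HTML))
print(rates_by_href(SAMPLE_HTML, "USD"))
```

The positional selector depends on the column order staying the same, while the href filter depends on the link format, so which one is more robust depends on how the page changes over time.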

Answer 1: 0, 2018-06-26 14:20:25

Answer 2: 0, accepted, 2018-06-26 16:27:50
