
How to only scrape the first item in a row using Beautiful Soup

I am currently running the following Python script:

import requests
from bs4 import BeautifulSoup

origin = ["USD", "GBP", "EUR"]
i = 0
while i < len(origin):
    page = requests.get("https://www.x-rates.com/table/?from=" + origin[i] + "&amount=1")
    soup = BeautifulSoup(page.content, "html.parser")

    tables = soup.findChildren('table')
    my_table = tables[0]

    rows = my_table.findChildren(['td'])

    i = i + 1

    for rows in rows:
        cells = rows.findChildren('a')
        for cell in cells:
            value = cell.string
            print(value)

To scrape data from this HTML:

https://i.stack.imgur.com/DkX83.png

The problem is that I'm struggling to scrape only the first column without also scraping the second one, because both values are inside <a> tags and in the same table row. The href is the only thing that differentiates the two tags. I have tried filtering on it, but that doesn't seem to work and returns a blank value. Also, when I try to sort the data manually, the output is laid out vertically and not horizontally. I am new to coding, so any help would be appreciated :)

It is easier to follow what happens if you print every item you get, starting from the top, e.g. in this case from the table element. The idea is to go one step at a time so you can follow along.

import requests
from bs4 import BeautifulSoup

origin= ["USD","GBP","EUR"]
i=0
while i < len(origin):
    page = requests.get("https://www.x-rates.com/table/?from="+origin[i]+"&amount=1")
    soup = BeautifulSoup(page.content, "html.parser")
    tables = soup.findChildren('table')
    my_table = tables[0]

    i = i +1

    rows = my_table.findChildren('tr')
    for row in rows:
        cells = row.findAll('td',class_='rtRates')
        if len(cells) > 0:
            first_item = cells[0].find('a')
            value = first_item.string
            print(value)
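
The row-by-row approach above can also be tried offline. The following is only a minimal sketch: `SAMPLE_HTML` is a hypothetical snippet assumed to resemble the rates table (the real page markup may differ), and `first_rates` is an illustrative helper name, not part of Beautiful Soup. It takes the first `rtRates` cell of each row and prints the collected values on one line, which may help with the vertical output mentioned in the question.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, assumed to resemble the rates table in the question.
SAMPLE_HTML = """
<table class="ratesTable">
  <tbody>
    <tr>
      <td>Euro</td>
      <td class="rtRates"><a href="/graph/?from=USD&amp;to=EUR">0.90</a></td>
      <td class="rtRates"><a href="/graph/?from=EUR&amp;to=USD">1.11</a></td>
    </tr>
    <tr>
      <td>British Pound</td>
      <td class="rtRates"><a href="/graph/?from=USD&amp;to=GBP">0.80</a></td>
      <td class="rtRates"><a href="/graph/?from=GBP&amp;to=USD">1.25</a></td>
    </tr>
  </tbody>
</table>
"""


def first_rates(html):
    """Return the link text of the first 'rtRates' cell in every table row."""
    soup = BeautifulSoup(html, "html.parser")
    values = []
    for row in soup.find_all("tr"):
        # find() returns only the first matching cell, or None if there is none
        cell = row.find("td", class_="rtRates")
        if cell is None:
            continue  # e.g. a header row without rate cells
        link = cell.find("a")
        if link is not None:
            values.append(link.get_text(strip=True))
    return values


rates = first_rates(SAMPLE_HTML)
# Join the values so they appear horizontally instead of one per line
print(", ".join(rates))
```

Collecting the values in a list before printing keeps the scraping and the output layout separate, so the same list could be sorted or written to a file instead.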

Here is another way you might want to try to achieve the same result:

import requests
from bs4 import BeautifulSoup

keywords = ["USD","GBP","EUR"]

for keyword in keywords:
    page = requests.get("https://www.x-rates.com/table/?from={}&amount=1".format(keyword))
    soup = BeautifulSoup(page.content, "html.parser")
    for items in soup.select_one(".ratesTable tbody").find_all("tr"):
        data = [item.text for item in items.find_all("td")[1:2]]
        print(data)
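
Since the question mentions that the href is what distinguishes the two links, a CSS attribute selector is one more option to consider. This is a sketch under stated assumptions: the markup in `SAMPLE_HTML` and the `from=USD` href pattern are hypothetical and should be checked against the real page, and `rates_by_href` is an illustrative helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the href format is an assumption, not taken from the site.
SAMPLE_HTML = """
<table class="ratesTable">
  <tbody>
    <tr>
      <td>Euro</td>
      <td class="rtRates"><a href="/graph/?from=USD&amp;to=EUR">0.90</a></td>
      <td class="rtRates"><a href="/graph/?from=EUR&amp;to=USD">1.11</a></td>
    </tr>
    <tr>
      <td>British Pound</td>
      <td class="rtRates"><a href="/graph/?from=USD&amp;to=GBP">0.80</a></td>
      <td class="rtRates"><a href="/graph/?from=GBP&amp;to=USD">1.25</a></td>
    </tr>
  </tbody>
</table>
"""


def rates_by_href(html, base):
    """Return the text of links whose href contains 'from=<base>'."""
    soup = BeautifulSoup(html, "html.parser")
    # [href*="..."] matches links whose href contains the given substring
    selector = '.ratesTable a[href*="from={}"]'.format(base)
    return [link.get_text(strip=True) for link in soup.select(selector)]


print(rates_by_href(SAMPLE_HTML, "USD"))
```

If the real hrefs follow a different pattern, only the selector string needs to change.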
