How to only scrape the first item in a row using Beautiful Soup

I am currently running the following Python script:

import requests
from bs4 import BeautifulSoup

origin= ["USD","GBP","EUR"]
i=0
while i < len(origin):
    page = requests.get("https://www.x-rates.com/table/?from="+origin[i]+"&amount=1")
    soup = BeautifulSoup(page.content, "html.parser")

    tables = soup.findChildren('table')
    my_table = tables[0]

    rows = my_table.findChildren(['td'])

    i = i +1


for rows in rows:
    cells = rows.findChildren('a')
    for cell in cells:
        value = cell.string
        print(value)

To scrape data from this HTML:

https://i.stack.imgur.com/DkX83.png

The problem is that I can't scrape only the first column without also scraping the second one, because both values are inside <a> tags in the same table row. The href is the only thing that differentiates the two tags. I have tried filtering on it, but that doesn't seem to work and returns a blank value. Also, when I try to sort the data manually, the output is laid out vertically and not horizontally. I am new to coding, so any help would be appreciated :)

It is easier to follow what is happening if you print each item as you work down from the top, in this case starting from the table element. The idea is to go one level at a time (table, then rows, then cells) so you can see what each step returns.

import requests
from bs4 import BeautifulSoup

origin= ["USD","GBP","EUR"]
i=0
while i < len(origin):
    page = requests.get("https://www.x-rates.com/table/?from="+origin[i]+"&amount=1")
    soup = BeautifulSoup(page.content, "html.parser")
    # Take the first table on the page
    tables = soup.findChildren('table')
    my_table = tables[0]

    i = i +1

    # Walk the table row by row instead of collecting every <td> at once
    rows = my_table.findChildren('tr')
    for row in rows:
        # Only the rate cells carry the 'rtRates' class; header rows have none
        cells = row.findAll('td',class_='rtRates')
        if len(cells) > 0:
            # cells[0] is the first rate column; cells[1] would be the second
            first_item = cells[0].find('a')
            value = first_item.string
            print(value)
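
To see why going row by row isolates the first column, it can help to run the same logic on a small inline table rather than the live site. The sketch below assumes markup like the one the code above implies: a name cell followed by two `rtRates` cells, each wrapping its value in an `<a>`. The sample HTML, the rates in it, the href values, and the helper name `first_rate_per_row` are all made up for illustration and are not taken from the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the rates table; all values are invented.
SAMPLE_HTML = """
<table class="ratesTable">
  <thead>
    <tr><th>US Dollar</th><th>1.00 USD</th><th>inv. 1.00 USD</th></tr>
  </thead>
  <tbody>
    <tr>
      <td>Euro</td>
      <td class="rtRates"><a href="/graph/?from=USD&amp;to=EUR">0.90</a></td>
      <td class="rtRates"><a href="/graph/?from=EUR&amp;to=USD">1.11</a></td>
    </tr>
    <tr>
      <td>British Pound</td>
      <td class="rtRates"><a href="/graph/?from=USD&amp;to=GBP">0.80</a></td>
      <td class="rtRates"><a href="/graph/?from=GBP&amp;to=USD">1.25</a></td>
    </tr>
  </tbody>
</table>
"""


def first_rate_per_row(html):
    """Return the text of the first 'rtRates' cell in each table row."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    values = []
    for row in table.find_all("tr"):
        cells = row.find_all("td", class_="rtRates")
        if cells:  # header rows have no rate cells and are skipped
            link = cells[0].find("a")
            if link is not None:  # guard against a cell without a link
                values.append(link.get_text(strip=True))
    return values


print(first_rate_per_row(SAMPLE_HTML))
```

Because the filter is applied per row, `cells[0]` always refers to the first rate cell of that row, so the second column never enters the result.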

Here is another way to achieve the same result:

import requests
from bs4 import BeautifulSoup

keywords = ["USD","GBP","EUR"]

for keyword in keywords:
    page = requests.get("https://www.x-rates.com/table/?from={}&amount=1".format(keyword))
    soup = BeautifulSoup(page.content, "html.parser")
    # Iterate over the body rows of the table with class 'ratesTable'
    for items in soup.select_one(".ratesTable tbody").find_all("tr"):
        # The slice [1:2] keeps only the second <td>, i.e. the first rate column
        data = [item.text for item in items.find_all("td")[1:2]]
        print(data)
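
The question also mentions that the values come out one per line rather than side by side, which neither answer addresses directly. One possible approach, sketched here on invented sample markup rather than the live site, is to collect each currency's first-column values into a list and print the list joined on a single line. `SAMPLE_PAGES`, `first_column`, and `format_row` are hypothetical names, and the rates are made up.

```python
from bs4 import BeautifulSoup

# Hypothetical pages keyed by currency code; all values are invented.
SAMPLE_PAGES = {
    "USD": """<table class="ratesTable"><tbody>
        <tr><td>Euro</td><td class="rtRates"><a href="#">0.90</a></td>
            <td class="rtRates"><a href="#">1.11</a></td></tr>
        <tr><td>British Pound</td><td class="rtRates"><a href="#">0.80</a></td>
            <td class="rtRates"><a href="#">1.25</a></td></tr>
    </tbody></table>""",
    "GBP": """<table class="ratesTable"><tbody>
        <tr><td>Euro</td><td class="rtRates"><a href="#">1.15</a></td>
            <td class="rtRates"><a href="#">0.87</a></td></tr>
        <tr><td>US Dollar</td><td class="rtRates"><a href="#">1.25</a></td>
            <td class="rtRates"><a href="#">0.80</a></td></tr>
    </tbody></table>""",
}


def first_column(html):
    """Collect the second <td> of each body row (the first rate column)."""
    soup = BeautifulSoup(html, "html.parser")
    body = soup.select_one(".ratesTable tbody")
    values = []
    for row in body.find_all("tr"):
        cells = row.find_all("td")
        if len(cells) > 1:  # skip rows that lack a rate cell
            values.append(cells[1].get_text(strip=True))
    return values


def format_row(label, values):
    """Join one currency's values on a single line."""
    return label + ": " + ", ".join(values)


for code, html in SAMPLE_PAGES.items():
    print(format_row(code, first_column(html)))
```

To try this against the real site, `first_column` could be given `page.content` from `requests.get(...)` instead of a sample string, assuming the live markup matches the structure used above.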


 