
Web scraping: soup.findAll always returns an empty list

I am trying to scrape a web page using Python and BeautifulSoup. When I write:

table = soup.find('table')

it returns None.

And when I try to get the row content, it always returns an empty list. I also tried Selenium and got the same result: an empty list.

import requests
from bs4 import BeautifulSoup
import csv
url = "https://www.iea.org/data-and-statistics/data-tables?country=CANADA&energy=Balances&year=2010"
response = requests.get(url)
print(response.status_code)  # prints 200
soup = BeautifulSoup(response.text,"html.parser")
tr = soup.findAll('tr', attrs={'class': 'm-data-table__row '})
print(tr)  # prints []
print(len(tr))  # prints 0
with open("C:/Users/User/Desktop/test27.csv", 'wt', newline='', encoding='utf-8') as csvFile:
    writer = csv.writer(csvFile)
    for cell in tr:
        td = cell.find_all('td')
        row = [i.text.replace('\n', '') for i in td]
        writer.writerow(row)

Any help?

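If you analyse the website, you will see that the data is loaded via an AJAX call after the initial HTML arrives, so the table rows are not in the HTML that requests receives. The following script makes the same AJAX call directly and saves the returned JSON to a file: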

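import json

import requests

# The data-tables page fills its table from this API endpoint,
# so request the JSON directly instead of parsing the HTML.
res = requests.get("https://api.iea.org/stats/?year=2010&countries=CANADA&series=BALANCES")
res.raise_for_status()  # stop early on an HTTP error

data = res.json()

# Save the parsed JSON to a file.
with open("data.json", "w") as f:
    json.dump(data, f)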
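Since the original goal was a CSV file, here is a rough sketch of two follow-up steps: building the API URL from the page URL, and writing the returned records to CSV. Both helper names (api_url_from_page_url, records_to_csv) are hypothetical. The parameter mapping (country to countries, energy to upper-cased series) is inferred only from the one pair of URLs in this thread, and the code assumes the JSON is a list of flat objects, which I have not verified against the API. The sample records are made up.

```python
import csv
import io
from urllib.parse import parse_qs, urlencode, urlsplit

API_BASE = "https://api.iea.org/stats/"  # endpoint used in the answer above


def api_url_from_page_url(page_url):
    """Translate a data-tables page URL into the matching API URL.

    Assumption: the mapping below is guessed from a single example
    (country -> countries, energy -> series in upper case, year -> year).
    """
    query = parse_qs(urlsplit(page_url).query)
    params = {
        "year": query["year"][0],
        "countries": query["country"][0],
        "series": query["energy"][0].upper(),
    }
    return API_BASE + "?" + urlencode(params)


def records_to_csv(records, out_file):
    """Write a list of flat dicts to out_file as CSV, one row per record.

    Assumption: the API response is a list of flat JSON objects.
    """
    # Collect column names in first-seen order so that records
    # with differing keys still fit into one table.
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)
    writer = csv.DictWriter(out_file, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(records)


if __name__ == "__main__":
    page = ("https://www.iea.org/data-and-statistics/data-tables"
            "?country=CANADA&energy=Balances&year=2010")
    print(api_url_from_page_url(page))

    # Made-up records standing in for the real API response.
    sample = [
        {"flow": "Production", "value": 1},
        {"flow": "Imports", "value": 2, "units": "ktoe"},
    ]
    buffer = io.StringIO()
    records_to_csv(sample, buffer)
    print(buffer.getvalue())
```

With the real response, you would pass data from res.json() and a file opened with newline='' (as in the question) instead of the in-memory buffer. If the JSON turns out to be nested, the records would need flattening first.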


 