
Data Scraper using Beautiful Soup

I am trying to scrape the table from this website using Beautiful Soup, but I keep getting an "Empty DataFrame" message.

 import requests
 import pandas as pd
 from bs4 import BeautifulSoup

 URL = ("https://www.kiplinger.com/article/real-estate/t010-c000-s002-home-price-"
        "changes-in-the-100-largest-metro-areas.html")
 page = requests.get(URL)

 soup = BeautifulSoup(page.content, 'html.parser')
 table = soup.find('table')

 data = []
 for tr in table.find_all('tr'):
     row = {}
     cells = tr.find_all('td')
     if len(cells) == 3:
         row['Metro Area'] = cells[0].text.strip()
         row['Median Home Price'] = cells[1].text.strip()
         row['Affordability Index'] = cells[2].text.strip()
         data.append(row)


 df = pd.DataFrame(data)
 print(df)

I'm aiming for a DataFrame that contains three columns: 'Metro Area', 'Median Home Price' and 'Affordability Index'.

You need to remove the `if` condition on the 16th line (`if len(cells) == 3:`). The table on this site has six columns (each row contains six td tags), but your code only adds values to the row variable when cells has exactly three elements. That condition is never met, so nothing is ever appended to data and the DataFrame stays empty.
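One way to check this yourself is to count the td cells in each row before filtering on them. The snippet below is only a sketch: `SAMPLE_HTML` and `td_counts` are hypothetical names, and the inline table is a made-up stand-in with six columns, not the actual page content.

```python
from collections import Counter

from bs4 import BeautifulSoup

# Hypothetical stand-in for the page's HTML (placeholder values, six columns).
SAMPLE_HTML = """
<table>
  <tr><th>A</th><th>B</th><th>C</th><th>D</th><th>E</th><th>F</th></tr>
  <tr><td>a1</td><td>b1</td><td>c1</td><td>d1</td><td>e1</td><td>f1</td></tr>
  <tr><td>a2</td><td>b2</td><td>c2</td><td>d2</td><td>e2</td><td>f2</td></tr>
</table>
"""


def td_counts(html):
    """Return a Counter mapping 'number of <td> cells' to 'number of rows'."""
    table = BeautifulSoup(html, "html.parser").find("table")
    return Counter(len(tr.find_all("td")) for tr in table.find_all("tr"))


counts = td_counts(SAMPLE_HTML)
print(counts)
```

If no row reports exactly three cells, a check such as `len(cells) == 3` can never be true. On the real page you would pass `page.content` instead of `SAMPLE_HTML`.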

If you remove the entire `if` line and re-indent the code beneath it, you will get the result you asked for.
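A minimal sketch of the adjusted loop follows. It assumes the first three cells of each row are the ones you want, as in your original code. The `len(cells) < 3` guard is my own addition: if the table has a header row built from th tags, `cells` is empty for that row and `cells[0]` would raise an IndexError. `SAMPLE_HTML` and `extract_rows` are hypothetical names, and the table values are invented placeholders.

```python
from bs4 import BeautifulSoup

# Hypothetical six-column table; every value here is an invented placeholder.
SAMPLE_HTML = """
<table>
  <tr><th>A</th><th>B</th><th>C</th><th>D</th><th>E</th><th>F</th></tr>
  <tr><td>Metro X</td><td>$100,000</td><td>5</td><td>d1</td><td>e1</td><td>f1</td></tr>
  <tr><td>Metro Y</td><td>$200,000</td><td>7</td><td>d2</td><td>e2</td><td>f2</td></tr>
</table>
"""


def extract_rows(html):
    """Collect the first three <td> values of every data row as a dict."""
    table = BeautifulSoup(html, "html.parser").find("table")
    data = []
    for tr in table.find_all("tr"):
        cells = tr.find_all("td")
        # Skip rows without enough <td> cells (for example a <th>-only header).
        if len(cells) < 3:
            continue
        data.append({
            "Metro Area": cells[0].text.strip(),
            "Median Home Price": cells[1].text.strip(),
            "Affordability Index": cells[2].text.strip(),
        })
    return data


rows = extract_rows(SAMPLE_HTML)
print(rows)
```

Passing the resulting list to `pd.DataFrame(rows)`, as in the question, should then give a DataFrame with the three named columns.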
