Data scraper using Beautiful Soup

Question

I am trying to scrape a table from this website using Beautiful Soup, but I keep getting a message that the DataFrame is empty.

    import requests
    import pandas as pd
    from bs4 import BeautifulSoup

    URL = ("https://www.kiplinger.com/article/real-estate/t010-c000-s002-home-price-"
           "changes-in-the-100-largest-metro-areas.html")
    page = requests.get(URL)

    soup = BeautifulSoup(page.content, 'html.parser')
    table = soup.find('table')

    data = []
    for tr in table.find_all('tr'):
        row = {}
        cells = tr.find_all('td')
        if len(cells) == 3:
            row['Metro Area'] = cells[0].text.strip()
            row['Median Home Price'] = cells[1].text.strip()
            row['Affordability Index'] = cells[2].text.strip()
            data.append(row)

    df = pd.DataFrame(data)
    print(df)

My goal is a DataFrame with 3 columns: "Metro Area", "Median Home Price", and "Affordability Index".

Answer 1

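You need to remove the `if` condition on line 16 (`if len(cells) == 3:`). The table on that site has six columns (each data row contains six `td` tags), and you only add values to `row` when the number of cells equals three, so the condition is never true and nothing is ever appended to `data`.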
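
One way to check this yourself is to count the `td` cells in each row. The sketch below runs on a small made-up table rather than the live page, so `SAMPLE_HTML` and its placeholder values are assumptions for illustration only; the real page's markup may differ.

```python
from collections import Counter

from bs4 import BeautifulSoup

# Hypothetical stand-in for the page: one header row (th cells)
# followed by two data rows with six td cells each.
SAMPLE_HTML = """
<table>
  <tr><th>A</th><th>B</th><th>C</th><th>D</th><th>E</th><th>F</th></tr>
  <tr><td>Metro A</td><td>1</td><td>2</td><td>3</td><td>4</td><td>5</td></tr>
  <tr><td>Metro B</td><td>6</td><td>7</td><td>8</td><td>9</td><td>10</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = soup.find("table")

# Tally how many rows have each td count.
cell_counts = Counter(len(tr.find_all("td")) for tr in table.find_all("tr"))
print(cell_counts)
```

If no row reports exactly three cells, a `len(cells) == 3` check will skip every row, which would explain the empty DataFrame.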

If you delete the whole `if` line and re-indent the code beneath it accordingly, you will get the result you are after.
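
A minimal sketch of the adjusted loop follows. It is not a verified fix against the live page: it runs on a made-up six-column table (`SAMPLE_HTML`, placeholder values), and it assumes the three columns you want are the first three cells of each row. Instead of dropping the check entirely, it swaps in a looser guard (`len(cells) < 3`), because a header row built from `th` tags has no `td` cells and `cells[0]` would raise an `IndexError` on it. `parse_table` is a hypothetical helper name.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page's table (six columns per row).
SAMPLE_HTML = """
<table>
  <tr><th>A</th><th>B</th><th>C</th><th>D</th><th>E</th><th>F</th></tr>
  <tr><td>Metro A</td><td>$100,000</td><td>1</td><td>x</td><td>x</td><td>x</td></tr>
  <tr><td>Metro B</td><td>$200,000</td><td>2</td><td>y</td><td>y</td><td>y</td></tr>
</table>
"""


def parse_table(html):
    """Return a DataFrame built from the first three td cells of each row."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    data = []
    if table is None:
        # No table in the HTML: return an empty DataFrame.
        return pd.DataFrame(data)
    for tr in table.find_all("tr"):
        cells = tr.find_all("td")
        # Skip header rows (no td cells) and rows that are too short.
        if len(cells) < 3:
            continue
        data.append({
            "Metro Area": cells[0].text.strip(),
            "Median Home Price": cells[1].text.strip(),
            "Affordability Index": cells[2].text.strip(),
        })
    return pd.DataFrame(data)


df = parse_table(SAMPLE_HTML)
print(df)
```

To try it on the real page, pass `requests.get(URL).content` to `parse_table`, and adjust the cell indexes if the wanted columns are not the first three.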

Solution 1
0 2023-01-09 03:38:10
