Pandas only printing last bs4 element to csv file
I'm scraping house data from zoopla.co.uk.
Each DataFrame seems to print correctly, but pandas writes only the last element (the last house) to the csv file.
I also tried casting each object as a list in the pd.DataFrame({}) statement, but that did not change the csv output.
Code
import requests
from bs4 import BeautifulSoup
import re
import pandas as pd

my_url = 'https://www.zoopla.co.uk/for-sale/property/b23/?page_size=100&q=B23&radius=0&results_sort=newest_listings&search_source=refine'
res = requests.get(my_url)
soup = BeautifulSoup(res.text, "html.parser")
lis = soup.find("ul", class_="listing-results clearfix js-gtm-list").find_all("li", class_="srp clearfix")

for li in lis:
    bedrooms = li.find("span", class_="num-beds")
    bathrooms = li.find("span", class_="num-baths")
    price = li.find("a", class_="text-price")
    house_price = re.findall('\£(\d+)', str(price))
    style = li.find("h2", class_="listing-results-attr")
    house_type = re.findall('(?<=bed ).*(?= for)', str(style))
    distance = li.find("li", class_="clearfix")
    station_distance = re.findall('\d+\.?\d*', str(distance))
    if bedrooms:
        bedrooms = bedrooms.get_text(strip=True)
    if bathrooms:
        bathrooms = bathrooms.get_text(strip=True)
    if house_price:
        house_price = house_price
    if house_type:
        house_type = house_type
    if station_distance:
        station_distance = station_distance
    df = pd.DataFrame({'house_price': house_price, 'house_type': house_type, 'station_distance': station_distance, 'bedrooms': bedrooms, 'bathrooms': bathrooms})
    print(df)
    df.to_csv('zoopla.csv')
Output
house_price house_type station_distance bedrooms bathrooms
0 90 flat 0.2 1 1
house_price house_type station_distance bedrooms bathrooms
0 210 detached house 0.6 3 None
house_price house_type station_distance bedrooms bathrooms
0 160 end terrace house 0.7 2 1
house_price house_type station_distance bedrooms bathrooms
0 325 detached house 1.2 4 1
house_price house_type station_distance bedrooms bathrooms
0 195 semi-detached house 1.1 3 1
house_price house_type station_distance bedrooms bathrooms
0 24 terraced house 0.9 3 None
house_price house_type station_distance bedrooms bathrooms
0 115 flat 0.2 2 1
Excel Output - pandas only outputs the last element (house) from web site
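You are overwriting the DataFrame on every iteration, so only the last house is left when the csv file is written.
Collect the rows in a list and build the DataFrame once, after the loop: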
result = []
for li in lis:
    ...
    result.append({'house_price': house_price, 'house_type': house_type, 'station_distance': station_distance, 'bedrooms': bedrooms, 'bathrooms': bathrooms})
df = pd.DataFrame(result)
print(df)
df.to_csv('zoopla.csv')
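To try the pattern without hitting the site, here is a minimal sketch that swaps the scraping loop for a hard-coded list. The `listings` data and the `build_frame` helper are hypothetical stand-ins (the values are copied from the first three rows of the printed output in the question), and the csv is written to an in-memory buffer rather than to zoopla.csv. It is only an illustration of the append-then-build idea, not the original scraper.

```python
import io

import pandas as pd

# Hypothetical stand-in for the values scraped from each listing;
# in the real script these would come from each `li` element.
listings = [
    {"house_price": "90", "house_type": "flat",
     "station_distance": "0.2", "bedrooms": "1", "bathrooms": "1"},
    {"house_price": "210", "house_type": "detached house",
     "station_distance": "0.6", "bedrooms": "3", "bathrooms": None},
    {"house_price": "160", "house_type": "end terrace house",
     "station_distance": "0.7", "bedrooms": "2", "bathrooms": "1"},
]


def build_frame(rows):
    """Collect one dict per listing, then build a single DataFrame."""
    result = []
    for row in rows:
        # One dict per house; nothing is overwritten between iterations.
        result.append(dict(row))
    # The DataFrame is created once, after the loop has finished.
    return pd.DataFrame(result)


df = build_frame(listings)

# Write to an in-memory buffer so the sketch leaves no file behind.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

The csv text should contain a header line followed by one line per house. Note that in the original loop `re.findall` returns lists, so with this approach you may want to store the first match (for example `house_price[0]` when the list is non-empty) rather than the list itself.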