
Pandas only printing last bs4 element to csv file

I'm scraping house data from zoopla.co.uk.

The DataFrame seems to print correctly, but pandas writes only the last element (the last house) to the CSV file.

I also tried wrapping each value in a list inside the pd.DataFrame({}) statement, but that did not change the CSV output.

Code

import requests
from bs4 import BeautifulSoup
import re
import pandas as pd

my_url = 'https://www.zoopla.co.uk/for-sale/property/b23/?page_size=100&q=B23&radius=0&results_sort=newest_listings&search_source=refine'
res = requests.get(my_url)
soup = BeautifulSoup(res.text, "html.parser")
lis = soup.find("ul", class_="listing-results clearfix js-gtm-list").find_all("li", class_="srp clearfix")

for li in lis:
    bedrooms = li.find("span", class_="num-beds")
    bathrooms = li.find("span", class_="num-baths")

    price = li.find("a", class_="text-price")
    house_price = re.findall(r'£(\d+)', str(price))

    style = li.find("h2", class_="listing-results-attr")
    house_type = re.findall('(?<=bed ).*(?= for)', str(style))

    distance = li.find("li", class_="clearfix")
    station_distance = re.findall(r'\d+\.?\d*', str(distance))

    if bedrooms:
        bedrooms = bedrooms.get_text(strip=True)
    if bathrooms:
        bathrooms = bathrooms.get_text(strip=True)
    if house_price:
        house_price = house_price
    if house_type:
        house_type = house_type
    if station_distance:
        station_distance = station_distance

    df = pd.DataFrame({'house_price': house_price, 'house_type': house_type, 'station_distance': station_distance, 'bedrooms': bedrooms, 'bathrooms': bathrooms})
    print(df)

    df.to_csv('zoopla.csv')

Output

  house_price house_type station_distance bedrooms bathrooms
0          90       flat              0.2        1         1
  house_price      house_type station_distance bedrooms bathrooms
0         210  detached house              0.6        3      None
  house_price         house_type station_distance bedrooms bathrooms
0         160  end terrace house              0.7        2         1
  house_price      house_type station_distance bedrooms bathrooms
0         325  detached house              1.2        4         1
  house_price           house_type station_distance bedrooms bathrooms
0         195  semi-detached house              1.1        3         1
  house_price      house_type station_distance bedrooms bathrooms
0          24  terraced house              0.9        3      None
  house_price house_type station_distance bedrooms bathrooms
0         115       flat              0.2        2         1

Excel output: pandas writes only the last element (house) from the website to the file.

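You are overwriting the DataFrame on every iteration: each pass through the loop builds a new one-row DataFrame, and df.to_csv('zoopla.csv') rewrites the file each time, so only the last house survives.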
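To see the effect in isolation, here is a minimal, self-contained reproduction. The three rows are made-up stand-ins for the scraped listings, and the file goes to a temporary directory rather than to zoopla.csv. Calling to_csv inside the loop leaves only the last row, because to_csv opens the file in write mode ('w') by default and replaces its contents on every call; building the DataFrame once, after the loop, keeps every row.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-ins for the scraped listings
houses = [
    {"house_price": "90", "house_type": "flat"},
    {"house_price": "210", "house_type": "detached house"},
    {"house_price": "115", "house_type": "flat"},
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "zoopla.csv")

    # Buggy pattern: a new one-row DataFrame per iteration, and each
    # to_csv call overwrites the file (default mode is 'w').
    for house in houses:
        df = pd.DataFrame([house])
        df.to_csv(path, index=False)
    overwritten = pd.read_csv(path)

    # Fixed pattern: collect the rows, build one DataFrame, write once.
    rows = []
    for house in houses:
        rows.append(house)
    pd.DataFrame(rows).to_csv(path, index=False)
    complete = pd.read_csv(path)

print(len(overwritten))  # 1
print(len(complete))     # 3
```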

Instead, collect the rows in a list and build the DataFrame once, after the loop:

result = []
for li in lis:
    ...  # same extraction code as in the question

    result.append({'house_price': house_price, 'house_type': house_type, 'station_distance': station_distance, 'bedrooms': bedrooms, 'bathrooms': bathrooms})
    
df = pd.DataFrame(result)
print(df)

df.to_csv('zoopla.csv')
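One caveat with the code above: re.findall always returns a list, so house_price, house_type and station_distance each hold a list such as ['90'], and the row dicts put that whole list into a single cell (the CSV then shows ['90'] rather than 90). The sketch below stores the first match, or None, instead. The sample strings stand in for str(price) and str(distance) and are hypothetical, as is the helper name first_or_none.

```python
import re

import pandas as pd


def first_or_none(matches):
    """Return the first re.findall match, or None when nothing matched."""
    return matches[0] if matches else None


# Hypothetical stand-ins for str(price) and str(distance) from the scraper
snippets = [("£90,000", "0.2 miles"), ("£210,000", "0.6 miles")]

# Storing the findall result directly puts a list into the cell
naive = pd.DataFrame([{"house_price": re.findall(r"£(\d+)", snippets[0][0])}])
print(naive.loc[0, "house_price"])  # a list, not a string

result = []
for price_text, distance_text in snippets:
    house_price = re.findall(r"£(\d+)", price_text)            # list of matches
    station_distance = re.findall(r"\d+\.?\d*", distance_text)  # list of matches
    result.append({
        "house_price": first_or_none(house_price),
        "station_distance": first_or_none(station_distance),
    })

df = pd.DataFrame(result)
# index=False leaves out the 0, 1, 2, ... index column; with no path
# argument, to_csv returns the CSV text instead of writing a file.
csv_text = df.to_csv(index=False)
print(csv_text)
```

In the real loop this would mean passing first_or_none(house_price) and so on into result.append(...), then calling df.to_csv('zoopla.csv', index=False). Note also that r'£(\d+)' stops at the first comma, so a string like '£90,000' yields '90'.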
