
Webscraper throwing an error after website changed their code

I built a webscraper for realtor.com, as I am looking for houses and agents in my area and this has made things much easier for me. However, they just changed the code on their website (probably to stop people from doing this), and now I am getting an attribute error. The error I'm receiving is this:

File "webscraper.py", line 22, in name.getText().strip(), AttributeError: 'NoneType' object has no attribute 'getText'
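The error itself just means that `find()` came back empty: when no element matches, BeautifulSoup returns `None`, and calling `.getText()` on `None` raises exactly this `AttributeError`. A minimal reproduction (with made-up class names) looks like this:

```python
from bs4 import BeautifulSoup

# A find() whose class no longer exists on the page returns None.
soup = BeautifulSoup('<div class="new-class">Jane Doe</div>', "html.parser")
tag = soup.find("div", {"class": "old-class"})
print(tag)  # None
# tag.getText() here would raise:
# AttributeError: 'NoneType' object has no attribute 'getText'

# Guarding against None, as the code below already does for the phone
# number, avoids the crash:
text = tag.getText().strip() if tag is not None else "N/A"
print(text)  # N/A
```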

The code below was working perfectly, collecting names and numbers, before they changed the code. It appears all they did was change the class names, adding "jsx-1792441256".

import csv
import requests
from bs4 import BeautifulSoup
from time import sleep
from random import randint

sleep(randint(10,20))


realtor_data = []

for page in range(1, 10):
    print(f"Scraping page {page}...")
    url = f"https://www.realtor.com/realestateagents/san-diego_ca/pg-{page}"
    soup = BeautifulSoup(requests.get(url).text, "html.parser")

    for agent_card in soup.find_all("div", {"class": "jsx-1792441256 agent-list-card-title-text clearfix"}):
        name = agent_card.find("div", {"class": "jsx-1792441256 agent-name text-bold"}).find("a")
        number = agent_card.find("div", {"itemprop": "telephone"})
        realtor_data.append(
            [
                name.getText().strip(),
                number.getText().strip() if number is not None else "N/A"
                
             ],
        )

with open("sandiego.csv", "w") as output:
    w = csv.writer(output)
    w.writerow(["NAME:", "PHONE NUMBER:", "CITY:"])
    w.writerows(realtor_data)

import pandas as pd
a=pd.read_csv("sandiego.csv")
a2 = a.iloc[:,[0,1]]
a3 = a.iloc[:,[2]]
a3 = a3.fillna("San Diego")
b=pd.concat([a2,a3],axis=1)
b.to_csv("sandiego.csv")
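For what it's worth, the selectors can be made resilient to this kind of change without hard-coding the `jsx-` hash at all: BeautifulSoup matches multi-valued `class` attributes one class at a time, so searching for a single stable class name still matches even after the site prepends a generated class. A sketch using markup shaped like the changed site:

```python
from bs4 import BeautifulSoup

# Markup mimicking the changed site: the stable class names are still
# present; the site just prepended a generated "jsx-..." class.
html = '<div class="jsx-1792441256 agent-name text-bold"><a>Jane Doe</a></div>'
soup = BeautifulSoup(html, "html.parser")

# class_ is compared against each class individually, so the added
# hash is ignored:
tag = soup.find("div", class_="agent-name")
print(tag.a.getText())  # Jane Doe

# CSS selectors behave the same way:
print(soup.select_one("div.agent-name").a.getText())  # Jane Doe
```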

Fixed code:

import csv
import requests
from bs4 import BeautifulSoup
from time import sleep
from random import randint

# sleep(randint(10,20))


realtor_data = []

for page in range(1, 10):
    print(f"Scraping page {page}...")
    url = f"https://www.realtor.com/realestateagents/san-diego_ca/pg-{page}"
    soup = BeautifulSoup(requests.get(url).text, "html.parser")

    for agent_card in soup.select("div.agent-list-card-title.mobile-only"):
        name = agent_card.find("div", {"class": "agent-name"})
        number = agent_card.find("div", {"class": "agent-phone"})
        realtor_data.append(
            [
                name.getText().strip(),
                number.getText().strip() if number is not None else "N/A"                
            ],
        )

with open("data.csv", "w", newline="") as output:  # newline="" avoids blank rows on Windows
    w = csv.writer(output)
    w.writerow(["NAME:", "PHONE NUMBER:", "CITY:"])
    w.writerows(realtor_data)

import pandas as pd

a = pd.read_csv("data.csv")
a2 = a.iloc[:, [0, 1]]
a3 = a.iloc[:, [2]]
a3 = a3.fillna("San Diego")
b = pd.concat([a2, a3], axis=1)
# index=False keeps pandas from adding an extra index column on rewrite
b.to_csv("data.csv", index=False)
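As a side note, the pandas round-trip above only exists to fill in the city column. Assuming every scraped agent is in San Diego anyway, the same result can be produced at write time with the `csv` module alone; a sketch with hypothetical rows in the same `[name, phone]` shape as `realtor_data`:

```python
import csv

# Hypothetical sample rows in the same shape as realtor_data above.
realtor_data = [["Jane Doe", "555-0100"], ["John Roe", "N/A"]]

with open("data.csv", "w", newline="") as output:
    w = csv.writer(output)
    w.writerow(["NAME:", "PHONE NUMBER:", "CITY:"])
    # Append the city as each row is written instead of post-processing
    # the file with pandas afterwards.
    w.writerows([row + ["San Diego"] for row in realtor_data])
```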

Creates data.csv:

[screenshot of the resulting data.csv]
