
Trying to scrape Airbnb data

I'm trying to scrape some data from Airbnb (name, price, rating). I can print out variables such as price, name and rating, but I want to put them in a dictionary. What am I missing?

import requests
from bs4 import BeautifulSoup

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36'
}

url = 'https://www.airbnb.com/s/Tbilisi--Georgia/homes?tab_id=home_tab&refinement_paths%5B%5D=%2Fhomes&flexible_trip_dates%5B%5D=november&flexible_trip_dates%5B%5D=october&flexible_trip_lengths%5B%5D=weekend_trip&date_picker_type=calendar&query=Tbilisi%2C%20Georgia&place_id=ChIJa2JP5tcMREARo25X4u2E0GE&source=structured_search_input_header&search_type=autocomplete_click'

response = requests.get(url, headers=headers)

soup = BeautifulSoup(response.content, 'lxml')



for item in soup.find_all('div', itemprop='itemListElement'):

    try:
        price = item.find('span', class_='_krjbj').text
        rating = item.find('span', class_='_18khxk1').text
        name = item.find('meta', itemprop='name')['content']
    except Exception as e:
        house_list = {
            'price': price,
            'rating': rating,
            'name': name,
        }
        print(house_list)

The way you've written it, the house_list dictionary is only built and printed when the try block raises an exception. That wouldn't work anyway: an exception inside the try block means at least one of the variables you're putting into house_list was never assigned, so the except block raises a NameError (or, on later iterations, silently reuses values left over from a previous listing).
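To make that failure mode concrete, here is a minimal sketch that mimics the question's control flow without touching the network. The function `parse_like_the_question` and the class `FakeTag` are hypothetical stand-ins, not part of the original code: `FakeTag` plays the role of a matched tag, and `None` plays the role of a `find()` call that matched nothing.

```python
def parse_like_the_question(element):
    """Mimic the question's try/except layout for a single lookup result."""
    try:
        # find() returns None when nothing matches, so .text raises AttributeError
        price = element.text
    except Exception:
        # price was never assigned, so building the dict fails right here
        return {'price': price}
    # the success path never builds a dict at all
    return None


class FakeTag:
    """Hypothetical stand-in for a matched tag that has a .text attribute."""
    text = '$45'


# Successful lookup: no exception, so no dictionary is produced.
print(parse_like_the_question(FakeTag()))

# Failed lookup: the except block itself blows up.
try:
    parse_like_the_question(None)
except NameError as err:
    # inside a function this is UnboundLocalError, a subclass of NameError
    print(type(err).__name__)
```

So in both cases you end up without the dictionary you wanted: either it is never built, or building it raises.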

You probably want to do something like this instead:

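# ... (imports, request and soup as in the question)
for item in soup.find_all('div', itemprop='itemListElement'):
    try:
        price = item.find('span', class_='_krjbj').text
        rating = item.find('span', class_='_18khxk1').text
        name = item.find('meta', itemprop='name')['content']
    except Exception as e:
        # a find() that matches nothing returns None, so the lookup raises here
        print(f"Ran into an exception when trying to parse data: {e}")
        continue
    else:
        # only runs when all three lookups succeeded
        house_list = {
            'price': price,
            'rating': rating,
            'name': name,
        }
        print(house_list)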
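If you want to keep the results instead of just printing them, you could collect one dictionary per listing in a list. The following is only a sketch under a few assumptions: `SAMPLE_HTML` is made-up markup that imitates the structure your selectors expect (Airbnb's real markup and class names may differ or change), `parse_listings` is a hypothetical helper name, and the built-in 'html.parser' is used instead of 'lxml' so the example runs without an extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating the structure the question's selectors expect.
SAMPLE_HTML = """
<div itemprop="itemListElement">
  <meta itemprop="name" content="Cozy flat in Old Tbilisi">
  <span class="_krjbj">$45 per night</span>
  <span class="_18khxk1">4.9</span>
</div>
<div itemprop="itemListElement">
  <meta itemprop="name" content="Listing without a rating">
  <span class="_krjbj">$30 per night</span>
</div>
"""


def parse_listings(html):
    """Return one dict per listing that has a name, a price and a rating."""
    soup = BeautifulSoup(html, 'html.parser')
    houses = []
    for item in soup.find_all('div', itemprop='itemListElement'):
        try:
            price = item.find('span', class_='_krjbj').text
            rating = item.find('span', class_='_18khxk1').text
            name = item.find('meta', itemprop='name')['content']
        except (AttributeError, TypeError, KeyError):
            # find() returned None or the attribute was missing: skip this listing
            continue
        houses.append({'price': price, 'rating': rating, 'name': name})
    return houses


houses = parse_listings(SAMPLE_HTML)
print(houses)
```

With the sample markup, the second listing has no rating span, so it is skipped and `houses` ends up with a single dictionary. Catching only the exceptions a failed lookup can raise (rather than a bare `Exception`) is a design choice here, so that unrelated bugs are not silently swallowed.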


 