Python HTML parsing issues using Beautiful Soup

I am trying to get the names of all organizations from https://www.devex.com/organizations/search using BeautifulSoup. However, I am getting an error. Can someone please help?

    import requests
    from requests import get
    from bs4 import BeautifulSoup
    import pandas as pd
    import numpy as np

    from time import sleep
    from random import randint

    headers = {"Accept-Language": "en-US,en;q=0.5"}

    titles = []
    pages = np.arange(1, 2, 1)

    for page in pages:
        page = requests.get("https://www.devex.com/organizations/search?page%5Bnumber%5D=" + str(page) + "", headers=headers)
        soup = BeautifulSoup(page.text, 'html.parser')
        movie_div = soup.find_all('div', class_='info-container')
        sleep(randint(2,10))

        for container in movie_div:
            name = container.a.find('h3', class_= 'ng-binding').text
            titles.append(name)

    movies = pd.DataFrame({
        'movie': titles,
    })

    # to see your dataframe
    print(movies)

    # to see the datatypes of your columns
    print(movies.dtypes)

    # to see where you're missing data and how much data is missing
    print(movies.isnull().sum())

    # to move all your scraped data to a CSV file
    movies.to_csv('movies.csv')

You may try something like this, where `bs` is the parsed BeautifulSoup object (`soup` in the question's code):

    name = bs.find("h3", {"class": "ng-binding"})
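
The question does not quote the traceback, so the following rests on an assumption: if the error is an `AttributeError` saying that a `NoneType` object has no attribute `text`, it would come from `find` returning `None` for a container that has no matching `h3`. A minimal sketch of guarding against that case is below. The inline HTML is made up for illustration and is not the real devex.com markup, and `extract_names` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Made-up sample markup (not the real devex.com page): the second
# container has no <h3>, which is the case that breaks `.text`.
SAMPLE_HTML = """
<div class="info-container"><a href="/org/1"><h3 class="ng-binding"> Org One </h3></a></div>
<div class="info-container"><a href="/org/2"></a></div>
<div class="info-container"><a href="/org/3"><h3 class="ng-binding">Org Three</h3></a></div>
"""


def extract_names(html):
    """Return the text of each h3.ng-binding, skipping containers without one."""
    soup = BeautifulSoup(html, "html.parser")
    names = []
    for container in soup.find_all("div", class_="info-container"):
        heading = container.find("h3", class_="ng-binding")
        if heading is None:  # find() returns None when nothing matches
            continue
        names.append(heading.get_text(strip=True))
    return names


names = extract_names(SAMPLE_HTML)
print(names)
```

Note that `ng-binding` is a class AngularJS adds to data-bound elements, so the names may only be filled in by JavaScript in the browser. If a helper like this returns an empty list for `page.text` from the live site, the names are probably not present in the HTML that `requests` receives, and no selector change will find them there.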

