Python HTML parsing issues using Beautiful Soup
I am trying to get the names of all organizations from https://www.devex.com/organizations/search using BeautifulSoup. However, I am getting an error. Can someone please help?
import requests
from requests import get
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np
from time import sleep
from random import randint

headers = {"Accept-Language": "en-US,en;q=0.5"}
titles = []
pages = np.arange(1, 2, 1)

for page in pages:
    page = requests.get("https://www.devex.com/organizations/search?page%5Bnumber%5D=" + str(page) + "", headers=headers)
    soup = BeautifulSoup(page.text, 'html.parser')
    movie_div = soup.find_all('div', class_='info-container')
    sleep(randint(2,10))
    for container in movie_div:
        name = container.a.find('h3', class_= 'ng-binding').text
        titles.append(name)

movies = pd.DataFrame({
    'movie': titles,
})
print(movies)
print(movies.dtypes)
print(movies.isnull().sum())
movies.to_csv('movies.csv')
You may try something like this (bs here presumably stands for the parsed BeautifulSoup object, named soup in the question):
name = bs.find("h3", {"class": "ng-binding"})
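As a rough sketch of that idea, the snippet below looks up the h3 inside each container and skips any container where nothing is found, which would avoid an AttributeError if find returns None and .text is then called on it. The SAMPLE_HTML string and the extract_names helper are made up for illustration, and the real page's markup may differ. Also note that ng-binding looks like a class added by AngularJS, so the names may be filled in by JavaScript and may not be present in the HTML that requests receives; that is an assumption, not something verified against the site.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup that mimics the structure the question expects.
# The real page may look different.
SAMPLE_HTML = """
<div class="info-container"><a href="/organizations/1"><h3 class="ng-binding">Org One</h3></a></div>
<div class="info-container"><a href="/organizations/2"><h3 class="ng-binding"> Org Two </h3></a></div>
<div class="info-container"><span>No link or heading here</span></div>
"""


def extract_names(html):
    """Return the text of every h3.ng-binding found inside an info-container div."""
    soup = BeautifulSoup(html, "html.parser")
    names = []
    for container in soup.find_all("div", class_="info-container"):
        heading = container.find("h3", class_="ng-binding")
        if heading is None:
            # Skip containers without the expected heading instead of
            # calling .text on None.
            continue
        names.append(heading.get_text(strip=True))
    return names


names = extract_names(SAMPLE_HTML)
print(names)  # ['Org One', 'Org Two']
```

If extract_names(page.text) returns an empty list for the real page, printing a slice of page.text is a quick way to check whether the organization names are in the downloaded HTML at all.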