Parsing All Pages of HTML using BeautifulSoup
I'm having a problem with my code. It works perfectly for one page, but when I try to parse all 28 pages, it parses only the first page and skips the other 27.
The main idea is to parse the data from the URL in the code below, which has 28 pages in total. I wrote a for loop so that BeautifulSoup parses all of the pages. However, it parses only the first page and not the others.
I would appreciate your recommendations on how to make it work.
Code:
from bs4 import BeautifulSoup as bs
import requests
import pandas as pd

for t in range(28):
    url = "https://boss.az/vacancies?action=index&controller=vacancies&only_path=true&page={}&type=vacancies".format(t)
    r = requests.get(url)
    soup = bs(r.content, 'html.parser')
    titles = [i.text for i in soup.select('.results-i-title')]
    #print(titles)
    companies = [i.text for i in soup.select('.results-i-company')]
    #print(companies)
    summaries = [i.text for i in soup.select('.results-i-summary')]

df = pd.DataFrame(list(zip(titles, companies, summaries)), columns=['Title', 'Company', 'Summary'])
df.to_csv(r'Data.csv', sep=',', encoding='utf-8-sig', index=False)
You are overwriting titles, companies and summaries with every iteration of the loop, so only the values from the most recent iteration survive. Create the three lists once before the loop, then simply change titles = ... to titles += ... (and likewise for companies and summaries):
from bs4 import BeautifulSoup as bs
import requests
import pandas as pd

titles = []
companies = []
summaries = []

for t in range(28):
    url = "https://boss.az/vacancies?action=index&controller=vacancies&only_path=true&page={}&type=vacancies".format(t)
    r = requests.get(url)
    soup = bs(r.content, 'html.parser')
    titles += [i.text for i in soup.select('.results-i-title')]
    companies += [i.text for i in soup.select('.results-i-company')]
    summaries += [i.text for i in soup.select('.results-i-summary')]

df = pd.DataFrame(list(zip(titles, companies, summaries)), columns=['Title', 'Company', 'Summary'])
df.to_csv(r'Data.csv', sep=',', encoding='utf-8-sig', index=False)
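To see the difference between the two loops without contacting the site, here is a minimal offline sketch. fetch_titles is a hypothetical stand-in for the requests + BeautifulSoup step (it only returns placeholder strings), so this illustrates the loop logic, not the scraping itself.

```python
# Minimal offline sketch: rebinding a list inside a loop vs. accumulating into it.
# `fetch_titles` is HYPOTHETICAL: it stands in for requests.get + soup.select
# and simply returns two placeholder titles per page.


def fetch_titles(page):
    """Hypothetical stub: pretend each page yields two job titles."""
    return [f"page{page}-job{n}" for n in range(2)]


def overwrite(pages):
    """Mimics the question's loop: `titles = ...` on every iteration."""
    titles = []
    for t in range(pages):
        # Rebinding: the list built in the previous iteration is discarded.
        titles = fetch_titles(t)
    return titles


def accumulate(pages):
    """Mimics the answer's loop: one list created before the loop, then `+=`."""
    titles = []
    for t in range(pages):
        # In-place extension: each page's titles are appended to the same list.
        titles += fetch_titles(t)
    return titles


if __name__ == "__main__":
    print(overwrite(3))   # ['page2-job0', 'page2-job1']
    print(accumulate(3))  # ['page0-job0', 'page0-job1', 'page1-job0', 'page1-job1', 'page2-job0', 'page2-job1']
```

For lists, titles += ... behaves like titles.extend(...): it modifies the existing list in place. Note also that zip stops at the shortest of its inputs, so the DataFrame step assumes every listing provides all three fields; if one is missing on some page, the three lists drift out of step.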