Python - web crawling / different result from same code? / requests, bs4 / M1
I'm learning Python for web crawling, but I'm totally stuck.
Each time I run this code, the results change.
Very rarely it works, but it almost always returns an empty list.
Why does this happen? Please let me know.
from indeed import extract_indeed_pages, extract_indeed_jobs
last_indeed_page = extract_indeed_pages()
print(last_indeed_page)
indeed_jobs = extract_indeed_jobs(last_indeed_page)
print(indeed_jobs)
import requests
from bs4 import BeautifulSoup
LIMIT = 50
URL = f"https://kr.indeed.com/jobs?q=React&l=%EC%84%9C%EC%9A%B8&radius=100&jt=fulltime&limit={LIMIT}"
def extract_indeed_pages():
    result = requests.get(URL)
    soup = BeautifulSoup(result.text, "html.parser")
    pagination = soup.find("div", {"class": "pagination"})
    links = pagination.find_all('a')
    pages = []
    for link in links[:-1]:
        pages.append(int(link.string))
    max_page = pages[-1]
    return max_page

def extract_indeed_jobs(last_page):
    jobs = []
    result = requests.get(f"{URL}&start={0*LIMIT}")
    soup = BeautifulSoup(result.text, "html.parser")
    results = soup.find_all("h2", {"class": "jobTitle"})
    jobs.append(results)
    return jobs
This happens because of the JavaScript in the page's source code. You can view the page source by pressing Ctrl + U in your browser on a PC.
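One way to check this from Python, rather than in the browser, is to look at whether the HTML you actually received contains the elements the scraper searches for. The sketch below is only an illustration: `MarkupProbe`, `probe_html`, and the two sample HTML strings are made-up names and data, not part of the question's code or of the real page. It uses the standard library's `html.parser` instead of BeautifulSoup so that it runs without any extra installs.

```python
from html.parser import HTMLParser


class MarkupProbe(HTMLParser):
    """Hypothetical helper: counts the elements the question's scraper depends on."""

    def __init__(self):
        super().__init__()
        self.has_pagination = False
        self.job_title_count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; a class attribute may hold several names.
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "pagination" in classes:
            self.has_pagination = True
        elif tag == "h2" and "jobTitle" in classes:
            self.job_title_count += 1


def probe_html(html):
    """Return (has_pagination, job_title_count) for an HTML string."""
    probe = MarkupProbe()
    probe.feed(html)
    probe.close()
    return probe.has_pagination, probe.job_title_count


# Two made-up responses: one that contains the listings, one that does not.
full_page = (
    '<div class="pagination"><a>1</a><a>2</a></div>'
    '<h2 class="jobTitle x">React Dev</h2>'
)
script_only_page = "<html><body><script>render()</script></body></html>"

print(probe_html(full_page))
print(probe_html(script_only_page))
```

With the question's code you could call `probe_html(requests.get(URL).text)` on each run. If it reports no pagination and zero job titles, the empty list comes from the response itself rather than from the parsing step, and `soup.find(...)` would return `None` for that response.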