Scraping a website and collecting all the hyperlinks using Python
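I am writing a program that can fetch information from any website, but the program does not work.
Example: the website is naukri.com, and we have to collect all the hyperlinks on the page: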
import urllib.request, urllib.parse, urllib.error
from bs4 import BeautifulSoup
import ssl
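# SSL context that skips certificate verification (insecure; for testing only)
isc = ssl.create_default_context()
isc.check_hostname = False
isc.verify_mode = ssl.CERT_NONE
# The URL has to be one unbroken string literal
url = 'https://www.naukri.com/job-listings-Python-Developer-Cloud-Analogy-Softech-Pvt-Ltd-Noida-Sector-63-Noida-1-to-2-years-250718003152src=jobsearchDesk&sid=15325422374871&xp=1&px=1&qp=python%20developer&srcPage=s'
# Named `html` rather than `open`, which would shadow the built-in open()
html = urllib.request.urlopen(url, context=isc).read()
soup = BeautifulSoup(html, 'html.parser')
# soup('a') is shorthand for soup.find_all('a')
tags = soup('a')
for tag in tags:
    print(tag.get('href', None))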
I would use requests and bs4. I was able to get this working, and I think it gives the expected result. Try this:
import requests
from bs4 import BeautifulSoup
url = ('https://www.naukri.com/job-listings-Python-Developer-Cloud-Analogy-Softech-Pvt-Ltd-Noida-Sector-63-Noida-1-to-2-years-250718003152src=jobsearchDesk&sid=15325422374871&xp=1&px=1&qp=python%20developer&srcPage=s')
response = requests.get(url)
page = response.text
soup = BeautifulSoup(page, 'html.parser')
links = soup.find_all('a', href=True)
for each in links:
    print(each.get('href'))
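If it helps to check the link-collecting step without hitting the network, here is a minimal sketch of the same idea using only the standard library's html.parser instead of bs4. The names LinkCollector, collect_links and sample_html are made up for this illustration, and example.com is a placeholder, not the real page. It also resolves relative hrefs against a base URL with urljoin, which the code above does not do.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects the href of every <a> tag it sees."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only anchor tags are of interest
        if tag != 'a':
            return
        for name, value in attrs:
            # Skip <a> tags without an href, or with an empty one
            if name == 'href' and value:
                # Turn relative links such as "/jobs" into absolute URLs
                self.links.append(urljoin(self.base_url, value))


def collect_links(html, base_url):
    """Return all hyperlinks found in an HTML string, in document order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


# Placeholder markup standing in for a downloaded page
sample_html = '''
<a href="/jobs">Jobs</a>
<a href="https://example.com/about">About</a>
<a name="no-href">Anchor without href</a>
'''

for link in collect_links(sample_html, 'https://example.com/'):
    print(link)
```

To run it against a real page, you could pass response.text from the requests example as the html argument and the page URL as base_url.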