I'm trying to create a crawler for my WordPress website using Python 3:
import requests
from bs4 import BeautifulSoup

def page(current_page):
    current = "h2"
    while current == current_page:
        url = 'https://vishrantkhanna.com/?s=' + str(current)
        source_code = requests.get(url)
        plain_text = source_code.txt
        soup = BeautifulSoup(plain_text)
        for link in soup.findAll('h2', {'class': 'entry-title'}):
            href = "https://vishrantkhanna.com/" + link.get('href')
            title = link.string
            print(href)
            print(title)

page("h2")
I'm trying to copy and print the article title and the href link associated with it.
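You need to extract the <a> tag from the heading. The <h2> element has no href attribute of its own (the link is on the nested <a>), so link.get('href') returns None. Note also that the response body is available as source_code.text, not source_code.txt: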
import requests
from urllib.parse import urljoin
from bs4 import BeautifulSoup

URL = 'https://vishrantkhanna.com/?s=1'
html = requests.get(URL).text
bs = BeautifulSoup(html, 'html.parser')
for link in bs.find_all('h2', {'class': 'entry-title'}):
    a = link.find('a', href=True)
    if a is None:
        continue  # skip headings that don't wrap a link
    # urljoin keeps absolute hrefs as-is and resolves relative ones
    href = urljoin(URL, a['href'])
    title = a.get_text(strip=True)
    print(href)
    print(title)
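If the goal is a crawler rather than a single search page, a minimal sketch along these lines may help. It separates fetching from parsing so the parsing can be checked against saved HTML, and it walks the search results page by page until a page has no entry-title headings. The helper names (extract_posts, crawl_search, requests_fetch) are hypothetical, the "paged" query parameter is an assumption about how this WordPress site paginates search results, and the markup is assumed to match the theme's h2.entry-title headings.

```python
from urllib.parse import urlencode, urljoin

from bs4 import BeautifulSoup

# Site root from the question; swap in another WordPress site as needed.
BASE_URL = "https://vishrantkhanna.com/"


def extract_posts(html, base_url=BASE_URL):
    """Return (title, href) pairs for every <h2 class="entry-title"> link."""
    soup = BeautifulSoup(html, "html.parser")
    posts = []
    for heading in soup.find_all("h2", class_="entry-title"):
        # The href is on the <a> nested inside the heading, not on the <h2>.
        a = heading.find("a", href=True)
        if a is None:
            continue  # heading without a link: nothing to record
        # get_text(strip=True) copes with whitespace around the link,
        # a case where heading.string would be None.
        title = a.get_text(strip=True)
        # urljoin keeps absolute hrefs unchanged and resolves relative ones.
        href = urljoin(base_url, a["href"])
        posts.append((title, href))
    return posts


def crawl_search(query, fetch, base_url=BASE_URL, max_pages=50):
    """Collect posts from successive search-result pages.

    fetch is any callable mapping a URL to HTML text. Assumes the site
    honors WordPress's "paged" query parameter; stops at the first page
    with no entry-title headings, or after max_pages pages.
    """
    results = []
    for page_number in range(1, max_pages + 1):
        url = base_url + "?" + urlencode({"s": query, "paged": page_number})
        posts = extract_posts(fetch(url), base_url)
        if not posts:
            break
        results.extend(posts)
    return results


def requests_fetch(url):
    """Fetch a page with requests (imported lazily so parsing works without it)."""
    import requests

    # No raise_for_status(): an out-of-range page may come back as an
    # error page, which simply yields no posts and ends the crawl.
    return requests.get(url, timeout=10).text


# Example (performs real HTTP requests):
# for title, href in crawl_search("h2", requests_fetch):
#     print(href)
#     print(title)
```

Passing fetch in as a parameter is a design choice: it lets extract_posts and crawl_search be exercised against saved HTML without touching the network, while requests_fetch plugs in the real HTTP call.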