
Python Beautiful Soup: I want to go inside of a tag element

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin
import csv

headers = {"Accept-Language": "es-ES,es;q=0.9",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36"}
url = 'https://mydramalist.com/search?adv=people&na=3&so=popular&page=1'

The link above is the page I want to scrape.

while True:
        print(url)
        response = requests.get(url, headers=headers)
        # print(response.status_code)
        soup = BeautifulSoup(response.content, 'html.parser')
        footer = soup.select_one('li.page-item.nb.active')
        print(footer.text.strip())
        for tags in soup.find_all('h6'):
            print(tags)
            # tags = soup.select_one('h6>a')  # this is the part where I want to follow the link inside each h6 element and get data from that page
        next_page = soup.select_one('li.page-item.next>a')
        if next_page:
            next_url = next_page.get('href')
            url = urljoin(url, next_url)
        else:
            break


Hi everyone. I want to extract data from the current page, then follow the link inside each h6 tag to its own page, scrape that page, and repeat for every result and every page of results. I can't figure out how to do this with for loops. Please help, thank you. I have already updated the code.

Take the first result on the URL you provided as an example.


Notice that /people/232-lee-min-ho is a sublink, that is, a path relative to the site root.

All you have to do is scrape the sublink and append it to the main site address, like this:

new_link = "https://mydramalist.com" + sublink

This should give you the full link: https://mydramalist.com/people/232-lee-min-ho
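As an alternative to plain string concatenation, the same join can be done with urljoin, which the question's code already imports. A minimal sketch, where the variable names base and sublink are only illustrative:

```python
from urllib.parse import urljoin

# The search page from the question, used as the base for resolving links.
base = "https://mydramalist.com/search?adv=people&na=3&so=popular&page=1"
# The relative href taken from the first result.
sublink = "/people/232-lee-min-ho"

# urljoin keeps the scheme and host of the base and swaps in the new path.
new_link = urljoin(base, sublink)
print(new_link)
```

Because the sublink starts with a slash, urljoin drops the /search path and query string of the base and keeps only the scheme and host.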

Now perform another requests.get(new_link) on the new link to retrieve its contents.
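Put together, the steps could look roughly like the sketch below. It assumes each h6 on the search page directly contains an a tag whose href is the sublink (as the commented-out 'h6>a' selector in the question suggests). The helper names profile_links and scrape, the max_pages limit, and the shortened HEADERS value are hypothetical and only for illustration. Printing the page title stands in for whatever fields you actually want from the detail page.

```python
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

# Placeholder headers; reuse the full headers dict from the question if needed.
HEADERS = {"User-Agent": "Mozilla/5.0"}


def profile_links(soup, page_url):
    """Return absolute URLs for every <a href> that is a direct child of an <h6>."""
    return [urljoin(page_url, a["href"]) for a in soup.select("h6 > a[href]")]


def scrape(start_url, max_pages=1):
    """Visit up to max_pages search pages and fetch each linked detail page."""
    url = start_url
    for _ in range(max_pages):
        response = requests.get(url, headers=HEADERS)
        soup = BeautifulSoup(response.content, "html.parser")

        # Inner loop: follow every h6 link found on the current search page.
        for link in profile_links(soup, url):
            detail = requests.get(link, headers=HEADERS)
            detail_soup = BeautifulSoup(detail.content, "html.parser")
            title = detail_soup.title.get_text(strip=True) if detail_soup.title else ""
            print(link, title)

        # Outer loop: move to the next search page, as in the question's code.
        next_page = soup.select_one("li.page-item.next>a")
        if not next_page:
            break
        url = urljoin(url, next_page.get("href"))


# Example call (makes real network requests, so it is left commented out):
# scrape("https://mydramalist.com/search?adv=people&na=3&so=popular&page=1")
```

Keeping the link extraction in its own function makes it easy to check against a saved HTML snippet before running it against the live site.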
