Python Beautiful Soup: I want to go inside a tag element

Question

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin
import csv

headers = {"Accept-Language": "es-ES,es;q=0.9",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36"}
url = 'https://mydramalist.com/search?adv=people&na=3&so=popular&page=1'

The link above is the page I want to scrape.

while True:
    print(url)
    response = requests.get(url, headers=headers)
    # print(response.status_code)
    soup = BeautifulSoup(response.content, 'html.parser')
    # Current page number, taken from the active pagination item
    footer = soup.select_one('li.page-item.nb.active')
    print(footer.text.strip())
    for tags in soup.find_all('h6'):
        print(tags)
        # tags = soup.select_one('h6>a')  # <-- here I want to follow the link inside each h6 and get data from that page
    # Follow the "next" pagination link until there is none
    next_page = soup.select_one('li.page-item.next>a')
    if next_page:
        next_url = next_page.get('href')
        url = urljoin(url, next_url)
    else:
        break


Hi guys, I want to extract data from the current page, then follow the clickable link in each h6 tag, extract data from that page, and loop again on the next page. I cannot figure out how to do this with for loops. Please help, thank you. I have already updated the code.

Answer 1

From the URL you provided, taking the first result as an example:

Notice that /people/232-lee-min-ho is a sublink.

All you have to do is scrape the sublink and append it to the main link, as shown below:

new_link = https://mydramalist.com + sublink

This should give you the full link https://mydramalist.com/people/232-lee-min-ho
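
As a minimal sketch of that joining step: the question's code already imports urljoin, which could be used in place of plain string concatenation. The names BASE_URL and sublink are illustrative only; the sublink value is the one quoted in this answer.

```python
from urllib.parse import urljoin

# Site root; the sublink is the href quoted in the answer above.
BASE_URL = "https://mydramalist.com"
sublink = "/people/232-lee-min-ho"

# urljoin resolves the relative href against the base URL.
new_link = urljoin(BASE_URL, sublink)
print(new_link)
```

urljoin also accepts the full search-page URL as its first argument, so the same call works with the url variable from the question's loop.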

Now perform another requests.get(new_link) on the new link to retrieve its contents.
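
Putting the steps together, a rough sketch might look like the following. It assumes each h6 on the search page directly contains an a tag with an href (as the 'h6>a' selector in the question suggests). The helper names extract_profile_links and fetch_profile_title are hypothetical, the shortened User-Agent is a placeholder for the question's headers, and the page title stands in for whatever fields you actually need, since the layout of the detail page is not shown here.

```python
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

# Assumption: a shortened browser-like header; reuse the question's headers in practice.
HEADERS = {"User-Agent": "Mozilla/5.0"}


def extract_profile_links(html, page_url):
    """Return absolute URLs for every <a href> directly inside an <h6>."""
    soup = BeautifulSoup(html, "html.parser")
    return [urljoin(page_url, a["href"]) for a in soup.select("h6 > a[href]")]


def fetch_profile_title(link):
    """Request one detail page and return its <title> text.

    The title is only a placeholder: swap in selectors for the real fields.
    """
    response = requests.get(link, headers=HEADERS, timeout=10)
    response.raise_for_status()
    detail = BeautifulSoup(response.content, "html.parser")
    return detail.title.get_text(strip=True) if detail.title else ""
```

Inside the question's while loop, the h6 loop could then become: for link in extract_profile_links(response.content, url): print(fetch_profile_title(link)). The existing next-page logic stays unchanged.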

Solution 1
0 2022-12-22 07:00:49
