

Selecting links within a div tag using Beautiful Soup

I am trying to run the following code:

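import requests
from bs4 import BeautifulSoup

headers = {
    'User-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36'
}

params = {
    'q': 'Machine learning',
    'hl': 'en'
}
html = requests.get('https://scholar.google.com/scholar', headers=headers,
                    params=params).text
soup = BeautifulSoup(html, 'lxml')
for result in soup.select('.gs_r.gs_or.gs_scl'):
    # Try to read the author-profile link of each search result
    profiles = result.select('.gs_a a')['href']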
           soup = BeautifulSoup(html, 'lxml')
           for result in soup.select('.gs_r.gs_or.gs_scl'):
             profiles=result.select('.gs_a a')['href']

Running it raises the following error: "TypeError: list indices must be integers or slices, not str". What am I doing wrong?

The following code is tested and works:

import requests
from bs4 import BeautifulSoup as bs

headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.79 Safari/537.36'
}

params = {
    'q': 'Machine learning',
    'hl': 'en'
}
html = requests.get('https://scholar.google.com/scholar', headers=headers,
                    params=params).text
soup = bs(html, 'lxml')
for result in soup.select('.gs_r.gs_or.gs_scl'):
    # select() returns a list of matching <a> tags, so iterate over it
    profiles = result.select('.gs_a a')
    for p in profiles:
        # .get() returns None instead of raising if an <a> has no href
        print(p.get('href'))
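To experiment with the parsing step without sending a request to Google Scholar, the same loop can be wrapped in a function that takes an HTML string. This is a minimal sketch under stated assumptions: the helper name extract_profile_links and the SAMPLE_HTML markup are hypothetical, written only to imitate the .gs_r.gs_or.gs_scl and .gs_a class names used above, and it uses Python's built-in 'html.parser' so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup imitating two Scholar search results;
# the second result has no author-profile link at all.
SAMPLE_HTML = """
<div class="gs_r gs_or gs_scl">
  <div class="gs_a">
    <a href="/citations?user=AAA">A. Author</a>,
    <a href="/citations?user=BBB">B. Author</a> - Some Journal
  </div>
</div>
<div class="gs_r gs_or gs_scl">
  <div class="gs_a">C. Author - no profile link</div>
</div>
"""


def extract_profile_links(html):
    """Return the href of every author link inside each search result (hypothetical helper)."""
    soup = BeautifulSoup(html, 'html.parser')
    links = []
    for result in soup.select('.gs_r.gs_or.gs_scl'):
        # select() returns a list, so iterate instead of indexing it with 'href'
        for a in result.select('.gs_a a'):
            href = a.get('href')
            if href is not None:  # skip anchors without an href attribute
                links.append(href)
    return links


print(extract_profile_links(SAMPLE_HTML))
```

Results without any author link simply contribute nothing, so the function needs no special case for them.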

Result in terminal:

/citations?user=rSVIHasAAAAJ&hl=en&oi=sra
/citations?user=MnfzuPYAAAAJ&hl=en&oi=sra
/citations?user=09kJn28AAAAJ&hl=en&oi=sra
/citations?user=yxUduqMAAAAJ&hl=en&oi=sra
/citations?user=MnfzuPYAAAAJ&hl=en&oi=sra
/citations?user=9Vdfc2sAAAAJ&hl=en&oi=sra
/citations?user=lXYKgiYAAAAJ&hl=en&oi=sra
/citations?user=xzss3t0AAAAJ&hl=en&oi=sra
/citations?user=BFdcm_gAAAAJ&hl=en&oi=sra
/citations?user=okf5bmQAAAAJ&hl=en&oi=sra
/citations?user=09kJn28AAAAJ&hl=en&oi=sra

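In your code, you were trying to obtain the href attribute from a list: soup.select returns a list, whereas soup.select_one returns a single element. Indexing a list with a string such as 'href' is what raises the TypeError, so either loop over the results of select or use select_one when you only need the first match.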
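A small offline illustration of the difference, assuming a hypothetical one-line HTML snippet and Python's built-in 'html.parser': indexing the list returned by select() with a string reproduces the TypeError, while select_one() returns a single Tag (or None when nothing matches) whose attributes can be read with ['href'] or .get('href').

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for one search result's author line
html = '<div class="gs_a"><a href="/citations?user=AAA">A. Author</a></div>'
soup = BeautifulSoup(html, 'html.parser')

matches = soup.select('.gs_a a')  # a list (bs4 ResultSet) of Tag objects
print(isinstance(matches, list), len(matches))

try:
    _ = matches['href']  # indexing a list with a string fails
except TypeError as exc:
    error_message = str(exc)
    print('TypeError:', error_message)

first = soup.select_one('.gs_a a')  # a single Tag, or None if nothing matches
if first is not None:
    print(first['href'])

missing = soup.select_one('.gs_a span')  # there is no <span> inside .gs_a
print(missing)
```

Checking for None before reading ['href'] matters with select_one, because subscripting None would raise a different TypeError.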

See the BeautifulSoup documentation for more on select and select_one.
