Extracting a string from an HTML tag

Question

I want to get the string "14958" from the data-pair-id attribute of a div.

This is my code:

import requests
from bs4 import BeautifulSoup

urlheader = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36",
    "X-Requested-With": "XMLHttpRequest"
}

url = "https://www.investing.com/indices/nasdaq-composite"
req = requests.get(url, headers=urlheader)
soup = BeautifulSoup(req.content, "lxml")
x = soup.find('div', id="data-pair-id")

But x comes back empty.

What's wrong with my code?

Answer 1

import requests
from bs4 import BeautifulSoup
import re

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:76.0) Gecko/20100101 Firefox/76.0'
}


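def main(url):
    r = requests.get(url, headers=headers)
    soup = BeautifulSoup(r.content, 'html.parser')
    # data-pair-id is an attribute, not an id: match any div that carries it
    target = soup.find("div", {'data-pair-id': True}).get('data-pair-id')
    # smlID is assigned in the page source, so search the raw text for it
    match = re.search(r'smlID = (.*?);', r.text).group(1)
    print(target)
    print(match)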


main("https://www.investing.com/indices/nasdaq-composite")

Output:

14958
2035293
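
The second value above comes from a regular expression run over the raw page text, and re.search returns None when the pattern is absent, so calling .group(1) directly would raise an AttributeError if the page changes. A minimal sketch of a guarded version follows. SAMPLE_SOURCE and extract_sml_id are hypothetical names used only for illustration, and the sample text is an assumed stand-in for the real page source, which may look different.

```python
import re

# Hypothetical stand-in for the page source; the real page may differ.
SAMPLE_SOURCE = """
<div data-pair-id="14958"></div>
<script>
    window.siteData = {};
    window.siteData.smlID = 2035293;
</script>
"""


def extract_sml_id(page_text):
    """Return the value assigned to smlID, or None if the pattern is absent."""
    found = re.search(r'smlID = (.*?);', page_text)
    return found.group(1) if found else None


sml_id = extract_sml_id(SAMPLE_SOURCE)      # pattern present
missing = extract_sml_id("<html></html>")   # pattern absent, no exception
print(sml_id)
print(missing)
```

With the live page, r.text from the answer above would be passed in place of SAMPLE_SOURCE.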

Answer 2

On the given page, data-pair-id appears in only two places. In both, it is not the div's id but an attribute of the div, with the value 14958.

So, to get the value, find the first div that has a data-pair-id attribute by passing find a second argument that specifies the attribute the div must have.

divs = soup.find('div', {"data-pair-id": True})
print(divs.get('data-pair-id'))

See: https://stackoverflow.com/a/39055066/11890300
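
The difference between searching by id and searching by attribute can be checked offline. The sketch below uses a small piece of hypothetical markup (SAMPLE_HTML is an assumption, not the real page) to show that find with id="data-pair-id" matches nothing, while an attribute filter or a CSS attribute selector finds the div.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page.
SAMPLE_HTML = """
<div id="quotes" data-pair-id="14958">Nasdaq</div>
<div class="other" data-pair-id="14958">Nasdaq again</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Looks for <div id="data-pair-id">, which does not exist, so this is None.
by_id = soup.find("div", id="data-pair-id")

# True means "the attribute is present, whatever its value".
by_attr = soup.find("div", attrs={"data-pair-id": True})

# Equivalent lookup with a CSS attribute selector.
by_css = soup.select_one("div[data-pair-id]")

pair_id = by_attr.get("data-pair-id")

# find_all collects every matching div instead of only the first.
all_ids = [d["data-pair-id"] for d in soup.find_all("div", attrs={"data-pair-id": True})]

print(by_id)
print(pair_id)
print(all_ids)
```

Whether to use find with an attribute dictionary or select_one is mostly a matter of taste; both return the first matching element or None.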

2 answers

Answer 1: 1 vote, accepted, 2020-05-23 11:32:29

Answer 2: 0 votes, 2020-05-23 11:24:35
