Extracting a string from an HTML tag
I want to get the string held in a div's data-pair-id attribute, which is "14958".
This is my code:
import requests
from bs4 import BeautifulSoup

urlheader = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36",
    "X-Requested-With": "XMLHttpRequest"
}
url = "https://www.investing.com/indices/nasdaq-composite"
req = requests.get(url, headers=urlheader)
soup = BeautifulSoup(req.content, "lxml")
x = soup.find('div', id="data-pair-id")
But x comes up blank.
What's wrong with my code?
import requests
from bs4 import BeautifulSoup
import re

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:76.0) Gecko/20100101 Firefox/76.0'
}


def main(url):
    r = requests.get(url, headers=headers)
    soup = BeautifulSoup(r.content, 'html.parser')
    # First div that carries a data-pair-id attribute, whatever its value
    target = soup.find("div", {'data-pair-id': True}).get('data-pair-id')
    # Value assigned to smlID in the page's inline script
    match = re.search(r'smlID = (.*?);', r.text).group(1)
    print(target)
    print(match)


main("https://www.investing.com/indices/nasdaq-composite")
Output:
14958
2035293
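The regex step in the code above can be tried offline, without fetching the page. This is only a minimal sketch: the `sample` string is made up for illustration (the real inline script may look different), and `extract_sml_id` is a hypothetical helper name. It also guards against the case where the pattern is missing, since `re.search` returns None then and calling `.group(1)` on it would raise an AttributeError.

```python
import re

# Hypothetical fragment of an inline script; the real page may differ.
sample = "var siteData = {}; var smlID = 2035293; var other = 1;"


def extract_sml_id(text):
    """Return the text assigned to smlID, or None if the pattern is absent."""
    # Non-greedy group: capture everything up to the first semicolon.
    match = re.search(r"smlID = (.*?);", text)
    return match.group(1) if match else None


print(extract_sml_id(sample))
print(extract_sml_id("no assignment here"))
```

The captured value is a string; convert it with int() if a number is needed.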
On the given page, data-pair-id appears in only two places. In both divs it is not the div's id but an attribute of the div, and its value is 14958. Your call soup.find('div', id="data-pair-id") looks for a div whose id equals "data-pair-id", which does not exist, so it returns None.
So, given data-pair-id, you can get the attribute's value by finding the first div while passing another parameter that specifies the attribute it must have.
divs = soup.find('div', {"data-pair-id": True})
print(divs.get('data-pair-id'))
See: https://stackoverflow.com/a/39055066/11890300
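The difference between the two lookups can be checked on a small static snippet. This is a sketch under assumptions: the markup below is invented for illustration (the id and class names are not taken from the real page), and it uses the built-in 'html.parser' backend so lxml is not required.

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration; the real page is far larger.
html = """
<div id="quotes">
  <div class="instrument" data-pair-id="14958">Nasdaq</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Searches for <div id="data-pair-id">; no such div exists, so this is None.
by_id = soup.find("div", id="data-pair-id")

# Matches the first div that has a data-pair-id attribute, whatever its value.
by_attr = soup.find("div", attrs={"data-pair-id": True})
pair_id = by_attr.get("data-pair-id") if by_attr else None

print(by_id)
print(pair_id)
```

Checking by_attr before calling .get() avoids an AttributeError when no matching div is present.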