
Locating the right tag in HTML while web scraping in Python

I'm working on a school project where I display the current price of Bitcoin, ETH, and maybe another coin. I'm web scraping https://cryptowat.ch/, but I can't find the tag used to store the live price. When I parse the div tag it returns the price, but I'm not able to isolate it so that I can display it in Python.

<div class="rankings-col__header__segment"><h2>BTC</h2><weak>usd </weak>10857.00</div>

From what I understand, you know the BTC string and can base your locator on it.

So, if XPath is an option, you can match the h2 by its text and then use following-sibling::text() to reach the price:

//h2[. = 'BTC']/following-sibling::text()

Example using lxml.html:

from lxml.html import fromstring

data = """<div class="rankings-col__header__segment"><h2>BTC</h2><weak>usd </weak>10857.00</div>"""

root = fromstring(data)
print(root.xpath("//h2[. = 'BTC']/following-sibling::text()"))

Prints ['10857.00'].
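The XPath query returns a list of text nodes rather than a single number, so to display the price you would still need to pick the first entry and convert it. A minimal sketch, assuming the text looks like the sample above; the helper name `first_price` is hypothetical and not part of lxml:

```python
from decimal import Decimal


def first_price(text_nodes):
    """Return the first non-empty text node as a Decimal, or None."""
    for node in text_nodes:
        # Assumption: any commas are thousands separators and can be dropped.
        cleaned = str(node).strip().replace(",", "")
        if cleaned:
            return Decimal(cleaned)
    return None


# The list printed by the lxml example above.
price = first_price(["10857.00"])
print(f"BTC: {price} USD")
```

Decimal is used here instead of float so the price keeps its two decimal places when printed; float would work as well if exact formatting does not matter.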


If, by any chance, you use BeautifulSoup, it would be:

from bs4 import BeautifulSoup


data = """<div class="rankings-col__header__segment"><h2>BTC</h2><weak>usd </weak>10857.00</div>"""

soup = BeautifulSoup(data, "html.parser")
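# Find the <h2> whose text is "BTC", then take its next sibling that is a text node.
# Recent BeautifulSoup versions prefer the `string` argument over the older `text`.
print(soup.find("h2", string="BTC").find_next_sibling(string=True))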
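Since the question mentions several coins, the same lookup could be wrapped in a small function and called once per ticker. This is only a sketch: `get_price` is a hypothetical helper, and the ETH block in the sample HTML is invented here to mirror the BTC markup from the question (its value is a placeholder, not real data).

```python
from bs4 import BeautifulSoup

# Sample markup: the BTC block is from the question; the ETH block and its
# value are made up purely for illustration.
data = """
<div class="rankings-col__header__segment"><h2>BTC</h2><weak>usd </weak>10857.00</div>
<div class="rankings-col__header__segment"><h2>ETH</h2><weak>usd </weak>123.45</div>
"""


def get_price(soup, symbol):
    """Return the price text that follows the <h2> for `symbol`, or None."""
    header = soup.find("h2", string=symbol)
    if header is None:
        return None
    # The price is a bare text node that is a sibling of the <h2>.
    text = header.find_next_sibling(string=True)
    return text.strip() if text else None


soup = BeautifulSoup(data, "html.parser")
for symbol in ("BTC", "ETH", "XRP"):
    print(symbol, get_price(soup, symbol))
```

Returning None for a ticker that is not on the page (XRP in this sample) lets the caller decide how to display a missing price instead of raising an AttributeError.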

