Finding the correct tag in HTML when web scraping in Python

Question

I'm working on a school project that displays the current price of Bitcoin, ETH, and maybe one other coin. I'm scraping https://cryptowat.ch/, but I can't find the tag that holds the live price. When I parse the div tag it returns the price, but I can't isolate the price so that I can display it in Python.

<div class="rankings-col__header__segment"><h2>BTC</h2><weak>usd </weak>10857.00</div>

Answer 1

From what I understand, you know the BTC string and can base your locator on it.

So, with XPath, you can match on that string and use following-sibling::text():

//h2[. = 'BTC']/following-sibling::text()

Example using lxml.html:

from lxml.html import fromstring

data = """<div class="rankings-col__header__segment"><h2>BTC</h2><weak>usd </weak>10857.00</div>"""

root = fromstring(data)
print(root.xpath("//h2[. = 'BTC']/following-sibling::text()"))

Prints ['10857.00'].
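Since the question mentions ETH as well, the same XPath can be wrapped in a small helper that takes the ticker as a parameter. This is only a sketch: the `price_text` name, the outer wrapper div, and the ETH row in the sample markup are made up for illustration, and it assumes every coin on the page uses the same h2-then-text layout as the BTC snippet.

```python
from lxml.html import fromstring

# Sample markup modelled on the snippet from the question.
# The outer <div> and the ETH row (including its price) are invented for this sketch.
SAMPLE = (
    "<div>"
    '<div class="rankings-col__header__segment"><h2>BTC</h2><weak>usd </weak>10857.00</div>'
    '<div class="rankings-col__header__segment"><h2>ETH</h2><weak>usd </weak>463.25</div>'
    "</div>"
)


def price_text(html, symbol):
    """Return the text that follows the <h2> for `symbol`, or None if it is absent."""
    root = fromstring(html)
    # $symbol is an XPath variable, so the ticker is passed in rather than
    # spliced into the expression string.
    nodes = root.xpath("//h2[. = $symbol]/following-sibling::text()", symbol=symbol)
    # Drop whitespace-only text nodes and trim the rest.
    texts = [node.strip() for node in nodes if node.strip()]
    return texts[0] if texts else None


print(price_text(SAMPLE, "BTC"))
print(price_text(SAMPLE, "ETH"))
print(price_text(SAMPLE, "XRP"))
```

Returning None for an unknown ticker lets the caller decide what to display when a coin is missing from the page.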

If, by any chance, you use BeautifulSoup, it would be:

from bs4 import BeautifulSoup


data = """<div class="rankings-col__header__segment"><h2>BTC</h2><weak>usd </weak>10857.00</div>"""

soup = BeautifulSoup(data, "html.parser")
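# `string=True` matches the next sibling that is a text node
# (newer BeautifulSoup releases prefer `string` over the older `text` argument).
print(soup.find("h2", string="BTC").find_next_sibling(string=True))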
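To display the value rather than just print the raw text node, it may help to convert it to a number first. The following is a rough sketch, not a definitive approach: `parse_price` is a hypothetical helper, and stripping whitespace and thousands separators is an assumption about how the site might format prices.

```python
from decimal import Decimal

from bs4 import BeautifulSoup

data = """<div class="rankings-col__header__segment"><h2>BTC</h2><weak>usd </weak>10857.00</div>"""


def parse_price(html, symbol):
    """Return the price after the <h2> for `symbol` as a Decimal, or None if not found."""
    soup = BeautifulSoup(html, "html.parser")
    header = soup.find("h2", string=symbol)
    if header is None:
        return None
    # The price is the first text node among the siblings that follow the header.
    raw = header.find_next_sibling(string=True)
    if raw is None:
        return None
    # Assumption: the text may carry surrounding whitespace or thousands separators.
    return Decimal(raw.strip().replace(",", ""))


price = parse_price(data, "BTC")
if price is not None:
    print(f"BTC: {price:.2f} USD")
```

Decimal is used here instead of float so the price keeps exactly the digits shown on the page.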
