Trying to find something specific in HTML code

I am trying to find the specific ID of an altcoin, but I'm not sure how to do it. When I print the page, I get a very long JSON script and get lost trying to find it. Is there an easier way?

from bs4 import BeautifulSoup
import requests
import pandas as pd
import json
import time


cmc = requests.get('https://coinmarketcap.com/')
soup = BeautifulSoup(cmc.content, 'html.parser')

print(soup.prettify())

The output I want is the exact id corresponding to a given altcoin. The output below is for one coin, but the full list is long, and I cannot easily find the right entry without searching manually.

{"id":1,"name":"Bitcoin","symbol":"BTC","slug":"bitcoin","max_supply":21000000,"circulating_supply":18614718,"total_supply":18614718,"last_updated":"2021-01-30T15:00:02.000Z","quote":{"USD":{"price":34177.31601866782,"volume_24h":83208963467.24487,"percent_change_1h":1.15037986,"percent_change_24h":-10.87555443,"percent_change_7d":7.03677315,"percent_change_30d":19.84946991,"market_cap":636201099684.3843,"last_updated":"2021-01-30T15:00:02.000Z"}},"rank":1,"noLazyLoad":true}

I took a closer look at the HTML.

It appears that the JSON string data you seek is inside a <script> tag with the id "__NEXT_DATA__".

I'm not that familiar with BeautifulSoup, so a more elegant way to get the data may exist. Here is the code I used.

cmc = requests.get('https://coinmarketcap.com/')
soup = BeautifulSoup(cmc.content, 'html.parser')

for item in soup.select('script[id="__NEXT_DATA__"]'):
    data = json.loads(item.string) # load JSON string as a dict
    desired_data = data["props"]["initialState"]["cryptocurrency"]["listingLatest"][
        "data"
    ]
    print(
        json.dumps( # pretty output string
            desired_data,
            indent=2,
        ),
    )

Truncated output:

[
  {
    "id": 1,
    "name": "Bitcoin",
    "symbol": "BTC",
    "slug": "bitcoin",
    "max_supply": 21000000,
    "circulating_supply": 18614718,
    "total_supply": 18614718,
    "last_updated": "2021-01-30T14:51:02.000Z",
    "quote": {
      "USD": {
        "price": 34138.18238095427,
        "volume_24h": 83651976977.0413,
        "percent_change_1h": 1.36922474,
        "percent_change_24h": -9.82670796,
        "percent_change_7d": 6.33079444,
        "percent_change_30d": 19.72629419,
        "market_cap": 635472638054.0323,
        "last_updated": "2021-01-30T14:51:02.000Z"
      }
    },
    "rank": 1,
    "noLazyLoad": true
  },
  {
    "id": 1027,
    "name": "Ethereum",
    "symbol": "ETH",
    "slug": "ethereum",
    "max_supply": null,
    "circulating_supply": 114465285.999,
    "total_supply": 114465285.999,
    "last_updated": "2021-01-30T14:51:02.000Z",
    "quote": {
      "USD": {
        "price": 1364.155096452962,
        "volume_24h": 38819994919.48616,
        "percent_change_1h": 1.95180621,
        "percent_change_24h": -3.86551103,
        "percent_change_7d": 10.22893483,
        "percent_change_30d": 85.96783538,
        "market_cap": 156148403262.48172,
        "last_updated": "2021-01-30T14:51:02.000Z"
      }
    },
    "rank": 2,
    "noLazyLoad": true
  },…
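
Once the list is loaded, looking up one coin's id no longer requires reading the output by eye. The following is a minimal sketch, assuming the entries have the "id", "name", "symbol", and "slug" keys shown in the output above. The helper name find_coin_id is hypothetical, and the sample list is trimmed from that output; in practice you would pass desired_data from the code above. The page layout may also have changed since this was written, so the key path into the JSON is not guaranteed.

```python
def find_coin_id(listings, query):
    """Return the "id" of the first entry whose name, symbol, or slug
    equals `query` (case-insensitive), or None if nothing matches.

    `listings` is assumed to be a list of dicts shaped like the
    entries in the output above.
    """
    wanted = query.strip().lower()
    for coin in listings:
        for key in ("name", "symbol", "slug"):
            value = coin.get(key)
            if isinstance(value, str) and value.lower() == wanted:
                return coin.get("id")
    return None


# Sample entries trimmed from the output above (illustration only).
sample = [
    {"id": 1, "name": "Bitcoin", "symbol": "BTC", "slug": "bitcoin"},
    {"id": 1027, "name": "Ethereum", "symbol": "ETH", "slug": "ethereum"},
]

print(find_coin_id(sample, "ETH"))      # 1027
print(find_coin_id(sample, "bitcoin"))  # 1
print(find_coin_id(sample, "DOGE"))     # None (not in the sample)
```

Matching on name, symbol, and slug lets you search with whichever form you know. If you need many lookups, building a dict that maps symbol to id once would avoid rescanning the list each time.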
