How to optimize a JSON webscrape?

Question

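I've recently become interested in Bitcoin and blockchains in general. Since every transaction is public by design, I thought it would be interesting to look into things like the number of wallets and transaction sizes. However, Bitcoin's current block height is 732,324, which is a lot of blocks to walk through one after another. So I want to collect the hash of every block first, so that I can then scrape the transactions using multiple threads.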
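The second step of that plan (downloading blocks in parallel once their hashes are known) can be sketched with the standard library's concurrent.futures.ThreadPoolExecutor. This is a minimal sketch, assuming the rawblock URL used in the code below; fetch_block, fetch_blocks_concurrently and max_workers=8 are hypothetical names and an untuned value, and a public API may rate-limit many parallel requests.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

# Assumption: same endpoint as in the question's code.
BASE_URL = "https://blockchain.info/rawblock/"


def fetch_block(block_hash):
    """Download and parse one block's JSON (hypothetical helper)."""
    with urlopen(BASE_URL + block_hash) as response:
        return json.loads(response.read().decode())


def fetch_blocks_concurrently(block_hashes, fetch=fetch_block, max_workers=8):
    """Fetch many blocks in parallel, returning results in input order.

    Threads help because each call mostly waits on the network, and the
    GIL is released while waiting on I/O.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, block_hashes))
```

The fetch parameter is injectable so the parallel part can be tried without network access; with real hashes you would call fetch_blocks_concurrently(hashes).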

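The blockchain links each block to the next, so if I start at the first block (the genesis block), then simply find the next block in the chain, and so on until the end, I should have what I need. I'm new to Python, but below is my code for fetching the hashes and saving them to a file. However, at the current rate it would take 30-40 hours to finish on my machine. Is there a more efficient way to approach this?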

#imports
import json
import os
from datetime import datetime
from urllib.request import urlopen

#Setting start parameters
genesisBlock = "000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f"
baseurl = "https://blockchain.info/rawblock/"
i = 0 #counter for tracking progress

#Set HASH
blockHASH = genesisBlock

#File to save results
filePath = "./blocklist.tsv"

#Write the header only if the file is new or empty (it is opened in append mode)
writeHeader = not os.path.exists(filePath) or os.path.getsize(filePath) == 0

with open(filePath, 'a') as fileObject:
    if writeHeader:
        fileObject.write("blockHASH\theight\ttime\tn_tx\n")

    #Start walking through each block; an empty hash means we reached the tip
    while blockHASH != "":

        #Print progress
        if i % 250 == 0:
            print(str(i) + "|" + datetime.now().strftime("%H:%M:%S"))

        # Fetch the block and parse the JSON response
        url = baseurl + blockHASH
        with urlopen(url) as response:
            data_json = json.loads(response.read().decode())

        #Write result to file (no trailing tab before the newline)
        fileObject.write(blockHASH + "\t" +
                         str(data_json["height"]) + "\t" +
                         str(data_json["time"]) + "\t" +
                         str(data_json["n_tx"]) + "\n")

        #increment counter
        i = i + 1

        #Set new hash; next_block is empty at the chain tip, which ends the loop
        nextBlocks = data_json["next_block"]
        blockHASH = nextBlocks[0] if nextBlocks else ""

        if i > 1000: break #or just let it run until completion

# The with-block closes the file automatically, even if an error occurs
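The walk described in the question is inherently sequential: the hash of block n+1 is only known after block n has been downloaded, so total time is roughly the number of blocks times the per-request latency. A minimal sketch that separates the walk from the HTTP call, assuming (as the code above does) that each rawblock response has height, time, n_tx and a next_block list, and that next_block is empty at the chain tip; walk_chain and its limit parameter are hypothetical names.

```python
def walk_chain(start_hash, fetch, limit=None):
    """Yield (hash, height, time, n_tx) for each block, following next_block links.

    fetch(block_hash) must return the decoded rawblock JSON for that hash.
    Stops at the chain tip (empty next_block) or after `limit` blocks.
    """
    block_hash = start_hash
    count = 0
    while block_hash and (limit is None or count < limit):
        data = fetch(block_hash)
        yield block_hash, data["height"], data["time"], data["n_tx"]
        count += 1
        # Assumption: next_block is an empty list at the current tip.
        next_blocks = data.get("next_block") or []
        block_hash = next_blocks[0] if next_blocks else ""
```

With the question's variables, fetch could be lambda h: json.loads(urlopen(baseurl + h).read().decode()), and each yielded tuple written as one TSV row.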

Answer 1

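While this doesn't directly address the efficiency of your approach, using orjson or rapidjson will definitely speed up the JSON parsing of your results, since both are much faster than the standard json library.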
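The speed difference is easy to measure on your own machine. A rough timing sketch, assuming orjson may or may not be installed (without it, only json is timed); the payload is synthetic data loosely shaped like a block, not a real response, and the timing covers parsing only, not the network time of each urlopen call. time_parser and SAMPLE are hypothetical names.

```python
import json
import timeit

try:
    import orjson  # third-party: pip install orjson
except ImportError:
    orjson = None

# Synthetic payload loosely shaped like a block response (made-up values).
SAMPLE = json.dumps({
    "hash": "00" * 32,
    "height": 1,
    "n_tx": 2000,
    "tx": [{"hash": "ab" * 32, "size": 250, "out": [{"value": 5000}]}
           for _ in range(2000)],
}).encode()


def time_parser(name, loads, number=20):
    """Return total seconds for `number` parses of SAMPLE with `loads`."""
    seconds = timeit.timeit(lambda: loads(SAMPLE), number=number)
    print(f"{name}: {seconds:.3f}s for {number} parses")
    return seconds


if __name__ == "__main__":
    time_parser("json", json.loads)
    if orjson is not None:
        time_parser("orjson", orjson.loads)
```

Comparing these numbers with the time per request shows how much of the 30-40 hours parsing actually accounts for.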

Rapidjson can be swapped in as easily as import rapidjson as json, while orjson requires a few changes, as described on its GitHub page, but nothing too difficult.
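For the orjson swap, one difference that matters here is that orjson.loads accepts bytes directly (and orjson.dumps returns bytes rather than str). A minimal sketch, assuming you want to fall back to the standard json module when orjson is not installed; parse_json and fetch_block are hypothetical helper names.

```python
import json
from urllib.request import urlopen

try:
    import orjson

    def parse_json(raw: bytes):
        # orjson.loads accepts bytes, so the .decode() step can be dropped.
        return orjson.loads(raw)
except ImportError:
    def parse_json(raw: bytes):
        # Standard-library fallback; json.loads also accepts bytes (Python 3.6+).
        return json.loads(raw)


def fetch_block(url):
    """Fetch and parse one block, mirroring the question's urlopen call."""
    with urlopen(url) as response:
        return parse_json(response.read())
```

In the question's loop, data_json = json.loads(response.read().decode()) would become data_json = parse_json(response.read()).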

Solution 1
1 2022-04-17 20:23:37