Web scraping for JavaScript contents using Python

I'm trying to get the 'SALES HISTORY' data from here.

Since the data is loaded by JavaScript, I referred to this link and tried to scrape it. However, when I run the code below, the new browser window doesn't show the web page properly.

I would appreciate it if you could advise how to get the data in this case.

# import libraries
import urllib.request
from bs4 import BeautifulSoup
from selenium import webdriver
import time
import pandas as pd
# specify the url
urlpage = 'https://nonfungible.com/market/history/axieinfinity' 
print(urlpage)
# run Chrome webdriver from executable path of your choice
driver = webdriver.Chrome(executable_path = r'C:\Users\trey\AppData\Local\Programs\Python\Python36\Scripts\chromedriver')

I expect the output to be a data set containing the Contract, Transaction hash, Seller, Buyer, Name, and Birth Date columns.

You don't need to scrape the site to get the sales history data, as you can get it from their JSON API endpoint.

Here's the link to the endpoint used by the web page you posted:

https://api.nonfungible.com/api/v3/project/list

You can use Python's json library to extract the data that you want. To find out whether a site has a usable JSON API, open the network monitor in your browser's developer tools, look at the XHR requests the page makes, and check whether their responses contain the data you require. This makes more sense than scraping the HTML/JS.
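As a rough sketch of that approach, the snippet below downloads the endpoint with the standard library and pulls selected keys out of each record. The layout of the response is an assumption (a list of objects, or an object wrapping such a list), and the helper names fetch_json, to_rows, and main, the User-Agent header, and the "name" key are made up for illustration, so inspect the real JSON first and adjust the keys.

```python
import json
import urllib.request

# Endpoint URL taken from the answer above.
API_URL = "https://api.nonfungible.com/api/v3/project/list"


def fetch_json(url, timeout=30):
    """Download a URL and decode its body as JSON."""
    # Some servers reject requests without a User-Agent (assumption).
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def to_rows(payload, columns):
    """Keep only the given keys from each record; missing keys become None.

    Assumes the payload is a list of dicts, or a dict that wraps such a
    list under one of its keys. The real response may be shaped differently.
    """
    if isinstance(payload, dict):
        # Use the first value that is a list as the record list (assumption).
        records = next((v for v in payload.values() if isinstance(v, list)), [])
    else:
        records = payload
    return [
        {column: record.get(column) for column in columns}
        for record in records
        if isinstance(record, dict)
    ]


def main():
    data = fetch_json(API_URL)
    # "name" is a hypothetical key; replace it with keys seen in the response.
    for row in to_rows(data, ["name"])[:5]:
        print(row)
```

Calling main() performs the request and prints the first few rows. The resulting list of dictionaries can be passed directly to pandas.DataFrame if you want a table.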
