
Cheerio, axios, reactjs to web scrape a table off a webpage returning empty list

Trying to scrape this table off this website: https://www.investing.com/commodities/real-time-futures

But for some reason when I try to get the data, I keep getting an empty list.

This is what I'm doing to get the data and parse it:

componentDidMount() {
    axios.get(`https://www.investing.com/commodities/real-time-futures`)
      .then(response => {
        if(response.status === 200)
          {
            const html = response.data;
            const $ = cheerio.load(html);
            let data = [];
            $('#cross_rate_1 tr').each((i, elem) => {
                data.push({
                  Month: $(elem).find('td#left noWrap').text()
                })
            });
            console.log(data);
          }
        }, (error) => console.log('err') );
  }

This is a screenshot of the particular part of the source code I'm trying to scrape.


Any help is much appreciated.

As already mentioned, the table in question is constantly updating via a websocket connection. You can try getting the data by either 1) connecting to the websocket or 2) scraping the dynamically generated HTML.
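For option 2, a minimal sketch using a headless browser such as Puppeteer could look like the snippet below. This is only an illustration, not part of the original answer: the packages, the waitUntil option, and the td.left.noWrap selector are assumptions that may need adjusting against the live page.

const puppeteer = require('puppeteer');
const cheerio = require('cheerio');

async function scrapeFutures() {
  // Launch a headless browser so the page's JavaScript can run and
  // fill in the table rows before we read the markup.
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://www.investing.com/commodities/real-time-futures', {
    waitUntil: 'networkidle2', // wait until network activity has mostly settled
  });

  // page.content() returns the rendered DOM, not the bare server response
  // that axios would have received.
  const html = await page.content();
  await browser.close();

  const $ = cheerio.load(html);
  const data = [];
  $('#cross_rate_1 tr').each((i, elem) => {
    data.push({
      // "left" and "noWrap" look like CSS classes in the question's screenshot
      // (td#left noWrap would match an id), so a class selector is assumed here.
      Month: $(elem).find('td.left.noWrap').text().trim(),
    });
  });
  return data;
}

scrapeFutures().then(console.log).catch(console.error);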

If you only need a data snapshot rather than a continuous time series, you can use a browser scraping extension; that way you don't have to deal with the websocket implementation at all.

I've identified the price data CSS selectors for you and created a scraping configuration to be used with the open-source browser extension https://github.com/get-set-fetch/extension.

"eLtI4gnapZTLDsIgEEV/hejGLrC+F25N3OrCpUlD6FhIWmiY0f6+1Hd9EJsuSEguGRg4h8fSlS0Km/r3ZesjHR0g2zrtKzL2IYg1wOqLZ2hEicrSwxhFVOIyjquqGmpzAiRtsqG0RSxv5TVg7EDkvC7AD9etmqJlQBz9ONRW8HvgJ06UwD2HpCV/gtpFylFnC39A/s51A3qphMlg94ruBbtNCe5iMr5/EP/S3ICZf4H5myP/0tv3rSIm/oiQjBmlS0OKS6XzdDCJ9iYQT8PxLBzPw/Ei6rWwpZ0dZ2cMF5M=" "eLtI4gnapZTLDsIgEEV/hejGLrC+F25N3OrCpUlD6FhIWmiY0f6+1Hd9EJsuSEguGRg4h8fSlS0Km/r3ZesjHR0g2zrtKzL2IYg1wOqLZ2hEicrSwxhFVOIyjquqGmpzAiRtsqG0RSxv5TVg7EDkvC7AD9etmqJlQBz9ONRW8HvgJ06UwD2HpCV/gtpFylFnC39A/s51A3qphMlg94ruBbtNCe5iMr5/EP/S3ICZf4H5myP/0tv3rSIm/oiQjBmlS0OKS6XzdDCJ9iYQT8PxLBzPw/Ei6rWwpZ0dZ2cMF5M="

Inside the extension do: new project > config hash > paste the above hash (without the quotes) > save, scrape, view results > export as CSV.

Disclaimer: I'm the extension author.
