
NodeJS - how to scrape ld+json data and save it to an object

I've been trying to find a way to get the application/ld+json contents and save them to a local object. What I want is to save the data to an object so that in my program I can call console.log(data.offers.availability), which would log "InStock", and likewise for each of the other data values.

I currently have this:

            let content = JSON.stringify($("script[type='application/ld+json']").html())
            let filteredJson = content.replace(/\\n/g, '')
            let results = JSON.parse(filteredJson)
            console.log(results)

Which results in this (it doesn't let me console.log(results.offers.availability)):

 {    "@context": "http://schema.org/", 
   "@type": "Product",    "name": "Apex Legends - Bangalore - Mini Epics",
    "description": "<div class="textblock"><p><h2>Apex Legends - Bangalore - Mini Epics </h2><p>Helden uit alle uithoeken van de wereld strijden voor eer, roem en fortuin in Apex Legends. Weta Workshop betreedt the Wild Frontier en brengt Bangalore met zich mee - Mini Epics style!</p><p>Verzamel alle Apex Legends Mini Epics en voeg ook Bloodhound en Mirage toe aan je collectie!</p></p></div>",
"brand": {
        "@type": "Thing",
        "name": "Game Mania"    
},
"aggregateRating": {        
        "@type": "AggregateRating",
        "ratingValue": "5",
        "ratingCount": "2"    
},
"offers": {        
        "@type": "Offer",
        "priceCurrency": "EUR",
        "price": "19.98",        
        "availability" : "InStock"    
   }
}

Data I'm trying to scrape and save: [image]

As Bergi pointed out, the problem is that you're using JSON.stringify on content that is already a string: stringifying wraps it in another layer of quoting, so JSON.parse just hands back the original string instead of an object. Out of curiosity, I tried this myself. Consider the following test:

index.html (served through localhost:4000):

<html>
<script type="application/ld+json">
    {
        "@context": "http://schema.org",
        "@type": "Product",
        "name": "Apex Legends - Bangalore - Mini Epics",
        "offers": {
            "@type": "Offer",
            "priceCurrency": "EUR",
            "price": "19.98",
            "availability": "InStock"
        }
    }
</script>
<body>
<h2>Index</h2>
</body>
</html>

NodeJS script:

const superagent = require('superagent');
const cheerio = require('cheerio');

(async () => {
    const response = await superagent("http://localhost:4000");

    const $ = cheerio.load(response.text);
    // note that I'm not using .html(), although it works for me either way
    const jsonRaw = $("script[type='application/ld+json']")[0].children[0].data; 
    // do not use JSON.stringify on the jsonRaw content, as it's already a string
    const result = JSON.parse(jsonRaw);
    console.log(result.offers.availability);
})()

result is now an object that holds the data from the script tag, and logging result.offers.availability prints InStock as expected.
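To see why the snippet in the question ended up with a string rather than an object, the round trip can be reproduced without any scraping at all. This is only a minimal sketch: `scriptText` is a hypothetical stand-in for whatever `.html()` returns for the script tag, shortened to the one field of interest.

```javascript
// Hypothetical stand-in for the text of the ld+json script tag (already a string).
const scriptText = '{"offers": {"availability": "InStock"}}';

// The question's approach: JSON.stringify adds a second layer of JSON quoting...
const doubleEncoded = JSON.stringify(scriptText);
// ...so JSON.parse only strips that layer and returns the original string.
const stillString = JSON.parse(doubleEncoded);
console.log(typeof stillString);        // string
console.log(stillString.offers);        // undefined

// Parsing the script text directly yields the object.
const data = JSON.parse(scriptText);
console.log(typeof data);               // object
console.log(data.offers.availability);  // InStock
```

Because the stringify/parse pair cancels out, the console shows what looks like JSON, but property access such as results.offers fails since the value is still a plain string.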


 