NodeJS - how to scrape ld+json data and save it to an object
I've been trying to find a way to get the application/ld+json contents and save them to a local object. What I want is to be able to call console.log(data.offers.availability) in my program and have it log "InStock", and likewise for each of the other data values.
I currently have this:
let content = JSON.stringify($("script[type='application/ld+json']").html())
let filteredJson = content.replace(/\\n/g, '')
let results = JSON.parse(filteredJson)
console.log(results)
Which results in the following, but it doesn't let me console.log(results.offers.availability):
{ "@context": "http://schema.org/",
"@type": "Product", "name": "Apex Legends - Bangalore - Mini Epics",
"description": "<div class="textblock"><p><h2>Apex Legends - Bangalore - Mini Epics </h2><p>Helden uit alle uithoeken van de wereld strijden voor eer, roem en fortuin in Apex Legends. Weta Workshop betreedt the Wild Frontier en brengt Bangalore met zich mee - Mini Epics style!</p><p>Verzamel alle Apex Legends Mini Epics en voeg ook Bloodhound en Mirage toe aan je collectie!</p></p></div>",
"brand": {
"@type": "Thing",
"name": "Game Mania"
},
"aggregateRating": {
"@type": "AggregateRating",
"ratingValue": "5",
"ratingCount": "2"
},
"offers": {
"@type": "Offer",
"priceCurrency": "EUR",
"price": "19.98",
"availability" : "InStock"
}
}
As Bergi pointed out, the problem is that you're calling JSON.stringify on content that is already a string, so the subsequent JSON.parse just returns that string instead of an object. Out of curiosity I tried this myself. Consider the following test:
index.html (served through localhost:4000):
<html>
<script type="application/ld+json">
{
"@context": "http://schema.org",
"@type": "Product",
"name": "Apex Legends - Bangalore - Mini Epics",
"offers": {
"@type": "Offer",
"priceCurrency": "EUR",
"price": "19.98",
"availability": "InStock"
}
}
</script>
<body>
<h2>Index</h2>
</body>
</html>
NodeJS script:
const superagent = require('superagent');
const cheerio = require('cheerio');
(async () => {
const response = await superagent("http://localhost:4000");
const $ = cheerio.load(response.text);
// note that I'm not using .html(), although it works for me either way
const jsonRaw = $("script[type='application/ld+json']")[0].children[0].data;
// do not use JSON.stringify on the jsonRaw content, as it's already a string
const result = JSON.parse(jsonRaw);
console.log(result.offers.availability);
})()
result is now an object that holds the data from the script tag, and logging result.offers.availability prints InStock as expected.
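To see why the code in the question ends up with a string, here is a minimal sketch that needs no network or cheerio. The scriptText variable is a hypothetical stand-in for whatever text the script tag contains, trimmed down to the offers field; the rest uses only the built-in JSON methods.

```javascript
// Hypothetical stand-in for the text read from the <script> tag.
// It is already a JSON string, not an object.
const scriptText = `
{
  "@type": "Product",
  "offers": { "@type": "Offer", "availability": "InStock" }
}`;

// What the question did: JSON.stringify wraps the string in quotes and
// escapes it, so JSON.parse only undoes the stringify and returns a string.
// (The replace strips the escaped newlines; parse still succeeds.)
const roundTripped = JSON.parse(JSON.stringify(scriptText).replace(/\\n/g, ''));
console.log(typeof roundTripped);        // string
console.log(roundTripped.offers);        // undefined

// Parsing the text directly yields the object.
const parsed = JSON.parse(scriptText);
console.log(typeof parsed);              // object
console.log(parsed.offers.availability); // InStock
```

Because the round-tripped value is a string, console.log prints what looks like JSON, which makes the problem easy to miss; checking typeof is a quick way to tell the two cases apart.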