How can I get an element by data attribute from a string and read its inner text in a Node.js application?
I am new to Node, so I hope my question is not too ugly. Thanks.
I have a Node app, and one part of it produces a stream containing the HTML of a site:
.pipe(tap(function(file) {
  var content = file.contents.toString(),
      relPathPrefix = path.join(path.relative(file.path, './source'));
  relPathPrefix = relPathPrefix
    .replace(new RegExp('\\' + path.sep, 'g'), '/') // Normalize path separator
    .replace(/\.\.$/, ''); // Remove trailing ..
  content = content.replace(/('|")\//g, '$1' + relPathPrefix);
  //console.log(content)
  // this is where I am struggling
  file.contents = Buffer.from(content);
}))
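For readers following the snippet above, here is a minimal sketch of what its prefix logic appears to do, using only Node's built-in path module. The helper names relPrefix and relativizeUrls and the sample paths are hypothetical, and path.posix is passed in only so the result does not depend on the operating system.

```javascript
const path = require('path');

// Hypothetical helper: builds the prefix that leads from a file back to the source root.
// `p` is the path flavour to use (path, path.posix or path.win32).
function relPrefix(filePath, sourceDir, p = path) {
  return p.relative(filePath, sourceDir)
    .replace(new RegExp('\\' + p.sep, 'g'), '/') // normalize the path separator
    .replace(/\.\.$/, '');                       // drop the extra ".." caused by the file name
}

// Hypothetical helper: rewrites quoted root-absolute URLs ("/x") to relative ones.
function relativizeUrls(html, prefix) {
  return html.replace(/('|")\//g, '$1' + prefix);
}

// Made-up sample: a page one directory below the source root.
const prefix = relPrefix('/proj/source/a/b.html', '/proj/source', path.posix);
const rewritten = relativizeUrls('<link href="/css/site.css">', prefix);

console.log(prefix);    // ../
console.log(rewritten); // <link href="../css/site.css">
```

Because path.relative is given a file path rather than a directory, it returns one ".." too many, which is why the trailing ".." is removed.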
The variable "content" holds the full markup of the page. Now I need to parse "content", find the element whose "data-type" attribute has the value "meta.code.data", and get its textContent. In the end, I want to strip the text itself out of a value like "...innerText...".
I know how to do that in the browser with jQuery:
var json = $('pre[data-type="meta.code.data"]').text()
and also with vanilla JavaScript:
var json = document.querySelectorAll('[data-type="meta.code.data"]')[0].textContent;
but I cannot find a way to do the same on a string variable. I thought about something like
content = content.contains('[data-type="meta.code.data"]')[0].textContent;
but this does not work.
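A short sketch of why that attempt fails, using only built-in string methods: a string has no DOM behind it, Node strings have no contains method, and the closest built-in, includes, only answers yes or no. The sample markup is made up for illustration.

```javascript
// Made-up sample markup standing in for the real page content.
const content = '<pre data-type="meta.code.data">{"id": 1}</pre>';

// Strings in Node have no contains() method, so calling it throws a TypeError.
const containsType = typeof content.contains;

// includes() does a plain substring search and returns a boolean, not elements.
const found = content.includes('data-type="meta.code.data"');

// Indexing a boolean yields undefined, so there is nothing to read textContent from.
const firstMatch = found[0];

console.log(containsType, found, firstMatch);
```

So the string has to be turned into a document by a parser before any selector-style lookup can work.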
It's not the same: the browser parses the markup into a DOM, whereas here you only have a string with all the page data, so the browser techniques won't work on it directly. I would personally try a DOM parser (although I have never used one in a Node application); there seems to be an npm package that works basically the same way.
I assume you have already used npm with Node, so here is a link to the module: https://www.npmjs.com/package/xmldom
Here is an example of how it works:
var DOMParser = require('xmldom').DOMParser;
var doc = new DOMParser().parseFromString(
'<xml xmlns="a" xmlns:c="./lite">\n'+
'\t<child>test</child>\n'+
'\t<child></child>\n'+
'\t<child/>\n'+
'</xml>'
,'text/xml');
doc.documentElement.setAttribute('x','y');
doc.documentElement.setAttributeNS('./lite','c:x','y2');
var nsAttr = doc.documentElement.getAttributeNS('./lite','x')
console.info(nsAttr)
console.info(doc)
As you can see, you now have a DOM to work with, much like in the browser, and since you already know how to do that, your problem is now gone ;) (Note that xmldom covers the core DOM methods such as getElementsByTagName and getAttribute; as far as I know it does not provide querySelectorAll.)
I would recommend parsing the DOM rather than parsing that string with regex and the like. If you search for which approach is better, you will find numerous reasons.
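One of those reasons can be shown in a few lines. This is only an illustrative sketch with made-up markup: a naive pattern matches the exact spelling it was written for, but misses an equivalent element that uses single quotes and an extra attribute.

```javascript
// Naive pattern: assumes double quotes and data-type as the only attribute.
const naive = /<pre data-type="meta\.code\.data">([\s\S]*?)<\/pre>/;

// Two made-up samples that an HTML parser would treat as the same kind of element.
const exact = '<pre data-type="meta.code.data">{"a":1}</pre>';
const variant = "<pre class='x' data-type='meta.code.data'>{\"a\":1}</pre>";

const hit = naive.exec(exact);
const miss = naive.exec(variant);

console.log(hit && hit[1]); // the text between the tags of the first sample
console.log(miss);          // null, because the pattern does not match the variant
```

A regex also leaves HTML entities such as &quot; undecoded, whereas a parser returns the decoded text.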
As @JonasW pointed out in the comments, using cheerio did the trick.
First, require cheerio at the top of your module:
const cheerio = require('cheerio');
then load the content and extract the data:
const $ = cheerio.load(content)
content = $('pre[data-type="meta.code.data"]').text()
In summary:
.pipe(tap(function(file) {
  var content = file.contents.toString(),
      relPathPrefix = path.join(path.relative(file.path, './source'));
  relPathPrefix = relPathPrefix
    .replace(new RegExp('\\' + path.sep, 'g'), '/') // Normalize path separator
    .replace(/\.\.$/, ''); // Remove trailing ..
  content = content.replace(/('|")\//g, '$1' + relPathPrefix);
  // Get only the JSON data from the stream
  const $ = cheerio.load(content);
  content = $('pre[data-type="meta.code.data"]').text();
  file.contents = Buffer.from(content);
}))
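If the extracted text is meant to be used as data rather than written back out as a string, a small follow-up step could look like the sketch below. It assumes the pre element holds valid JSON, which the question suggests but does not confirm; the helper name parseMetaJson and the sample value are hypothetical.

```javascript
// Hypothetical helper: turns the text taken from the <pre> element into an object.
// Returns null when the text is empty or is not valid JSON.
function parseMetaJson(text) {
  const trimmed = text.trim();
  if (trimmed === '') {
    return null;
  }
  try {
    return JSON.parse(trimmed);
  } catch (err) {
    return null;
  }
}

// Made-up sample of what .text() might return, including surrounding whitespace.
const extracted = '\n  {"title": "Home", "tags": ["a", "b"]}\n';
const data = parseMetaJson(extracted);

console.log(data);
```

Keep in mind that .text() returns the combined text of every matched element, so this only works cleanly when exactly one pre element carries that data-type value.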