Acquiring node in nodejs + xpath

I have an element on a webpage which gives the following XPath source via Chrome Inspector //*[@id="page-wrapper"]/div/table/tbody/tr/td/table/tbody/tr/td[2]/table/tbody/tr[3]/td/table[2]/tbody/tr[2]/td[2]/a

I want to get this node programmatically in Node.js.

var parser = new parse5.Parser();
var document = parser.parse(data);
var xhtmldoc = xmlserializer.serializeToString(document);
var xdom = new xmldomparser().parseFromString(xhtmldoc);
var selector = xpath.useNamespaces({"doc": "http://www.w3.org/1999/xhtml"});
var node = selector('//*[@id="page-wrapper"]/div/table/tbody/tr/td/table/tbody/tr/td[2]/table/tbody/tr[3]/td/table[2]/tbody/tr[2]/td[2]/a', xdom);
console.log(node);

But it consistently returns an empty object, whatever variation of the XPath I try. Is it possible to achieve this?

Thanks.

It seems that you are declaring the correct namespace and a prefix:

 var selector = xpath.useNamespaces({"doc": "http://www.w3.org/1999/xhtml"});

but then you do not use it in the path expression. Prefix elements with doc: in your path expression:

var node = selector('//*[@id="page-wrapper"]/doc:div/doc:table/doc:tbody/doc:tr/doc:td/doc:table/doc:tbody/doc:tr/doc:td[2]/doc:table/doc:tbody/doc:tr[3]/doc:td/doc:table[2]/doc:tbody/doc:tr[2]/doc:td[2]/doc:a', xdom);
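Prefixing every step by hand is easy to get wrong on a path this long. As a minimal sketch, the prefixing could be done by a small helper. prefixSteps is a hypothetical function written for this illustration, not part of the xpath package, and it assumes a simple path with no "/" inside predicates or string literals:

```javascript
// Hypothetical helper (not part of the xpath package): adds a namespace
// prefix to every plain element name in a simple XPath expression.
// Assumption: no "/" appears inside predicates or string literals.
function prefixSteps(expression, prefix) {
  return expression
    .split('/')
    .map(function (step) {
      // Leave untouched: empty segments (from "//"), wildcards, attributes,
      // "." and "..", names that already carry a prefix or an axis, and
      // node tests such as text().
      if (step === '' || /^[*@.]/.test(step) || /^[^\[]*[:(]/.test(step)) {
        return step;
      }
      return prefix + ':' + step;
    })
    .join('/');
}

// The path copied from Chrome Inspector in the question.
var chromePath = '//*[@id="page-wrapper"]/div/table/tbody/tr/td/table/tbody/tr/td[2]/table/tbody/tr[3]/td/table[2]/tbody/tr[2]/td[2]/a';
console.log(prefixSteps(chromePath, 'doc'));
```

The resulting string can then be passed to selector(...) in place of the hand-written expression. For paths with more complex predicates, writing the prefixes by hand is probably safer than this string-splitting approach.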

That said, the XPath expression you got back from Chrome Inspector is not really handy, and only relies on positions of nodes. If you explain what you are trying to find in that document (and show the document, of course), people could suggest an alternative expression.
