How to get list tags as an array of objects using XPath?
I am trying to extract the ordered list and return an array of the list tags and the content inside them. I have already tried these paths:
//li[div/@class="business-info"]
//li[div[@class="business-info"]]
//li[descendant::div[@class="business-info"]]
//li[div[@class="business-info"]/h2/a]
Is this the right approach, or should I go with RegExp? I am sharing my code below so you can drill down.
Code
const IGNORE = ['style', 'script'];
const NONWHITESPACE_RE = /\S/;
const result = document.evaluate(
  '//*[child::text()]',
  document,
  null,
  XPathResult.ORDERED_NODE_SNAPSHOT_TYPE,
  null
);
const businessInfo = [];
for (let i = 0, j = result.snapshotLength; i < j; i++) {
  const element = result.snapshotItem(i);
  if (IGNORE.includes(element.tagName.toLowerCase())) {
    continue;
  }
  const nodes = [...element.childNodes];
  for (const node of nodes) {
    if (node.nodeType !== document.TEXT_NODE) {
      continue;
    }
    if (node.nodeValue.search(NONWHITESPACE_RE) === -1) {
      continue;
    }
    businessInfo.push({
      tag: element.tagName.toLowerCase(),
      text: node.nodeValue.trim()
    });
  }
}
console.log(businessInfo);
HTML
<ol class="results">
  <li class="result clearfix">
    <div class="business-info">
      <h2 itemprop="name">
        <a class="name" href="#">Company Ltd</a>
      </h2>
      <a href="#" class="phone disabled"><span class="tel-icon sprite"></span><span itemprop="telephone" class="phone">0123456789</span></a>
      <div class="address">
        <span class="address-main"><span itemprop="streetAddress">21 Largo Road</span>, <span itemprop="addressLocality">Focus</span>, </span>
        <span class="postcode" itemprop="postalCode">KY168NH</span>
      </div>
    </div>
  </li>
  <li class="result clearfix">
    <div class="business-info">
      <h2 itemprop="name">
        <a class="name" href="#">Shipment Ltd</a>
      </h2>
      <a href="#" class="phone disabled"><span class="tel-icon sprite"></span><span itemprop="telephone" class="phone">0123456789</span></a>
      <div class="address">
        <span class="address-main"><span itemprop="streetAddress">ECR Road</span>, <span itemprop="addressLocality">St Andrews</span>, </span>
        <span class="postcode" itemprop="postalCode">800826</span>
      </div>
    </div>
  </li>
</ol>
Expected Output: Array of Objects
const businessInfo = [
  {
    name: 'Company Ltd',
    phone: '0123456789',
    address: '21 Largo Road',
    locality: 'Focus',
    postal: 'KY168NH'
  },
  {
    name: 'Company Ltd1',
    phone: '0123456789',
    address: 'ECR Road',
    locality: 'St Andrews',
    postal: '800826'
  },
];
These are the sources I took for reference.
It looks like there is no need for XPath in my case. I solved it by finding the innerText of each property with the help of a selector, and then I composed an array of objects from these properties. The output is now what I actually expected.
const businessInfo = [];
const elements = document.querySelectorAll('ol > li > div.business-info');
elements.forEach((element) => {
  // Collect the fields of one listing in its own object.
  // Property names match the expected output in the question.
  const companyInfo = {};
  try {
    companyInfo.name = element.querySelector('h2 > a').innerText;
    companyInfo.phone = element.querySelector('a > span.phone').innerText;
    companyInfo.address = element.querySelector('div > span.address-main > span:nth-child(1)').innerText;
    companyInfo.locality = element.querySelector('div > span.address-main > span:nth-child(2)').innerText;
    companyInfo.postal = element.querySelector('div > span.postcode').innerText;
  } catch (exception) {
    // A selector matched nothing for this listing; keep the fields collected so far.
  }
  businessInfo.push(companyInfo);
});
console.log(businessInfo);
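For anyone who still prefers XPath, the same result can probably be reached by selecting each li with one of the paths from the question and then evaluating a relative path per field against that li. The following is only a sketch under assumptions: the function name extractWithXPath and the FIELD_PATHS map are hypothetical names introduced here, and the paths rely on the itemprop attributes in the HTML posted above.

```javascript
// Relative XPath for each field, evaluated with one <li> as the context node.
// FIELD_PATHS is a hypothetical helper; keys follow the expected output.
const FIELD_PATHS = {
  name: './/h2[@itemprop="name"]/a',
  phone: './/span[@itemprop="telephone"]',
  address: './/span[@itemprop="streetAddress"]',
  locality: './/span[@itemprop="addressLocality"]',
  postal: './/span[@itemprop="postalCode"]',
};

// Numeric values of XPathResult.ORDERED_NODE_SNAPSHOT_TYPE and
// XPathResult.STRING_TYPE, written out so the function needs no global.
const ORDERED_NODE_SNAPSHOT_TYPE = 7;
const STRING_TYPE = 2;

// doc is expected to be a Document (for example the global `document`).
function extractWithXPath(doc) {
  const items = doc.evaluate(
    '//li[div[@class="business-info"]]',
    doc,
    null,
    ORDERED_NODE_SNAPSHOT_TYPE,
    null
  );
  const businessInfo = [];
  for (let i = 0; i < items.snapshotLength; i++) {
    const li = items.snapshotItem(i);
    const entry = {};
    for (const [key, path] of Object.entries(FIELD_PATHS)) {
      // STRING_TYPE yields the text of the first matching node,
      // or an empty string when the path matches nothing.
      const found = doc.evaluate(path, li, null, STRING_TYPE, null);
      entry[key] = found.stringValue.trim();
    }
    businessInfo.push(entry);
  }
  return businessInfo;
}
```

In a browser it would be called as console.log(extractWithXPath(document));. Because a missing field comes back as an empty string instead of throwing, no try/catch is needed in this variant.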