How to get list tag as array of objects using XPath?

I am trying to extract the ordered list and return an array of objects built from the list items and their content. I have already tried these paths:

  1. //li[div/@class="business-info"]
  2. //li[div[@class="business-info"]]
  3. //li[descendant::div[@class="business-info"]]
  4. //li[div[@class="business-info"]/h2/a]

Is this the right approach, or should I go with RegExp? I am sharing my code below so you can drill down.

Code

const IGNORE = ['style', 'script'];
const NONWHITESPACE_RE = /\S/;
const result = document.evaluate(
    '//*[child::text()]',
    document,
    null,
    XPathResult.ORDERED_NODE_SNAPSHOT_TYPE,
    null
);
const businessInfo = [];
for (let i = 0, j = result.snapshotLength; i < j; i++) {
    const element = result.snapshotItem(i);
    if (IGNORE.includes(element.tagName.toLowerCase())) {
        continue;
    }
    const nodes = [...element.childNodes];
    for (const node of nodes) {
        if (node.nodeType !== document.TEXT_NODE) {
            continue;
        }
        if (node.nodeValue.search(NONWHITESPACE_RE) === -1) {
            continue;
        }
        businessInfo.push({
            tag: element.tagName.toLowerCase(),
            text: node.nodeValue.trim()
        });
    }
}
console.log(businessInfo);

HTML

<ol class="results">
    <li class="result clearfix">
        <div class="business-info">
            <h2 itemprop="name">
                <a class="name" href="#">Company Ltd</a>
            </h2>
            <a href="#" class="phone disabled"><span class="tel-icon sprite"></span><span itemprop="telephone" class="phone">0123456789</span></a>
            <div class="address">
                <span class="address-main"><span itemprop="streetAddress">21 Largo Road</span>, <span itemprop="addressLocality">Focus</span>, </span>
                &nbsp;<span class="postcode" itemprop="postalCode">KY168NH</span>
            </div>
        </div>
    </li>
    <li class="result clearfix">
        <div class="business-info">
            <h2 itemprop="name">
                <a class="name" href="#">Shipment Ltd</a>
            </h2>
            <a href="#" class="phone disabled"><span class="tel-icon sprite"></span><span itemprop="telephone" class="phone">0123456789</span></a>
            <div class="address">
                <span class="address-main"><span itemprop="streetAddress">ECR Road</span>, <span itemprop="addressLocality">St Andrews</span>, </span>
                &nbsp;<span class="postcode" itemprop="postalCode">800826</span>
            </div>
        </div>
    </li>
</ol>

Expected Output: Array of Objects

const businessInfo = [
    {
        name: 'Company Ltd',
        phone: '0123456789',
        address: '21 Largo Road',
        locality: 'Focus',
        postal: 'KY168NH'
    },
    {
        name: 'Shipment Ltd',
        phone: '0123456789',
        address: 'ECR Road',
        locality: 'St Andrews',
        postal: '800826'
    },
];

These are the sources I used for reference:

  1. https://developer.mozilla.org/en-US/docs/Web/API/XPathResult/snapshotItem
  2. https://developer.mozilla.org/en-US/docs/Web/API/XPathResult/iterateNext
  3. Get XPath of XML Tag

It looks like XPath is not needed in my case. I solved it by reading the innerText of each property with a CSS selector, and then composing an array of objects from those properties. The output is now what I expected.

const businessInfo = [];
const elements = document.querySelectorAll('ol > li > div.business-info');
elements.forEach((element) => {
    // One object per listing.
    const companyInfo = {};
    try {
        companyInfo.name = element.querySelector('h2 > a').innerText;
        companyInfo.phone = element.querySelector('a > span.phone').innerText;
        companyInfo.address = element.querySelector('div > span.address-main > span:nth-child(1)').innerText;
        companyInfo.locality = element.querySelector('div > span.address-main > span:nth-child(2)').innerText;
        companyInfo.postal = element.querySelector('div > span.postcode').innerText;
    } catch (exception) {
        // querySelector returned null for a missing element; keep the fields read so far.
        console.warn('Incomplete listing:', exception);
    }
    businessInfo.push(companyInfo);
});
console.log(businessInfo);
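
If XPath is still preferred, the same result can probably be reached by evaluating short relative paths against each `div.business-info` as the context node. The following is only a sketch under the assumption that the page matches the HTML in the question; `extractBusinessInfo`, `FIELD_PATHS`, and the two constant names are introduced here for illustration and are not part of any API. The numeric values mirror `XPathResult.STRING_TYPE` and `XPathResult.ORDERED_NODE_SNAPSHOT_TYPE`.

```javascript
// Numeric values of XPathResult.STRING_TYPE and
// XPathResult.ORDERED_NODE_SNAPSHOT_TYPE from the DOM XPath API.
const STRING_TYPE = 2;
const ORDERED_NODE_SNAPSHOT_TYPE = 7;

// Hypothetical mapping: output field -> XPath relative to one
// div.business-info element (paths assume the HTML in the question).
const FIELD_PATHS = {
    name: 'normalize-space(h2/a)',
    phone: 'normalize-space(.//span[@itemprop="telephone"])',
    address: 'normalize-space(.//span[@itemprop="streetAddress"])',
    locality: 'normalize-space(.//span[@itemprop="addressLocality"])',
    postal: 'normalize-space(.//span[@itemprop="postalCode"])'
};

// Returns one plain object per listing found in `doc`.
function extractBusinessInfo(doc) {
    // Step 1: snapshot every listing container in document order.
    const items = doc.evaluate(
        '//ol[@class="results"]/li/div[@class="business-info"]',
        doc,
        null,
        ORDERED_NODE_SNAPSHOT_TYPE,
        null
    );
    const businessInfo = [];
    for (let i = 0; i < items.snapshotLength; i++) {
        const context = items.snapshotItem(i);
        const company = {};
        // Step 2: evaluate each relative path with the listing as context node.
        for (const [field, path] of Object.entries(FIELD_PATHS)) {
            company[field] = doc.evaluate(path, context, null, STRING_TYPE, null).stringValue;
        }
        businessInfo.push(company);
    }
    return businessInfo;
}

// Only runs where a DOM is available (for example, a browser console).
if (typeof document !== 'undefined') {
    console.log(extractBusinessInfo(document));
}
```

A field whose path matches nothing comes back as an empty string rather than throwing, so no try/catch is needed in this variant. Whether this is preferable to the `querySelector` version is mostly a matter of taste; the CSS selectors are arguably easier to read.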
