How to extract the href of a tag in node.js

Question

I am trying to scrape this page: https://www.sahibinden.com/kategori-vitrin?date=1day&viewType=Gallery&a5_min=2005&a5_max=2020&category=3530

I need to extract the links of the ads listed on this page. I provide an XPath in a YAML file, which is then read and interpreted by node.js. In the YAML file I simply give it this: data: "xpath: //html/body/div[4]/div[4]/form/div/div[3]/div[2]", and here is how it is interpreted in node.js:

function getxPath(data, path) {
  try {
    let root = new dom().parseFromString(data);
    
    let results = xpath.select(path, root);
    console.log(results);
    if (results.length > 0) {
      let _results = [];
      for (let r of results) {
        _results.push(r.textContent);
      }
      return _results;
    }
  } catch (exc) {
    console.log(exc);
  }
  return null;
}

I want to be able to extract the links, but so far I only get text like this:

827926997 827926997

Sahibinden_Temiz_Orj Km_Tramersiz_




                     72.500 TL



                            Yıl:
                        &nbsp;
                        2010


                            KM:
                        &nbsp;
                        108.000


                            Renk:
                        &nbsp;
                        Gri

                    İlan Tarihi:&nbsp;
                    03 Haziran 2020

                    İl / İlçe:&nbsp;
                    İstanbul / Esenyurt

How do I get the links?

Answer 1

It seems you need to fix your XPath expression. It selects a div element instead of the @href attribute, so the result is the text content of that div.

Use the following XPath:

//a[@class="classifiedTitle"]/@href

Output: 20 links per page.
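With this expression the selected nodes are attribute nodes rather than elements, so each result carries the link in its value. Below is a minimal sketch of how the result loop in getxPath could turn either kind of node into a string. The names nodeToString and nodesToStrings are hypothetical, and the sample objects are plain stand-ins shaped like DOM nodes, not real parser output; the path in them is a made-up placeholder.

```javascript
// Hypothetical helper: convert one XPath result node to a string.
// In the DOM specification nodeType 2 is ATTRIBUTE_NODE, and
// attribute nodes expose their content through `value`.
const ATTRIBUTE_NODE = 2;

function nodeToString(node) {
  if (node.nodeType === ATTRIBUTE_NODE) {
    return node.value;
  }
  // Elements and text nodes: use the text, with runs of whitespace collapsed.
  return (node.textContent || '').replace(/\s+/g, ' ').trim();
}

function nodesToStrings(nodes) {
  return nodes.map(nodeToString);
}

// Stand-in objects shaped like DOM nodes (assumption: not real parser output).
const sampleNodes = [
  { nodeType: 2, name: 'href', value: '/example/ad-1' },
  { nodeType: 1, textContent: '  72.500 TL \n ' },
];

console.log(nodesToStrings(sampleNodes));
```

In getxPath, the body of the for loop would then push nodeToString(r) instead of r.textContent, which also trims the whitespace seen in the text output above.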

EDIT: In the YAML file, use single quotes inside the XPath expression instead of double quotes, so that they do not clash with the double quotes around the YAML value, like:

data: "xpath://a[@class='classifiedTitle']/@href"
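The question does not show how the "xpath:" prefix is separated from the expression before it reaches getxPath, so the following is only a sketch of one way that step might look. parseSelector is a hypothetical helper; it accepts the prefix with or without a space after the colon, since both spellings appear above.

```javascript
// Hypothetical helper: split a config value such as
// "xpath://a[@class='classifiedTitle']/@href" into its kind and expression.
function parseSelector(value) {
  // Leading word, a colon, then the expression (outer whitespace ignored).
  const match = /^\s*(\w+)\s*:\s*(.+?)\s*$/.exec(value);
  if (!match) {
    throw new Error(`Unrecognized selector: ${value}`);
  }
  return { kind: match[1].toLowerCase(), expression: match[2] };
}

// The single quotes inside the expression pass through unchanged.
const parsedSelector = parseSelector("xpath://a[@class='classifiedTitle']/@href");
console.log(parsedSelector.kind);
console.log(parsedSelector.expression);
```

The expression field is what would be passed as the path argument of getxPath.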

Answer 1: 0, accepted, 2020-06-04 15:14:47
