I am trying to scrape this page: https://www.sahibinden.com/kategori-vitrin?date=1day&viewType=Gallery&a5_min=2005&a5_max=2020&category=3530
I need to extract the links of the ads listed on this page. I put the XPath in a YAML file, which is then read and interpreted by Node.js. In the YAML file I simply give it this: data: "xpath: //html/body/div[4]/div[4]/form/div/div[3]/div[2]"
and in Node.js here is how it is interpreted:
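// Assumed imports: the `xpath` and `xmldom` npm packages, which match the calls below.
const xpath = require('xpath');
const dom = require('xmldom').DOMParser;

// Parses `data` (markup as a string) and returns the text of every node matched by `path`,
// or null when nothing matches or parsing/selection throws.
function getxPath(data, path) {
  try {
    let root = new dom().parseFromString(data);
    let results = xpath.select(path, root);
    console.log(results);
    if (results.length > 0) {
      let _results = [];
      for (let r of results) {
        // textContent of an element is the concatenated text of its descendants
        _results.push(r.textContent);
      }
      return _results;
    }
  } catch (exc) {
    console.log(exc);
  }
  return null;
}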
I want to extract the links, but so far I only get text like this:
Sahibinden_Temiz_Orj Km_Tramersiz_
72.500 TL
Yıl:
2010
KM:
108.000
Renk:
Gri
İlan Tarihi:
03 Haziran 2020
İl / İlçe:
İstanbul / Esenyurt
How do I get links?
It seems you need to fix your XPath expression. You are selecting a div element instead of the @href attribute.
Use the following XPath:
//a[@class="classifiedTitle"]/@href
Output: 20 links per page.
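An expression ending in /@href selects attribute nodes rather than elements, and the href values may be relative paths. As a minimal sketch (toAbsoluteLinks is a hypothetical helper, and the sample paths are made up for illustration), the results could be turned into absolute URLs with Node's built-in URL class:

```javascript
// Hypothetical helper: converts XPath results into absolute URLs.
// Assumes each result is either an attribute node (string `value`)
// or another node type exposing `textContent`.
function toAbsoluteLinks(results, baseUrl) {
  const links = [];
  for (const node of results || []) {
    const raw = typeof node.value === 'string' ? node.value : node.textContent;
    if (!raw) continue; // skip empty or missing values
    // new URL(relative, base) resolves relative paths and leaves absolute URLs unchanged
    links.push(new URL(raw.trim(), baseUrl).href);
  }
  return links;
}

// Stand-in objects shaped like attribute nodes; real input would be
// the array returned by xpath.select(path, root).
const sample = [
  { value: '/ilan/example-1/detay' },
  { value: 'https://www.sahibinden.com/ilan/example-2/detay' },
];
console.log(toAbsoluteLinks(sample, 'https://www.sahibinden.com'));
```

Inside getxPath, this would replace the loop that pushes r.textContent, assuming the base URL is known to the caller.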
EDIT: In the YAML file, use single quotes inside the XPath expression so they do not clash with the double quotes that wrap the YAML value, like:
data: "xpath://a[@class='classifiedTitle']/@href"
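How the Node.js side separates the xpath: prefix from the expression is not shown in the question. A minimal sketch, assuming the YAML value arrives as a plain string (parseSelector is a hypothetical name, not part of any library), could be:

```javascript
// Hypothetical helper: extracts the XPath expression from a value such as
// "xpath://a[@class='classifiedTitle']/@href". Returns null for other prefixes.
function parseSelector(value) {
  // Optional whitespace after the colon covers both "xpath://a" and "xpath: //a".
  const match = /^\s*xpath:\s*(.+)$/.exec(value);
  return match ? match[1].trim() : null;
}

console.log(parseSelector("xpath://a[@class='classifiedTitle']/@href"));
```

The returned string can then be passed as the path argument of getxPath.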