I am trying to scrape some data inside an `<a>` tag, but I do not want the link (the href) itself, only the text inside it.
I am not really sure how to approach the problem, since the tags do not have IDs or classes.
<div id="list-section">
<ul>
<li data-store-id="1234">
<div class="item">
<p>
<strong>
<a target="_blank" href="www.somelink.com"> NAME ONE </a>
</strong>
</p>
</div>
</li>
<li data-store-id="1234">
<div class="item">
<p>
<strong>
<a target="_blank" href="www.somelink.com"> NAME TWO </a>
</strong>
</p>
</div>
</li>
</ul>
</div>
I am trying to end up with every name in an array: [NAME ONE, NAME TWO], etc.
Edit: I am using Node with Puppeteer.
There is a very useful way to find elements when web scraping called XPath. I have never worked with Puppeteer, but I have worked a lot with Selenium recently and used XPath heavily.
A quick look at the Puppeteer docs turned up something that could be useful for you:
https://github.com/GoogleChrome/puppeteer/blob/master/docs/api.md#pagexexpression
Since I don't have the full HTML page, I could only write a simple XPath to demonstrate its power:
//div[@class='item']//a
You can also test an XPath expression in Google Chrome DevTools: open the "Elements" tab and press CTRL+F.
It's a nice tool to have when web scraping.
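I have never used Puppeteer myself, but based on the `page.$x` method from the docs linked above, a sketch might look like this (the URL placeholder, the function names, and the lazy `require` are my assumptions, not tested code):

```javascript
// Sketch: collect the anchor texts via XPath with Puppeteer's page.$x.
// The trim helper is pure, so it can be exercised without a browser.
const cleanName = s => s.trim();

async function namesByXPath(url) {
  // Lazy require so this file can be loaded without Puppeteer installed.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(url); // the real page URL goes here
  // page.$x returns element handles for every XPath match.
  const anchors = await page.$x("//div[@class='item']//a");
  const raw = await Promise.all(
    anchors.map(a => page.evaluate(el => el.textContent, a))
  );
  await browser.close();
  return raw.map(cleanName);
}
```

Using `textContent` rather than the `href` attribute is what gives you the name without the link.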
You can have the names in an array in two steps: first select all the `<a>...</a>` elements, then extract the text from each one.
As Douglas mentioned before, you can use XPath, but in this case simple CSS selectors will do the job just fine. Many selector combinations can get you the anchor tags: `#list-section a`, `ul a`, ...
Choose the one that fits you best and is least likely to break later. I recommend using the first one:
const anchorTags = await page.$$("#list-section a")
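As an alternative sketch, Puppeteer's `page.$$eval` can select and extract in a single call. The callback below runs in the browser; I've named it separately so the extraction logic is visible on its own (assumption: `page` is a `Page` from an already-launched browser):

```javascript
// Extraction logic: map each <a> element to its trimmed text content,
// which gives the name without the href.
const extractNames = anchors => anchors.map(a => a.textContent.trim());

// page.$$eval runs extractNames in the browser on every match of the
// selector and returns the serialized array of names.
async function namesByCss(page) {
  return page.$$eval('#list-section a', extractNames);
}
```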
As for getting the inner HTML of an element, this SO question will definitely help you. My preferred approach is to define a separate asynchronous function as follows:
async function getInnerHtml(page, target) {
  // page.evaluate runs the callback in the browser context
  const innerHTML = await page.evaluate(el => el.innerHTML, target);
  return innerHTML;
}
This way you can loop over your array of element handles and call it on each anchor tag.
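Put together, that loop might look like this (a sketch assuming `page` is an already-navigated Puppeteer `Page`; the function name is mine):

```javascript
// Sketch: gather the anchor handles, then call the helper on each one.
async function getInnerHtml(page, target) {
  // page.evaluate runs the callback in the browser context
  return page.evaluate(el => el.innerHTML, target);
}

async function collectNames(page) {
  const anchorTags = await page.$$('#list-section a');
  const names = [];
  for (const a of anchorTags) {
    names.push((await getInnerHtml(page, a)).trim());
  }
  return names;
}
```

Since the `<a>` tags in your snippet contain only text, their inner HTML is the name itself; trimming removes the surrounding whitespace.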
Don't forget that there are always many ways to build a scraper. It seems to me that you focused too much on selecting the element precisely. It also pays off to get a good grasp of CSS selectors, especially CSS combinators.
Cheers