
How to loop through a page with multiple objects

I'm trying to make a scraper for a market website that lists its content in a weird way. For each item on the list, I have to click it to find the price, and by the time I return to the list after parsing the data, the order of the items might have changed.

There is no specific way to refer to each "link" using Watir, so parsing the price of every item in an orderly fashion is beyond my knowledge.

I use Watir to access an item: browser.element(:text => 'ItemName').click. That would be fine if there were just one item I wanted to parse data from using Nokogiri.

<div id="market" class="itemList active" style="height: 396px;" data-category="30">
    <div class="item" data-mainkey="4601" data-count="40503" data-grade="0">
        <div class="thumbImg" style="background-image: url(&quot;https://akamai-webcdn.kgstatic.net/TradeMarket/Common/item/4601.png&quot;)"></div>
        <div class="content"><p>Ash Timber</p><p class="gray"></p></div>
        <div class="count">40503</div>
    </div>
    <div class="item" data-mainkey="4602" data-count="266" data-grade="0">
        <div class="thumbImg" style="background-image: url(&quot;https://akamai-webcdn.kgstatic.net/TradeMarket/Common/item/4602.png&quot;)"></div>
        <div class="content"><p>Maple Timber</p><p class="gray"></p></div>
        <div class="count">266</div>
    </div>
</div>

That is how the list looks, except with a few hundred more items.

browser.element(:text => 'Materials').click
sleep 2
browser.element(:text => 'Wood').click
sleep 2
browser.element(:text => 'Ash Timber').click
sleep 2


# Parse the rendered detail page and print the item's name
page = Nokogiri::HTML(browser.html)
page.xpath('/html/body/div/div[1]/main/div[1]/div[2]/div[2]/p[1]').each do |nc|
  @name = nc.text
  puts @name
end

Is there a way to iterate through the items based on their "data-mainkey" attribute? From what I've seen, that is the identifier of each item.

This is the project as it currently stands, and it outputs the name of the item just fine.

I would like the project to go through the list of items, open each one, parse the price into an array and show the result, but I have no clue how to approach this.

I'm not sure if I understand the page flow correctly, but it sounds like you need to:

  • Store all of the data-mainkey values
  • Iterate through the stored values, re-locating the element each time

The code would look like:

main_keys = browser.divs(class: 'item').map(&:data_mainkey)
main_keys.each do |key|
  # Depending how the page is written, the div(class: 'content') might not be necessary
  browser.div(data_mainkey: key).div(class: 'content').click

  # Get the price

  # Navigate back to the list page
end
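
To fill in the two placeholder comments, a minimal sketch could look like the following. It assumes that the browser's back navigation returns to the list, and it leaves the price lookup to a block, because the markup of the detail view isn't shown in the question. The name scrape_items and the 'price' class in the usage comment are made up for illustration.

```ruby
# Sketch only: `browser` is anything that behaves like a Watir::Browser.
# The block receives the browser while an item's detail view is open and
# returns whatever should be stored for that item (e.g. the price text).
def scrape_items(browser)
  # Read every key up front; plain strings stay valid after the list re-renders.
  keys = browser.divs(class: 'item').map { |item| item.attribute_value('data-mainkey') }

  keys.each_with_object({}) do |key, results|
    # Re-locate the item by its key each time, so a reordered list does not matter.
    browser.div(data_mainkey: key).div(class: 'content').click
    results[key] = yield(browser)
    # Assumption: going back in the browser history returns to the item list.
    browser.back
  end
end

# Hypothetical usage; 'price' is a placeholder class, not taken from the real page:
#   prices = scrape_items(browser) { |b| b.div(class: 'price').text }
```

The result is a hash keyed by data-mainkey, so the order in which the site happens to render the list does not affect which price belongs to which item. If the site needs time to load after each click, the sleep calls from the question can go inside the block or around the click.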


 