
How can I scrape the text from this popup window? [Python and Scrapy]

Please note: I'm very inexperienced and this is my first 'real' project.

I'll try to explain my problem as best I can; apologies if some of the terms are incorrect.

I'm trying to scrape the following webpage - https://www.eaab.org.za/agent_agency_search?type=Agents&search_agent=+&submit_agent_search=GO

I can scrape the 'Name' and 'Status', but I also need to get some of the information in the 'Full Details' popup window.

I've noticed that when I click the 'Full Details' button, the URL stays the same.

Below is what my code looks like:

import scrapy
from FirstScrape.items import FirstscrapeItem

class FirstSpider(scrapy.Spider):
    name = "spiderman"
    start_urls = [
        "https://www.eaab.org.za/agent_agency_search?type=Agents&search_agent=+&submit_agent_search=GO"
    ]

    def parse(self, response):
        item = FirstscrapeItem()
        item['name'] = response.xpath("//tr[@class='even']/td[1]/text()").get()
        item['status'] = response.xpath("//tr[@class='even']/td[2]/text()").get()
        # 'first' refers to the first name shown in the popup window
        item['first'] = response.xpath("//div[@class='result-list default']/tbody/tr[2]/td[2]/text()").get()

        return item

I launch my code from the terminal and export the results to a .csv file.

Not sure if this helps, but this is the popup (fancybox) window:

[Image: popup window]

Do I need to use Selenium to click on the button or am I just missing something? Any help will be appreciated.

I'm very eager to learn more about Python and scraping.

Thank you.

This is the URL you need to extract from your starting page:

<a href="/listing_detail.php?agents_id=169039" class="agent-detail">Full Detail</a>

To get the content of the popup window, request this extracted URL as a separate request.
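The popup is just another page whose address sits in the link's href, so extracting it needs no browser automation. As a minimal standard-library sketch (the helper names `DetailLinkParser` and `detail_urls` are hypothetical, and it assumes the links carry `class="agent-detail"` as in the snippet above), pulling out the hrefs and resolving them against the listing page could look like this:

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urljoin, urlparse


class DetailLinkParser(HTMLParser):
    """Hypothetical helper: collect the href of every <a> whose class list contains 'agent-detail'."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)  # attrs arrives as a list of (name, value) pairs
        classes = (attrs.get("class") or "").split()
        if "agent-detail" in classes and attrs.get("href"):
            self.hrefs.append(attrs["href"])


def detail_urls(html, page_url):
    """Return absolute detail-page URLs found in a listing page's HTML."""
    parser = DetailLinkParser()
    parser.feed(html)
    # urljoin resolves a root-relative href ("/listing_detail.php?...") against the page's host
    return [urljoin(page_url, href) for href in parser.hrefs]


snippet = '<a href="/listing_detail.php?agents_id=169039" class="agent-detail">Full Detail</a>'
page = "https://www.eaab.org.za/agent_agency_search?type=Agents&search_agent=+&submit_agent_search=GO"
urls = detail_urls(snippet, page)
print(urls)  # ['https://www.eaab.org.za/listing_detail.php?agents_id=169039']
# The detail page is keyed by the agent's id in the query string
print(parse_qs(urlparse(urls[0]).query)["agents_id"])  # ['169039']
```

In Scrapy the same two steps are a CSS selector (`::attr(href)`) plus `response.urljoin` or `response.follow`, which is what the next answer relies on.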

The 'Full Detail' link has an href attribute; you need to extract that URL and make a request to it. Maybe this helps:

import scrapy


class FirstSpider(scrapy.Spider):
    name = "spiderman"
    start_urls = [
        "https://www.eaab.org.za/agent_agency_search?type=Agents&search_agent=+&submit_agent_search=GO"
    ]

    def parse(self, response):
        # Every 'Full Detail' link (class "agent-detail") points to a separate page,
        # e.g. /listing_detail.php?agents_id=169039, which is what the popup displays
        for href in response.css(".agent-detail::attr(href)").getall():
            # response.follow resolves the relative href against the current page URL
            yield response.follow(href, callback=self.parse_data)

    def parse_data(self, response):
        # Print the text of every table cell on the detail page
        print(response.css("td::text").getall())
        print("-----------------------------------")
