Please note: I'm very inexperienced and this is my first 'real' project.
I'm going to try to explain my problem as best I can; apologies if some of the terms are incorrect.
I'm trying to scrape the following webpage - https://www.eaab.org.za/agent_agency_search?type=Agents&search_agent=+&submit_agent_search=GO
I can scrape the 'Name' and 'Status', but I also need to get some of the information in the 'Full Details' popup window.
I have noticed that when clicking on the 'Full Details' button the URL stays the same.
Below is what my code looks like:
import scrapy
from FirstScrape.items import FirstscrapeItem


class FirstSpider(scrapy.Spider):
    name = "spiderman"
    start_urls = [
        "https://www.eaab.org.za/agent_agency_search?type=Agents&search_agent=+&submit_agent_search=GO"
    ]

    def parse(self, response):
        item = FirstscrapeItem()
        item['name'] = response.xpath("//tr[@class='even']/td[1]/text()").get()
        item['status'] = response.xpath("//tr[@class='even']/td[2]/text()").get()
        # first refers to firstname in the popup window
        item['first'] = response.xpath("//div[@class='result-list default']/tbody/tr[2]/td[2]/text()").get()
        return item
I launch my code from the terminal and export the results to a .csv file.
Not sure if this will help, but this is the popup (fancybox) window:
Do I need to use Selenium to click on the button or am I just missing something? Any help will be appreciated.
I'm very eager to learn more about Python and scraping.
Thank you.
This is the URL you need to extract from your starting page:
<a href="/listing_detail.php?agents_id=169039" class="agent-detail">Full Detail</a>
To get the content of the popup window, open this extracted URL as a separate request.
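For illustration only, here is a standard-library sketch (no Scrapy involved) of that extraction step: it collects the href of every link whose class is agent-detail and turns it into an absolute URL with urljoin. The sample HTML is the snippet above; the helper names (DetailLinkParser, detail_urls, BASE_URL) are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Page the links were found on; relative hrefs are resolved against it.
BASE_URL = "https://www.eaab.org.za/agent_agency_search"


class DetailLinkParser(HTMLParser):
    """Collects the href of every <a> whose class list contains 'agent-detail'."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        # An attribute without a value is reported as None, hence the "or ''".
        classes = (attributes.get("class") or "").split()
        if "agent-detail" in classes and attributes.get("href"):
            self.hrefs.append(attributes["href"])


def detail_urls(html, base_url=BASE_URL):
    """Return the absolute 'Full Detail' URLs found in the given HTML."""
    parser = DetailLinkParser()
    parser.feed(html)
    return [urljoin(base_url, href) for href in parser.hrefs]


snippet = '<a href="/listing_detail.php?agents_id=169039" class="agent-detail">Full Detail</a>'
print(detail_urls(snippet))
# ['https://www.eaab.org.za/listing_detail.php?agents_id=169039']
```

In a spider, Scrapy's own selectors and response.urljoin do the same job, as the code below shows.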
The 'Full Detail' link has an href attribute. You need to extract this URL and make a request to it. Maybe this helps:
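import scrapy
from scrapy.crawler import CrawlerProcess


class FirstSpider(scrapy.Spider):
    name = "spiderman"
    start_urls = [
        "https://www.eaab.org.za/agent_agency_search?type=Agents&search_agent=+&submit_agent_search=GO"
    ]

    def parse(self, response):
        # Collect the relative href of every "Full Detail" link on the listing page
        all_urls = [i.attrib["href"] for i in response.css(".agent-detail")]
        for url in all_urls:
            # Request each detail page (the popup content) as a separate request
            yield scrapy.Request(url=response.urljoin(url), callback=self.parse_data)

    def parse_data(self, response):
        # Print the text of every table cell on the detail page
        print(response.css("td::text").getall())
        print("-----------------------------------")


if __name__ == "__main__":
    # Run the spider as a plain script, without a Scrapy project
    process = CrawlerProcess()
    process.crawl(FirstSpider)
    process.start()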