简体   繁体   中英

scrape data through xpath from div that contains javascript in scrapy python

I am working on scrapy , i am scraping a site and using xpath to scrape items. But some of the div contains javascript , so when i used xpath until the div id that contains javascript code is returning an empty list,and without including that div element(which contains javascript) can able to fetch HTML data

HTML code

<div class="subContent2">    
   <div id="contentDetails">
       <div class="eventDetails">
            <h2>
                <a href="javascript:;" onclick="jdevents.getEvent(117032)">Some data</a>
            </h2>
       </div>
   </div>
</div> 

Spider Code

class ExampleSpider(BaseSpider):
    name = "example"
    domain_name = "www.example.com"
    start_urls = ["http://www.example.com/jkl/index.php"]


    def parse(self, response):
         hxs = HtmlXPathSelector(response)
         required_data = hxs.select('//div[@class="subContent2"]/div[@id="contentDetails"]/div[@class="eventDetails"]')

So how can i get text(Some data) from the anchor tag inside the h2 element as mentioned above, is there any alternate way for fetching data from the elements that contains javascript in scrapy

<div class="subContent2">    
   <div id="contentDetails">
       <div class="eventDetails">
            <h2>
                <a href="javascript:;" onclick="jdevents.getEvent(117032)">Some data</a>
            </h2>
       </div>
   </div>
</div> 

The problem is not the javascript code in this case to get 'Some data' string.

You need either to get the subnode:

required_data = hxs.select('//div[@class="subContent2"]/div[@id="contentDetails"]/div[@class="eventDetails"]/h2/a/text()')

在此处输入图片说明

or use string function:

required_data = hxs.select('string(//div[@class="subContent2"]/div[@id="contentDetails"]/div[@class="eventDetails"])')

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM