
Extract h1 text from div class with scrapy or selenium

I am using Python with Scrapy and Selenium. I want to extract the text from the h1 tag that is inside a div with a given class. For example:

<div class = "example">
 <h1>
    This is an example
 </h1>
</div>

This is the code I tried:

for single_event in range(1,length_of_alllinks):
        source_link.append(alllinks[single_event])          
        driver.get(alllinks[single_event])
        s = Selector(response)      
        temp = s.xpath('//div[@class="example"]//@h1').extract()
        print temp          
        title.append(temp)
        print title

Every method I tried returned an empty list.

I want to extract "This is an example", i.e. the h1 text, and append it to a list (title in my example), like: temp = ['This is an example']

Try the following to extract the required text:

s.xpath('//div[@class="example"]/h1/text()').extract()

First, it seems that in your HTML the class attribute of the div is "example", but in your code you're looking for other class values. For XPath queries, keep in mind that @class="..." matches the exact attribute value. You can use something like:

s.xpath('//div[contains(@class, "example")]')

This finds an element that has the "example" class but may have additional classes. I'm not sure whether this is a mistake or your actual code. In addition, the spaces around the '=' sign of the class attribute in your HTML may confuse some parsers.

Second, your query used in s.xpath seems wrong. Try something like this:

temp = s.xpath('//div[@class="example"]/h1').extract()

It's not clear from your code what s is, so I'm assuming the extract() method does what you think it does. A cleaner code sample would help us help you.

