I am using Python with Scrapy and Selenium. I want to extract the text from the h1 tag inside a div that has a particular class. For example:
<div class = "example">
<h1>
This is an example
</h1>
</div>
This is the code I tried:
for single_event in range(1,length_of_alllinks):
    source_link.append(alllinks[single_event])
    driver.get(alllinks[single_event])
    s = Selector(response)
    temp = s.xpath('//div[@class="example"]//@h1').extract()
    print temp
    title.append(temp)
print title
Every method I tried returned an empty list.
Now I want to extract "This is an example", i.e. the h1 text, and store it or append it to a list (title in my example). Like: temp = ['This is an example']
Try the following to extract the required text:
s.xpath('//div[@class="example"]/h1/text()').extract()
First, in your HTML the class attribute of the div is "example", but it seems your code may be looking for other class values. Keep in mind that XPath attribute tests match the exact attribute value, so @class="example" will not match a div whose class attribute is "example other". You can use something like:
s.xpath('//div[contains(@class, "example")]')
to find an element that has the "example" class but may have additional classes. I'm not sure whether this is a mistake or your actual code. In addition, the spaces around the '=' sign of the class attribute in your HTML may not be helping some parsers either.
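Second, the query you pass to s.xpath seems wrong: //@h1 selects an attribute named h1, not an h1 element. Try something like this:
temp = s.xpath('//div[@class="example"]/h1').extract()
Note that this returns the markup of the h1 element; append /text() to the query if you only want its text.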
It's not clear from your code what s is, so I'm assuming the extract() method does what you think it does. A cleaner code sample would help us help you.