xpath: how to extract text before, AND within, AND after the <strong> element

Question

I am working on a Scrapy spider that uses XPath to extract the information I need. The source page is generated by the website's search function. For example, I want the items with "computer" in the title. On the results page, every occurrence of "computer" is in bold because of the search highlighting, and it can appear at the beginning, in the middle, or at the end of a title. Some items don't have "computer" in the title at all. See the examples below:

Example 1: ("computer" at the beginning)
<a class="title" href="whatever1">
<strong> Computer </strong>
, used
</a>  

Example 2: ("computer" in the middle)
<a class="title" href="whatever2">
Low price
<strong> computer </strong>
, great deal
</a> 

Example 3: ("computer" at the end)
<a class="title" href="whatever3">
Don't miss this
<strong> Computer </strong>
</a>

Example 4: (no keyword of "computer")
<a class="title" href="whatever4">
Best laptop deal ever!      
</a>

The XPath I tried, .//a[@class="title"]/text(), only returns the portion AFTER the strong element. For the four examples above, I get the following results:

Example 1:
, used

Example 2:
, great deal

Example 3: (Nothing)


Example 4:
Best laptop deal ever!

I need an XPath expression that covers all four situations and collects the full title of each item.

Answer 1

The simplest approach would be to search for all "text" nodes and "join" them:

"".join(response.xpath('.//a[@class="title"]//text()').extract())

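Note the double slash before text(): this is the key fix here. A single slash (/text()) selects only the text nodes that are direct children of the a element, while the double slash (//text()) also selects text nodes nested inside descendants such as strong.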
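
If the page contains several title links, joining the whole result list at once would presumably run all the titles together into one string, so it may be cleaner to collect the text per link. The sketch below illustrates that idea on the four examples from the question. To keep it runnable without Scrapy it uses the standard library's ElementTree, whose itertext() yields the same descendant text nodes that './/text()' selects; the wrapping <div> and the helper name clean_title are assumptions made for this illustration, not part of the original answer.

```python
import xml.etree.ElementTree as ET


def clean_title(parts):
    """Join text fragments and collapse runs of whitespace into single spaces."""
    return " ".join("".join(parts).split())


# Markup modeled on the four examples in the question, wrapped in a
# hypothetical <div> so that it parses as a single document.
SAMPLE = """<div>
<a class="title" href="whatever1">
<strong> Computer </strong>
, used
</a>
<a class="title" href="whatever2">
Low price
<strong> computer </strong>
, great deal
</a>
<a class="title" href="whatever3">
Don't miss this
<strong> Computer </strong>
</a>
<a class="title" href="whatever4">
Best laptop deal ever!
</a>
</div>"""

root = ET.fromstring(SAMPLE)

titles = []
for a in root.findall(".//a[@class='title']"):
    # itertext() walks every descendant text node in document order,
    # i.e. the text before, inside, and after <strong>.
    titles.append(clean_title(a.itertext()))

for title in titles:
    print(title)
```

In a spider, the same helper should work on the list returned by a.xpath('.//text()').extract() for each a selected with response.xpath('//a[@class="title"]'). The space before the comma in titles such as "Computer , used" comes from the line breaks in the source markup and would need separate handling if it is unwanted.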

1 answer

solution1
4 ACCEPTED 2015-10-11 03:16:16
