Extract specific value from HTML using xpath and scrapy
I have the following HTML code:
<tr data-live="COumykPG" data-dt="10,11,2017,19,00" data-def="1"> <td class="table-matches__tt"><span class="table-matches__time" data-live-cell="time">19:00</span><a href="/soccer/germany/oberliga-bremen/oberneuland-habenhauser/COumykPG/" data-live-cell="matchlink"><span>Oberneuland</span> - <span>Habenhauser</span></a></td> <td class="livebet" data-live-cell="livebet"> </td> <td class="table-matches__streams" data-live-cell="score"> </td> <td class="table-matches__odds" data-oid="2p2k5xv464x0x6ev9v"><a href="/myselections.php?action=3&matchid=COumykPG&outcomeid=2p2k5xv464x0x6ev9v&otheroutcomes=2p2k5xv498x0x0,2p2k5xv464x0x6eva0" onclick="return my_selections_click('1x2', 'soccer');" title="Add to My Selections" target="mySelections">1.10</a></td> <td class="table-matches__odds" data-oid="2p2k5xv498x0x0"><a href="/myselections.php?action=3&matchid=COumykPG&outcomeid=2p2k5xv498x0x0&otheroutcomes=2p2k5xv464x0x6ev9v,2p2k5xv464x0x6eva0" onclick="return my_selections_click('1x2', 'soccer');" title="Add to My Selections" target="mySelections">7.44</a></td> <td class="table-matches__odds" data-oid="2p2k5xv464x0x6eva0"><a href="/myselections.php?action=3&matchid=COumykPG&outcomeid=2p2k5xv464x0x6eva0&otheroutcomes=2p2k5xv464x0x6ev9v,2p2k5xv498x0x0" onclick="return my_selections_click('1x2', 'soccer');" title="Add to My Selections" target="mySelections">12.40</a></td> </tr>
I am trying to scrape the three float values from this code: 1.10, 7.44 and 12.40.
The expression I tried to use to get the values was the following:
response.xpath('//a/@target').extract()
The output I get is 'mySelections'.
I want to get the value next to it. What is the right expression for it?
Thank you in advance.
If you format your HTML, the error is obvious.
You want to extract the text from the a tag, not the target attribute.
<tr data-live="COumykPG" data-dt="10,11,2017,19,00" data-def="1">
    <td class="table-matches__tt">
        <span class="table-matches__time" data-live-cell="time">19:00</span>
        <a href="/soccer/germany/oberliga-bremen/oberneuland-habenhauser/COumykPG/" data-live-cell="matchlink">
            <span>Oberneuland</span> - <span>Habenhauser</span>
        </a>
    </td>
    <td class="livebet" data-live-cell="livebet"> </td>
    <td class="table-matches__streams" data-live-cell="score"></td>
    <td class="table-matches__odds" data-oid="2p2k5xv464x0x6ev9v">
        <a href="/myselections.php?action=3&matchid=COumykPG&outcomeid=2p2k5xv464x0x6ev9v&otheroutcomes=2p2k5xv498x0x0,2p2k5xv464x0x6eva0" onclick="return my_selections_click('1x2', 'soccer');" title="Add to My Selections" target="mySelections">1.10</a>
    </td>
    <td class="table-matches__odds" data-oid="2p2k5xv498x0x0">
        <a href="/myselections.php?action=3&matchid=COumykPG&outcomeid=2p2k5xv498x0x0&otheroutcomes=2p2k5xv464x0x6ev9v,2p2k5xv464x0x6eva0" onclick="return my_selections_click('1x2', 'soccer');" title="Add to My Selections" target="mySelections">7.44</a>
    </td>
    <td class="table-matches__odds" data-oid="2p2k5xv464x0x6eva0">
        <a href="/myselections.php?action=3&matchid=COumykPG&outcomeid=2p2k5xv464x0x6eva0&otheroutcomes=2p2k5xv464x0x6ev9v,2p2k5xv498x0x0" onclick="return my_selections_click('1x2', 'soccer');" title="Add to My Selections" target="mySelections">12.40</a>
    </td>
</tr>
Use one of the following:
response.xpath('//a/text()').extract()
According to other developers, response.xpath can sometimes cause bugs; if you run into that, use Scrapy's Selector instead.
from scrapy.selector import Selector

# Selector(text=...) expects a str, so pass response.text rather than the bytes in response.body.
result_array = Selector(text=response.text).xpath('//a/text()').extract()