
Extract specific value from HTML using xpath and scrapy

I have the following HTML code:

 <tr data-live="COumykPG" data-dt="10,11,2017,19,00" data-def="1"> <td class="table-matches__tt"><span class="table-matches__time" data-live-cell="time">19:00</span><a href="/soccer/germany/oberliga-bremen/oberneuland-habenhauser/COumykPG/" data-live-cell="matchlink"><span>Oberneuland</span> - <span>Habenhauser</span></a></td> <td class="livebet" data-live-cell="livebet">&nbsp;</td> <td class="table-matches__streams" data-live-cell="score"> </td> <td class="table-matches__odds" data-oid="2p2k5xv464x0x6ev9v"><a href="/myselections.php?action=3&amp;matchid=COumykPG&amp;outcomeid=2p2k5xv464x0x6ev9v&amp;otheroutcomes=2p2k5xv498x0x0,2p2k5xv464x0x6eva0" onclick="return my_selections_click('1x2', 'soccer');" title="Add to My Selections" target="mySelections">1.10</a></td> <td class="table-matches__odds" data-oid="2p2k5xv498x0x0"><a href="/myselections.php?action=3&amp;matchid=COumykPG&amp;outcomeid=2p2k5xv498x0x0&amp;otheroutcomes=2p2k5xv464x0x6ev9v,2p2k5xv464x0x6eva0" onclick="return my_selections_click('1x2', 'soccer');" title="Add to My Selections" target="mySelections">7.44</a></td> <td class="table-matches__odds" data-oid="2p2k5xv464x0x6eva0"><a href="/myselections.php?action=3&amp;matchid=COumykPG&amp;outcomeid=2p2k5xv464x0x6eva0&amp;otheroutcomes=2p2k5xv464x0x6ev9v,2p2k5xv498x0x0" onclick="return my_selections_click('1x2', 'soccer');" title="Add to My Selections" target="mySelections">12.40</a></td> </tr> 

I am trying to scrape the three float values from this code: 1.10, 7.44 and 12.40. The expression I tried to use to get the values was the following:

response.xpath('//a/@target').extract()

The output I get is 'mySelections'.

I want to get the value next to it. What is the right expression for that?

Thank you in advance.

What's wrong

response.xpath('//a/@target').extract()

Why?

  • If you format your HTML, the error is obvious.

    You want to extract the text of the a tag, not its target attribute. The expression //a/@target selects the attribute value, which is why you get 'mySelections'.

      <tr data-live="COumykPG" data-dt="10,11,2017,19,00" data-def="1"> <td class="table-matches__tt"> <span class="table-matches__time" data-live-cell="time">19:00</span> <a href="/soccer/germany/oberliga-bremen/oberneuland-habenhauser/COumykPG/" data-live-cell="matchlink"> <span>Oberneuland</span> - <span>Habenhauser</span> </a> </td> <td class="livebet" data-live-cell="livebet">&nbsp;</td> <td class="table-matches__streams" data-live-cell="score"></td> <td class="table-matches__odds" data-oid="2p2k5xv464x0x6ev9v"> <a href="/myselections.php?action=3&amp;matchid=COumykPG&amp;outcomeid=2p2k5xv464x0x6ev9v&amp;otheroutcomes=2p2k5xv498x0x0,2p2k5xv464x0x6eva0" onclick="return my_selections_click('1x2', 'soccer');" title="Add to My Selections" target="mySelections">1.10</a> </td> <td class="table-matches__odds" data-oid="2p2k5xv498x0x0"> <a href="/myselections.php?action=3&amp;matchid=COumykPG&amp;outcomeid=2p2k5xv498x0x0&amp;otheroutcomes=2p2k5xv464x0x6ev9v,2p2k5xv464x0x6eva0" onclick="return my_selections_click('1x2', 'soccer');" title="Add to My Selections" target="mySelections">7.44</a> </td> <td class="table-matches__odds" data-oid="2p2k5xv464x0x6eva0"> <a href="/myselections.php?action=3&amp;matchid=COumykPG&amp;outcomeid=2p2k5xv464x0x6eva0&amp;otheroutcomes=2p2k5xv464x0x6ev9v,2p2k5xv498x0x0" onclick="return my_selections_click('1x2', 'soccer');" title="Add to My Selections" target="mySelections">12.40</a> </td> </tr> 

    How to fix it

  • Use one of the following:

    • response.xpath('//a/text()').extract()
    • According to other developers, response.xpath can sometimes cause bugs; in that case, use Scrapy's Selector directly. Pass response.text rather than response.body, because Selector(text=...) expects a str and response.body is bytes.

       from scrapy.selector import Selector
       result_array = Selector(text=response.text).xpath('//a/text()').extract()


 