
How to get text object from a html table using selenium python

I have part of an HTML file, shown below:

<div><pre> <b>Home:</b>   28-12   <b>Road:</b>   23-16   <b>ExtrInn:</b> 2-5 
<b>vsRHP:</b>  38-18   <b>vsLHP:</b>  13-10   <b>1-Run:</b>  17-5 
<b>vsEast:</b> 12-8    <b>vsCntrl:</b> 7-5    <b>vsWest:</b> 26-13 <b>IL:</b> 6-2 

<strong>Last 10 Games</strong>
Gm# Date &amp; Box   Opp W/L Score      Record   Place/GB
 79 <A CLASS=CL HREF="/boxes/NYA/NYA201606290.shtml">Wed, Jun 29</a>  @<A CLASS=CL HREF="/teams/NYY/2016_sched.shtml">NYY</A>  L   7-9       51-28  1st  9.0 up
 78 <A CLASS=CL HREF="/boxes/NYA/NYA201606280.shtml">Tue, Jun 28</a>  @<A CLASS=CL HREF="/teams/NYY/2016_sched.shtml">NYY</A>  W   7-1       51-27  1st 10.0 up
 77 <A CLASS=CL HREF="/boxes/NYA/NYA201606270.shtml">Mon, Jun 27</a>  @<A CLASS=CL HREF="/teams/NYY/2016_sched.shtml">NYY</A>  W   9-6       50-27  1st 10.0 up
 76 <A CLASS=CL HREF="/boxes/TEX/TEX201606260.shtml">Sun, Jun 26</a>   <A CLASS=CL HREF="/teams/BOS/2016_sched.shtml">BOS</A>  W   6-2       49-27  1st 10.0 up
 75 <A CLASS=CL HREF="/boxes/TEX/TEX201606250.shtml">Sat, Jun 25</a>   <A CLASS=CL HREF="/teams/BOS/2016_sched.shtml">BOS</A>  W  10-3       48-27  1st  9.0 up
 74 <A CLASS=CL HREF="/boxes/TEX/TEX201606240.shtml">Fri, Jun 24</a>   <A CLASS=CL HREF="/teams/BOS/2016_sched.shtml">BOS</A>  L   7-8       47-27  1st  9.0 up
 73 <A CLASS=CL HREF="/boxes/TEX/TEX201606220.shtml">Wed, Jun 22</a>   <A CLASS=CL HREF="/teams/CIN/2016_sched.shtml">CIN</A>  W   6-4       47-26  1st 10.0 up
 72 <A CLASS=CL HREF="/boxes/TEX/TEX201606210.shtml">Tue, Jun 21</a>   <A CLASS=CL HREF="/teams/CIN/2016_sched.shtml">CIN</A>  L   2-8       46-26  1st  9.5 up
 71 <A CLASS=CL HREF="/boxes/TEX/TEX201606200.shtml">Mon, Jun 20</a>   <A CLASS=CL HREF="/teams/BAL/2016_sched.shtml">BAL</A>  W   4-3       46-25  1st  9.5 up
 70 <A CLASS=CL HREF="/boxes/SLN/SLN201606190.shtml">Sun, Jun 19</a>  @<A CLASS=CL HREF="/teams/STL/2016_sched.shtml">STL</A>  W   5-4       45-25  1st  8.5 up
<b>Last 10:</b> 7-3    <b>Last 20:</b>15-5    <b>Last 30:</b>23-7 
</pre></div>

Does anyone know how to get the information in Last 10, Last 20 and Last 30 using Selenium Python?

The result should be 7-3, 15-5 and 23-7.

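That HTML is... something. The text you want is not inside any tag of its own, so you can't locate it directly. You will have to get all the text from the outer DIV and find what you need inside it. You can use a regex or just parse the string. The code below should be close.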

import re
from selenium.webdriver.common.by import By

alltext = driver.find_element(By.TAG_NAME, "div").text  # locator needs to be more specific
results = re.findall(r'(Last \d{2}:\s*\d+-\d+)', alltext)
print(results)

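The regex looks for "Last " + 2 digits + ":" + 0 or more whitespace characters + 1 or more digits + "-" + 1 or more digits. findall() returns every match of that regex in the string, so it should return all three.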
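If only the records themselves are wanted (7-3, 15-5 and 23-7), one option is to put the label and the record in separate capture groups. The following is a minimal sketch, not part of the original answer: sample_text is an assumed approximation of what Selenium's .text returns for the last line of the <pre> block (the real whitespace may differ), and extract_records is a hypothetical helper name.

```python
import re

# Assumed approximation of the rendered text; real whitespace may differ.
sample_text = "Last 10: 7-3    Last 20:15-5    Last 30:23-7"


def extract_records(text):
    """Map each 'Last NN' label to its 'W-L' record found in text."""
    # With two capture groups, findall() returns (label, record) tuples,
    # which dict() turns into a label -> record mapping.
    return dict(re.findall(r'(Last \d{2}):\s*(\d+-\d+)', text))


records = extract_records(sample_text)
print(records)
```

Because the pattern requires a colon after the two digits, the "Last 10 Games" heading earlier in the block is not matched.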

Python regex info
