Extracting link text from an HTML table in Python Selenium

I'm new to Selenium. I want to extract the link text from an HTML table on a website; the relevant HTML is shown below.

Code snippet:

<div style="width:210px" id="calenderdiv">
    <table id="calender" align="center" bgcolor="#ABABAB" width="90%" cellspacing="1" cellpadding="0" border="0">
    <tbody>
        <tr height="25" bgcolor="#DDDDDD" style="font-family:arial ;font-size:12;font-weight:bold; color: #006699">
            <td align="center" width="14%">S</td>
            <td align="center" width="14%">M</td>
            <td align="center" width="14%">T</td>
            <td align="center" width="14%">W</td>
            <td align="center" width="14%">T</td>
            <td align="center" width="14%">F</td>
            <td align="center" width="14%">S</td>
        </tr>
        <tr height="25" bgcolor="#FFFFFF" style="font-family:arial ;font-size:12;font-weight:bold; color: #006699">
            <td bgcolor="#EFEFEF" align="center"><a href="/2010/1/1/archivelist/year-2010,month-1,starttime-40179.cms"></a>&nbsp; </td>
            <td align="center"><a href="/2010/1/1/archivelist/year-2010,month-1,starttime-40179.cms"></a>&nbsp; </td>
            <td align="center"><a href="/2010/1/1/archivelist/year-2010,month-1,starttime-40179.cms"></a>&nbsp; </td>
            <td align="center"><a href="/2010/1/1/archivelist/year-2010,month-1,starttime-40179.cms"></a>&nbsp; </td>
            <td align="center"><a href="/2010/1/1/archivelist/year-2010,month-1,starttime-40179.cms"></a>&nbsp; </td>
            <td align="center"><a href="/2010/1/1/archivelist/year-2010,month-1,starttime-40179.cms">1</a></td>
            <td align="center"><a href="/2010/1/2/archivelist/year-2010,month-1,starttime-40180.cms">2</a></td>
        </tr>
        <tr height="25" bgcolor="#FFFFFF" style="font-family:arial ;font-size:12;font-weight:bold; color: #006699">
            <td bgcolor="#EFEFEF" align="center"><a href="/2010/1/3/archivelist/year-2010,month-1,starttime-40181.cms">3</a></td>
            <td align="center"><a href="/2010/1/4/archivelist/year-2010,month-1,starttime-40182.cms">4</a></td>
            <td align="center"><a href="/2010/1/5/archivelist/year-2010,month-1,starttime-40183.cms">5</a></td>
            <td align="center"><a href="/2010/1/6/archivelist/year-2010,month-1,starttime-40184.cms">6</a></td>
            <td align="center"><a href="/2010/1/7/archivelist/year-2010,month-1,starttime-40185.cms">7</a></td>
            <td align="center"><a href="/2010/1/8/archivelist/year-2010,month-1,starttime-40186.cms">8</a></td>
            <td align="center"><a href="/2010/1/9/archivelist/year-2010,month-1,starttime-40187.cms">9</a></td>
        </tr>
        <tr height="25" bgcolor="#FFFFFF" style="font-family:arial ;font-size:12;font-weight:bold; color: #006699">
            <td bgcolor="#EFEFEF" align="center"><a href="/2010/1/10/archivelist/year-2010,month-1,starttime-40188.cms">10</a></td>
            <td align="center"><a href="/2010/1/11/archivelist/year-2010,month-1,starttime-40189.cms">11</a></td>
            <td align="center"><a href="/2010/1/12/archivelist/year-2010,month-1,starttime-40190.cms">12</a></td>
            <td align="center"><a href="/2010/1/13/archivelist/year-2010,month-1,starttime-40191.cms">13</a></td>
            <td align="center"><a href="/2010/1/14/archivelist/year-2010,month-1,starttime-40192.cms">14</a></td>
            <td align="center"><a href="/2010/1/15/archivelist/year-2010,month-1,starttime-40193.cms">15</a></td>
            <td align="center"><a href="/2010/1/16/archivelist/year-2010,month-1,starttime-40194.cms">16</a></td>
        </tr>
        <tr height="25" bgcolor="#FFFFFF" style="font-family:arial ;font-size:12;font-weight:bold; color: #006699">
            <td bgcolor="#EFEFEF" align="center"><a href="/2010/1/17/archivelist/year-2010,month-1,starttime-40195.cms">17</a></td>
            <td align="center"><a href="/2010/1/18/archivelist/year-2010,month-1,starttime-40196.cms">18</a></td>
            <td align="center"><a href="/2010/1/19/archivelist/year-2010,month-1,starttime-40197.cms">19</a></td>
            <td align="center"><a href="/2010/1/20/archivelist/year-2010,month-1,starttime-40198.cms">20</a></td>
            <td align="center"><a href="/2010/1/21/archivelist/year-2010,month-1,starttime-40199.cms">21</a></td>
            <td align="center"><a href="/2010/1/22/archivelist/year-2010,month-1,starttime-40200.cms">22</a></td>
            <td align="center"><a href="/2010/1/23/archivelist/year-2010,month-1,starttime-40201.cms">23</a></td>
        </tr>
        <tr height="25" bgcolor="#FFFFFF" style="font-family:arial ;font-size:12;font-weight:bold; color: #006699">
            <td bgcolor="#EFEFEF" align="center"><a href="/2010/1/24/archivelist/year-2010,month-1,starttime-40202.cms">24</a></td>
            <td align="center"><a href="/2010/1/25/archivelist/year-2010,month-1,starttime-40203.cms">25</a></td>
            <td align="center"><a href="/2010/1/26/archivelist/year-2010,month-1,starttime-40204.cms">26</a></td>
            <td align="center"><a href="/2010/1/27/archivelist/year-2010,month-1,starttime-40205.cms">27</a></td>
            <td align="center"><a href="/2010/1/28/archivelist/year-2010,month-1,starttime-40206.cms">28</a></td>
            <td align="center"><a href="/2010/1/29/archivelist/year-2010,month-1,starttime-40207.cms">29</a></td>
            <td align="center"><a href="/2010/1/30/archivelist/year-2010,month-1,starttime-40208.cms">30</a></td>
        </tr>
        <tr height="25" bgcolor="#FFFFFF" style="font-family:arial ;font-size:12;font-weight:bold; color: #006699">
            <td bgcolor="#EFEFEF" align="center"><a href="/2010/1/31/archivelist/year-2010,month-1,starttime-40209.cms">31</a></td>
            <td>&nbsp; </td>
            <td>&nbsp; </td>
            <td>&nbsp; </td>
            <td>&nbsp; </td>
            <td>&nbsp; </td>
            <td>&nbsp; </td>
        </tr>
    </tbody>
    </table>
</div>

For the above snippet, I wrote the following Selenium code:

from selenium import webdriver
from selenium.webdriver.common.by import By

option = webdriver.ChromeOptions()
option.add_argument("--incognito")
option.add_argument("--start-maximized")

chrome_path = r"C:\Users\singh\Downloads\chromedriver_win32\chromedriver.exe"
browser = webdriver.Chrome(chrome_path, options=option)
browser.get("https://timesofindia.indiatimes.com/archive/year-2010,month-1.cms")
browser.implicitly_wait(10)

num = []
numbers = browser.find_elements(By.XPATH, "//table[@class = 'calender']/tbody/tr/td/a[@href]")

for n in numbers:
    number = n.text
    num.append(number)

Expected output:

num = ['S', 'M', 'T', 'W', 'T', 'F', 'S', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', '23', '24', '25', '26', '27', '28', '29', '30', '31']

The program returns both num and numbers as empty lists.

I know something is wrong with the XPath in the code, but I can't figure out what the error is.

OS: Windows 10 x64
Python IDE: Anaconda Spyder
Python version: 3.6

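There are two problems:

1. calender is the table's id, not its class, so the predicate @class = 'calender' matches nothing and find_elements returns an empty list.

2. To get the href you need to use get_attribute('href'), not text. (text returns the visible link text, such as '1'.)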

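num = []
# "calender" is the table's id, so match on @id rather than @class
numbers = browser.find_elements(By.XPATH, '//table[@id="calender"]//a')

for n in numbers:
    # get_attribute('href') gives the link target; n.text would give the visible link text
    number = n.get_attribute('href')
    num.append(number)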
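
The id-versus-class point can be checked without launching a browser. The following is a minimal sketch, not part of the original answer: it assumes a trimmed, well-formed copy of the table (SAMPLE is a hypothetical stand-in with only two header cells and two day links) and uses Python's standard-library xml.etree.ElementTree, which supports only a subset of XPath, in place of Selenium. It also shows one possible way to collect the header letters together with the link text, since the weekday letters in the expected output sit in td cells rather than in links.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Hypothetical, trimmed copy of the calendar table from the question.
SAMPLE = """
<div id="calenderdiv">
  <table id="calender">
    <tbody>
      <tr><td>S</td><td>M</td></tr>
      <tr>
        <td><a href="/2010/1/1/archivelist/year-2010,month-1,starttime-40179.cms">1</a></td>
        <td><a href="/2010/1/2/archivelist/year-2010,month-1,starttime-40180.cms">2</a></td>
      </tr>
    </tbody>
  </table>
</div>
"""

root = ET.fromstring(SAMPLE)

# Original predicate: the table has no class attribute, so nothing matches.
by_class = root.findall(".//table[@class='calender']//a")

# Corrected predicate: match on the id attribute instead.
by_id = root.findall(".//table[@id='calender']//a")

link_texts = [a.text for a in by_id]    # visible text of each link
hrefs = [a.get("href") for a in by_id]  # raw attribute values from the markup

# A browser resolves relative hrefs against the page URL; urljoin mimics that here.
BASE = "https://timesofindia.indiatimes.com/archive/year-2010,month-1.cms"
absolute = [urljoin(BASE, h) for h in hrefs]

# Header letters plus link text: walk every cell, prefer the link text if present.
cell_texts = []
for td in root.findall(".//table[@id='calender']//td"):
    link = td.find("a")
    text = link.text if link is not None else td.text
    if text and text.strip():
        cell_texts.append(text.strip())

print(len(by_class), link_texts, cell_texts)
print(absolute[0])
```

In Selenium, an analogous approach would probably be to select '//table[@id="calender"]//td', read .text from each element, and skip empty strings; that is an untested suggestion rather than something confirmed against the live page.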
