Python Selenium only gets the first row when iterating over a table

Question

I am trying to extract the most recent headlines from the following news site: http://news.sina.com.cn/hotnews/

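import time

from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains

# save ids of relevant buttons that need to be clicked on the site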
buttons_ids = ['Tab21' , 'Tab22', 'Tab32']

#save ids of relevant subsections
con_ids = ['Con11']

#start webdriver, go to site, hover over buttons
driver = webdriver.Chrome()
driver.get("http://news.sina.com.cn/hotnews/")
time.sleep(3)
for button_id in buttons_ids:
    button = driver.find_element_by_id(button_id)
    ActionChains(driver).move_to_element(button).perform()

Then I iterate through each section that I am interested in and, within each section, through all the headlines, which are rows in an HTML table. However, on every iteration it returns the first element:

for con_id in con_ids:
    for news_id in range(2,10):
        print(news_id)
        headline = driver.find_element_by_xpath("//div[@id='"+con_id+"']/table/tbody/tr["+str(news_id)+"]")
        text = headline.find_element_by_xpath("//td[2]/a")
        print(text.get_attribute("innerText"))
        print(text.get_attribute("href"))
        com_no = headline.find_element_by_xpath("//td[3]/a")
        print(com_no.get_attribute("innerText"))

I also tried the following approach, essentially saving the table rows as a list and then iterating through them:

for con_id in con_ids:
    table = driver.find_elements_by_xpath("//div[@id='"+con_id+"']/table/tbody/tr")
    for headline in table:
        text = headline.find_element_by_xpath("//td[2]/a")
        print(text.get_attribute("innerText"))
        print(text.get_attribute("href"))
        com_no = headline.find_element_by_xpath("//td[3]/a")
        print(com_no.get_attribute("innerText"))

In the second case the loop runs exactly as many times as there are headlines in the section, so it apparently picks up the correct number of rows. However, it still returns only the first row on every iteration. Where am I going wrong? I know a similar question has been asked here: "Selenium Python iterate over a table of rows it is stopping at the first row", but I am still unable to figure out where I am going wrong.

Answer 1

In XPath, queries that begin with // search relative to the document root. So even though you're calling find_element_by_xpath() on the correct container element, you're breaking out of that scope, performing the same global search and getting the same result every time.

To constrain your query to descendants of the current element, begin it with .// instead, e.g.:

text = headline.find_element_by_xpath(".//td[2]/a")
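
The difference between a document-wide and a row-scoped search can be reproduced without a browser. The following is only a minimal sketch using Python's standard xml.etree.ElementTree; the markup is a made-up stand-in for the real page, and because ElementTree does not accept a leading // on an element, the document-wide search is emulated by always searching from the root.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the page's table; the real markup may differ.
HTML = """
<div id="Con11">
  <table>
    <tbody>
      <tr><th>No.</th><th>Title</th><th>Comments</th></tr>
      <tr><td>1</td><td><a href="/a">First headline</a></td><td><a href="/a#c">10</a></td></tr>
      <tr><td>2</td><td><a href="/b">Second headline</a></td><td><a href="/b#c">20</a></td></tr>
    </tbody>
  </table>
</div>
"""

root = ET.fromstring(HTML)
rows = root.findall("./table/tbody/tr")[1:]  # skip the header row

# Document-wide search: the row being processed is ignored, so the
# first matching link in the whole document comes back every time.
global_hits = [root.find(".//td[2]/a").text for _ in rows]

# Row-scoped search: the query starts at the current row, so each
# row yields its own link.
scoped_hits = [row.find(".//td[2]/a").text for row in rows]

print(global_hits)  # ['First headline', 'First headline']
print(scoped_hits)  # ['First headline', 'Second headline']
```

In Selenium the same distinction is made by the leading dot: ".//td[2]/a" starts at the element the method is called on, while "//td[2]/a" starts at the document root.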

Answer 2

Try this:

for con_id in con_ids:
    for news_id in range(2,10):
        print(news_id)
        print("(//div[@id='"+con_id+"']/table/tbody/tr)["+str(news_id)+"]")
        headline = driver.find_element_by_xpath("(//div[@id='"+con_id+"']/table/tbody/tr)["+str(news_id)+"]")
        value = headline.find_element_by_xpath(".//td[2]/a")
        print(value.get_attribute("innerText").encode('utf-8'))

I am able to get the headlines with the code above.

Answer 3

I was able to solve it by specifying the entire XPath in one go, like this:

headline = driver.find_element_by_xpath("(//*[@id='"+con_id+"']/table/tbody/tr["+str(news_id)+"]/td[2]/a)")
print(headline.get_attribute("innerText"))
print(headline.get_attribute("href"))

rather than splitting it into two parts. My only explanation for why it printed the first row repeatedly is that some weird JavaScript on the page doesn't let you iterate properly when the request is split. Or my first version had a syntax error that I am not aware of. If anyone has a better explanation, I'd be glad to hear it!
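
A likely explanation, given the XPath rule that a leading // always starts at the document root: the single expression works because the row index is part of the one document-wide query, whereas the second query in the split version ignored the row it was called on. Below is a small sketch of the single-expression approach. The helper names row_link_xpath and collect_headlines are hypothetical, and the driver argument is assumed to be any object exposing Selenium 4's find_element(by, value) method; the string "xpath" is the value of Selenium's By.XPATH.

```python
XPATH = "xpath"  # same string as selenium.webdriver.common.by.By.XPATH


def row_link_xpath(con_id, row):
    """Build one absolute XPath for the headline link in a given table row."""
    return f"//div[@id='{con_id}']/table/tbody/tr[{row}]/td[2]/a"


def collect_headlines(driver, con_id, rows):
    """Return (text, href) pairs, one per requested row.

    `driver` is assumed to behave like a Selenium 4 WebDriver, i.e. it
    offers find_element(by, value) returning an element that supports
    get_attribute(name).
    """
    results = []
    for row in rows:
        # The row index is baked into the expression, so each lookup
        # targets a different row even though it starts at the root.
        link = driver.find_element(XPATH, row_link_xpath(con_id, row))
        results.append((link.get_attribute("innerText"),
                        link.get_attribute("href")))
    return results
```

With a real browser session this might be called as collect_headlines(driver, "Con11", range(2, 10)), matching the row range used in the question.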

3 answers

Answer 1: score 3, accepted, 2018-02-16 04:52:14
Answer 2: score 1, 2018-02-15 18:18:18
Answer 3: score 0, 2018-02-15 20:49:03