
Python Selenium only getting first row when iterating over table

I am trying to extract the most recent headlines from the following news site: http://news.sina.com.cn/hotnews/

import time

from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains

# ids of the relevant tab buttons that need to be activated on the site
buttons_ids = ['Tab21', 'Tab22', 'Tab32']

# ids of the relevant subsections
con_ids = ['Con11']

# start webdriver, go to site, hover over the tab buttons
driver = webdriver.Chrome()
driver.get("http://news.sina.com.cn/hotnews/")
time.sleep(3)  # crude wait for the page to finish loading
for button_id in buttons_ids:
    button = driver.find_element_by_id(button_id)
    ActionChains(driver).move_to_element(button).perform()

Then I iterate through each section that I am interested in and, within each section, through all the headlines, which are rows in an HTML table. However, on every iteration it returns the first element:

for con_id in con_ids:
    for news_id in range(2,10):
        print(news_id)
        headline = driver.find_element_by_xpath("//div[@id='"+con_id+"']/table/tbody/tr["+str(news_id)+"]")
        text = headline.find_element_by_xpath("//td[2]/a")
        print(text.get_attribute("innerText"))
        print(text.get_attribute("href"))
        com_no = headline.find_element_by_xpath("//td[3]/a")
        print(com_no.get_attribute("innerText"))

I also tried the following approach, essentially saving the table as a list and then iterating through the rows:

for con_id in con_ids:
    table = driver.find_elements_by_xpath("//div[@id='"+con_id+"']/table/tbody/tr")
    for headline in table:
        text = headline.find_element_by_xpath("//td[2]/a")
        print(text.get_attribute("innerText"))
        print(text.get_attribute("href"))
        com_no = headline.find_element_by_xpath("//td[3]/a")
        print(com_no.get_attribute("innerText"))

In the second case I get exactly the number of headlines in the section, so it apparently picks up the number of rows correctly. However, it still returns only the first row on every iteration. Where am I going wrong? I know a similar question has been asked here: "Selenium Python iterate over a table of rows it is stopping at the first row", but I am still unable to figure out where I am going wrong.

In XPath, queries that begin with // search from the document root. So even though you're calling find_element_by_xpath() on the correct container element, you're breaking out of that scope, performing the same global search and getting the same result every time.

To constrain your query to descendants of the current element, begin it with .//, e.g.:

text = headline.find_element_by_xpath(".//td[2]/a")
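As a rough illustration of the scoping difference, here is a stdlib-only sketch that uses `xml.etree.ElementTree` as a stand-in for the browser DOM. The markup is a hypothetical miniature of the `Con11` table, not the real page. ElementTree supports only a subset of XPath and refuses absolute paths on elements, so the `//td[2]/a` behaviour is emulated by always searching from the root element. With the Selenium 4 locator API (which deprecated and later removed the `find_element_by_*` helpers), the scoped call would be written `headline.find_element(By.XPATH, ".//td[2]/a")`.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of the Con11 headline table (not the real page markup).
HTML = """
<div id="Con11"><table><tbody>
  <tr><td>1</td><td><a href="/a">Headline A</a></td><td><a>10</a></td></tr>
  <tr><td>2</td><td><a href="/b">Headline B</a></td><td><a>20</a></td></tr>
  <tr><td>3</td><td><a href="/c">Headline C</a></td><td><a>30</a></td></tr>
</tbody></table></div>
"""

root = ET.fromstring(HTML.strip())
rows = root.findall("./table/tbody/tr")


def unscoped_link(row):
    # Emulates row.find_element_by_xpath("//td[2]/a"): a leading // restarts the
    # search at the document root, so `row` is ignored and the first match wins.
    return root.find(".//td[2]/a")


def scoped_link(row):
    # Emulates row.find_element_by_xpath(".//td[2]/a"): the search is confined
    # to the descendants of the current row.
    return row.find(".//td[2]/a")


unscoped = [unscoped_link(row).text for row in rows]
scoped = [scoped_link(row).text for row in rows]
print(unscoped)  # ['Headline A', 'Headline A', 'Headline A']
print(scoped)    # ['Headline A', 'Headline B', 'Headline C']
```

The unscoped version reproduces the symptom from the question: the row count is right, but every iteration yields the first headline.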

Try this:

for con_id in con_ids:
    for news_id in range(2,10):
        print(news_id)
        print("(//div[@id='"+con_id+"']/table/tbody/tr)["+str(news_id)+"]")
        headline = driver.find_element_by_xpath("(//div[@id='"+con_id+"']/table/tbody/tr)["+str(news_id)+"]")
        value = headline.find_element_by_xpath(".//td[2]/a")
        print(value.get_attribute("innerText"))

I was able to get the headlines with the code above.
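In this answer the change that appears to fix the loop is the `.//td[2]/a` in the second lookup; the parentheses in `(//div[...]/table/tbody/tr)[n]` only change the result when the path matches rows under more than one parent. `//tbody/tr[2]` selects every `tr` that is the second row of its own `tbody`, whereas `(//tbody/tr)[2]` selects the second row of the whole result set in document order. The sketch below shows the difference with ElementTree on hypothetical two-table markup; ElementTree does not support the parenthesised form, so it is emulated by indexing the list returned by `findall`. The helper names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical container holding two small tables.
HTML = """
<div id="Con11">
  <table><tbody><tr><td>t1 r1</td></tr><tr><td>t1 r2</td></tr></tbody></table>
  <table><tbody><tr><td>t2 r1</td></tr><tr><td>t2 r2</td></tr></tbody></table>
</div>
"""

root = ET.fromstring(HTML.strip())


def nth_per_parent(n):
    # Like //tbody/tr[n]: the position is counted within each tbody's own rows,
    # so every table can contribute one match.
    return [tr.find("td").text for tr in root.findall(f".//tbody/tr[{n}]")]


def nth_overall(n):
    # Like (//tbody/tr)[n]: collect all matching rows in document order, then
    # take the n-th (XPath positions are 1-based, Python indices 0-based).
    rows = root.findall(".//tbody/tr")
    return rows[n - 1].find("td").text if n <= len(rows) else None


print(nth_per_parent(2))  # ['t1 r2', 't2 r2']
print(nth_overall(2))     # t1 r2
print(nth_per_parent(3))  # []
print(nth_overall(3))     # t2 r1
```

With a single table under `Con11`, both forms pick the same row, which is why the parentheses are harmless but not essential here.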

I was able to solve it by specifying the entire XPath in one go, like this:

headline = driver.find_element_by_xpath("(//*[@id='"+con_id+"']/table/tbody/tr["+str(news_id)+"]/td[2]/a)")
print(headline.get_attribute("innerText"))
print(headline.get_attribute("href"))

rather than splitting it into two parts. My only explanation for why it printed only the first row repeatedly is that some odd JavaScript on the page prevents proper iteration when the request is split. Or my first version had a syntax error that I am not aware of. If anyone has a better explanation, I'd be glad to hear it!
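A likely simpler explanation than JavaScript: in the split version, the second lookup `//td[2]/a` began with `//` and therefore restarted at the document root (see the `.//` explanation above), whereas the one-shot XPath carries the row index inside a single path from the container down to the link, so nothing depends on a context node. The sketch below checks that path shape against hypothetical markup using ElementTree, which needs a leading `.` because it rejects absolute paths on elements. `headline_path` is a hypothetical helper, not part of any library.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup standing in for the real page.
HTML = """
<body><div id="Con11"><table><tbody>
  <tr><td>1</td><td><a href="/a">Headline A</a></td></tr>
  <tr><td>2</td><td><a href="/b">Headline B</a></td></tr>
  <tr><td>3</td><td><a href="/c">Headline C</a></td></tr>
</tbody></table></div></body>
"""

root = ET.fromstring(HTML.strip())


def headline_path(con_id, row_number):
    # Hypothetical helper: one path from the container to the headline link,
    # mirroring the self-answer's XPath; the leading "." is ElementTree-specific.
    return f".//*[@id='{con_id}']/table/tbody/tr[{row_number}]/td[2]/a"


for n in range(1, 4):
    link = root.find(headline_path("Con11", n))
    print(n, link.text, link.get("href"))
# 1 Headline A /a
# 2 Headline B /b
# 3 Headline C /c
```

Because each iteration issues a fresh, fully specified query, the row index alone determines which link comes back.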
