
Python Selenium only getting first row when iterating over table

I am trying to extract the most recent headlines from the following news site: http://news.sina.com.cn/hotnews/

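import time
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains

# IDs of the tab buttons that need to be hovered over on the site
buttons_ids = ['Tab21', 'Tab22', 'Tab32']

# IDs of the subsections to scrape
con_ids = ['Con11']

# Start the webdriver, open the site, and hover over each button.
# Note: find_element_by_id is the Selenium 3 API; Selenium 4 uses
# driver.find_element(By.ID, button_id) instead.
driver = webdriver.Chrome()
driver.get("http://news.sina.com.cn/hotnews/")
time.sleep(3)
for button_id in buttons_ids:
    button = driver.find_element_by_id(button_id)
    ActionChains(driver).move_to_element(button).perform()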

Then I iterate through each section I am interested in and, within each section, through all the headlines, which are rows in an HTML table. However, every iteration returns the first row:

for con_id in con_ids:
    for news_id in range(2,10):
        print(news_id)
        headline = driver.find_element_by_xpath("//div[@id='"+con_id+"']/table/tbody/tr["+str(news_id)+"]")
        text = headline.find_element_by_xpath("//td[2]/a")
        print(text.get_attribute("innerText"))
        print(text.get_attribute("href"))
        com_no = headline.find_element_by_xpath("//td[3]/a")
        print(com_no.get_attribute("innerText"))

I also tried the following approach by essentially saving the table as a list and then iterating through the rows:

for con_id in con_ids:
    table = driver.find_elements_by_xpath("//div[@id='"+con_id+"']/table/tbody/tr")
    for headline in table:
        text = headline.find_element_by_xpath("//td[2]/a")
        print(text.get_attribute("innerText"))
        print(text.get_attribute("href"))
        com_no = headline.find_element_by_xpath("//td[3]/a")
        print(com_no.get_attribute("innerText"))

In the second case the loop runs exactly as many times as there are headlines in the section, so it apparently picks up the correct number of rows. However, it still returns only the first row on every iteration. Where am I going wrong? I know a similar question has been asked here: Selenium Python iterate over a table of rows it is stopping at the first row, but I am still unable to figure out my mistake.

In XPath, a query that begins with // searches from the document root. So even though you call find_element_by_xpath() on the correct row element, the query escapes that element's scope, runs the same document-wide search every time, and therefore returns the same first match.

To constrain the query to descendants of the current element, begin it with .// instead, e.g.:

text = headline.find_element_by_xpath(".//td[2]/a")
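
The scoping difference can be reproduced without a browser. The following is only a minimal sketch, assuming a made-up three-row table (not the real page markup) and using the standard library's xml.etree.ElementTree in place of Selenium. ElementTree does not accept an absolute // path on an element, so the document-wide search is emulated here by querying from the root node on every iteration.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the headlines table; the real page differs.
MARKUP = """
<div id="Con11">
  <table><tbody>
    <tr><td>1</td><td><a href="/a">First headline</a></td><td><a href="/a#c">10</a></td></tr>
    <tr><td>2</td><td><a href="/b">Second headline</a></td><td><a href="/b#c">20</a></td></tr>
    <tr><td>3</td><td><a href="/c">Third headline</a></td><td><a href="/c#c">30</a></td></tr>
  </tbody></table>
</div>
""".strip()

root = ET.fromstring(MARKUP)
rows = root.findall("./table/tbody/tr")

# Emulates the unscoped "//td[2]/a" lookup: the search starts from the
# document root each time, so the current row has no influence on the result.
global_hits = [root.find(".//td[2]/a").text for _ in rows]

# Scoped lookup: ".//" is evaluated relative to the row being visited.
scoped_hits = [row.find(".//td[2]/a").text for row in rows]

print(global_hits)
print(scoped_hits)
```

The first list repeats the first headline once per row, which mirrors the behaviour described in the question; the second list contains one headline per row.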

Try this:

for con_id in con_ids:
    for news_id in range(2,10):
        print(news_id)
        print("(//div[@id='"+con_id+"']/table/tbody/tr)["+str(news_id)+"]")
        headline = driver.find_element_by_xpath("(//div[@id='"+con_id+"']/table/tbody/tr)["+str(news_id)+"]")
        value = headline.find_element_by_xpath(".//td[2]/a")
        print(value.get_attribute("innerText").encode('utf-8'))

I am able to get the headlines with the code above.

I was able to solve it by specifying the entire XPath in one go like this:

headline = driver.find_element_by_xpath("(//*[@id='"+con_id+"']/table/tbody/tr["+str(news_id)+"]/td[2]/a)")
print(headline.get_attribute("innerText"))
print(headline.get_attribute("href"))

rather than splitting it into two parts. My only explanation for why the original version printed the first row repeatedly is that some JavaScript on the page interferes with the iteration when the lookup is split, or that my first version contained a syntax error I am not aware of. If anyone has a better explanation, I'd be glad to hear it!
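
If the single-query form is kept, the string concatenation can be moved into a small helper. This is only a sketch: the function name headline_xpath and its col parameter are hypothetical, and the table layout is assumed to match the XPath used above.

```python
def headline_xpath(con_id: str, row: int, col: int = 2) -> str:
    """Build one absolute XPath for the link in a given row and column.

    Hypothetical helper; it assumes the div > table > tbody > tr > td > a
    layout implied by the XPath in the post.
    """
    return f"//*[@id='{con_id}']/table/tbody/tr[{row}]/td[{col}]/a"


# Print the query for the first few rows of the 'Con11' section.
for news_id in range(2, 5):
    print(headline_xpath("Con11", news_id))
```

The resulting string can be passed to driver.find_element_by_xpath() exactly as in the snippet above; passing col=3 would target the comment-count link instead of the headline.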

