Selenium, Python web scraping

I am trying to extract data from an HTML table. I can count the rows successfully, but when I print them, the same row keeps repeating. Can anyone please tell me what is wrong with the code? Thanks.

#counting length of row
rows = len(driver.find_elements_by_xpath('/html/body/form/fieldset/table[2]/tbody/tr/td[3]/table/tbody/tr[5]/td[2]/div/table[1]/tbody/tr[2]/td[1]/table[2]/tbody/tr'))
time.sleep(2)
print(rows)

for r in range(rows):
    value=driver.find_element_by_xpath('/html/body/form/fieldset/table[2]/tbody/tr/td[3]/table/tbody/tr[5]/td[2]/div/table[1]/tbody/tr[2]/td[1]/table[2]/tbody/tr["+str(r)+"]')
    print(value.text)


#Output:
18 #no of rows
Start of legal relation2/7/2018 #1st row
Start of legal relation2/7/2018
Start of legal relation2/7/2018
Start of legal relation2/7/2018
Start of legal relation2/7/2018
Start of legal relation2/7/2018
Start of legal relation2/7/2018
Start of legal relation2/7/2018
Start of legal relation2/7/2018
Start of legal relation2/7/2018
Start of legal relation2/7/2018
Start of legal relation2/7/2018
Start of legal relation2/7/2018
Start of legal relation2/7/2018
Start of legal relation2/7/2018
Start of legal relation2/7/2018
Start of legal relation2/7/2018
Start of legal relation2/7/2018
sample test case successfully completed

Without the URL, it's difficult to say why. However, the first tr element would be [1], so I think your range function should be range(1, rows + 1). Also, the way you are going about this seems very indirect, since your first query seems to have already retrieved all the elements you are looking for. So why not just do the following?

elements = driver.find_elements_by_xpath('/html/body/form/fieldset/table[2]/tbody/tr/td[3]/table/tbody/tr[5]/td[2]/div/table[1]/tbody/tr[2]/td[1]/table[2]/tbody/tr')
#time.sleep(2) # what does this accomplish?
print(len(elements))

text_list = [element.text for element in elements] # list of strings
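
One more thing that may explain the repetition: in the question's loop, the `tr["+str(r)+"]` part sits inside a single-quoted string, so no concatenation takes place and every iteration sends the same XPath. In XPath 1.0 a non-empty string predicate evaluates to true, so it would match every row, and a single-element lookup returns the first one each time. Below is a minimal sketch of the difference, using the standard library's `xml.etree.ElementTree` on a made-up three-row table in place of Selenium and the real page. `TABLE`, `BASE`, `buggy_xpath`, and `row_xpath` are hypothetical names for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the real page: a tiny table with three rows.
TABLE = (
    "<table><tbody>"
    "<tr><td>row A</td></tr>"
    "<tr><td>row B</td></tr>"
    "<tr><td>row C</td></tr>"
    "</tbody></table>"
)

# Hypothetical short path; the question uses a long absolute XPath instead.
BASE = "./tbody/tr"


def buggy_xpath(r):
    # Same quoting as in the question: one single-quoted literal, so the
    # text "+str(r)+" stays in the string verbatim and r is never used.
    return './tbody/tr["+str(r)+"]'


def row_xpath(r):
    # Here the index really is interpolated. XPath positions start at 1.
    return f"{BASE}[{r}]"


root = ET.fromstring(TABLE)
rows = root.findall(BASE)

# Loop over 1..len(rows), matching XPath's 1-based positions.
texts = [root.find(row_xpath(i)).findtext("td") for i in range(1, len(rows) + 1)]

print(buggy_xpath(1) == buggy_xpath(2))  # True: the loop variable has no effect
print(texts)                             # ['row A', 'row B', 'row C']
```

With Selenium, the same idea would be to build the string with concatenation or an f-string before passing it to the lookup call. Note also that in Selenium 4 the `find_element_by_xpath` style helpers have been removed in favor of `driver.find_element(By.XPATH, ...)` and `driver.find_elements(By.XPATH, ...)`.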
