Scraping a table row by row? Python web scraping

Question

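If you visit this link: https://www.halton.ca/For-Residents/Food-Safety/Dinewise/Search-Directory-of-Food-Premises-Dinewise, click on a restaurant, and look at the page that opens.

I want to scrape all of the information in the table on that page.

At first I tried using: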

print(driver.find_element_by_xpath('//*[@id="Form1"]/table[1]').text)

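This did get me part of the table's information. However, the values under "Satisfactory", "Corrected During Inspection" and "Not Applicable" are images, so this approach cannot return them as text. I was therefore thinking I could read each image's source link and decide from it whether the value is a yes or a no.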
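The idea of classifying an image by its source link could look roughly like the sketch below. The function name `src_to_flag` and the file names `checked.gif` and `unchecked.gif` are assumptions made for illustration only; the real names have to be read from the page's HTML.

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical file names: inspect the real page to see what the images are called.
CHECKED_NAMES = {"checked.gif"}
UNCHECKED_NAMES = {"unchecked.gif"}


def src_to_flag(src):
    """Return True/False for a known checkbox image, or None if unrecognised."""
    # Keep only the file name, ignoring host, directories and query string.
    name = posixpath.basename(urlparse(src).path).lower()
    if name in CHECKED_NAMES:
        return True
    if name in UNCHECKED_NAMES:
        return False
    return None


print(src_to_flag("https://example.com/images/Checked.gif?v=2"))  # True
print(src_to_flag("/images/unchecked.gif"))                       # False
```

Returning `None` for an unknown file name keeps an unexpected image from being silently counted as a "no".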

My question is: how do I scrape this table row by row? Here is my attempt; I cannot scrape the "Description" part.

# Get Areas of Assessment, Description, then Satisfactory
table = driver.find_element_by_xpath('//*[@id="Form1"]/table[1]/tbody')
rows = table.find_elements_by_tag_name("tr")  # get all of the rows in the table

content = []

# XPath positions are 1-based, so row runs from 1 to len(rows)
for row in range(1, len(rows) + 1):
    # areas of assessment (bold text in column 1)
    a = driver.find_element_by_xpath('//*[@id="Form1"]/table[1]/tbody/tr[%s]/td[1]/b' % row).text
    content.append(a)
    print(content)
    print(a)
    # description (this is the part that does not work)
    b = driver.find_element_by_xpath('//*[@id="Form1"]/table[1]/tbody/tr[1]/td[2]/br').text
    print(b)
    # satisfactory
    # test = driver.find_element_by_id('chkFoodProtectedFromContamination_Satisfactory').get_attribute("src")
    # print(test)

Answer 1

Simply do the following to retrieve the image's src.

c = driver.find_element_by_xpath('//*[@id="Form1"]/table[1]/tbody/tr[%s]/td[3]/img' % row).get_attribute("src")
print(c)
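For the "Description" column and the row-by-row loop as a whole, a minimal sketch might look like the following. It assumes each data row has three `td` cells (area, description, image), which is inferred from the XPaths above rather than checked against the page, and `scrape_rows` is a hypothetical helper name. It uses the same Selenium 3 style `find_element_by_*` methods as the question; newer Selenium 4 releases replace them with `find_element(By.XPATH, ...)` and `find_elements(By.TAG_NAME, ...)`.

```python
def scrape_rows(driver):
    """Return one (area, description, image_src) tuple per table row."""
    table = driver.find_element_by_xpath('//*[@id="Form1"]/table[1]/tbody')
    results = []
    for row in table.find_elements_by_tag_name("tr"):
        # Search relative to the row, so no row index is needed in an XPath.
        cells = row.find_elements_by_tag_name("td")
        if len(cells) < 3:
            continue  # skip rows without the three assumed columns
        # A <br> element has no text of its own, so read the whole cell.
        area = cells[0].text.strip()
        description = cells[1].text.strip()
        # find_elements (plural) returns [] instead of raising when no <img> exists.
        images = cells[2].find_elements_by_tag_name("img")
        src = images[0].get_attribute("src") if images else None
        results.append((area, description, src))
    return results
```

With a live `driver` already on the restaurant page, `for area, description, src in scrape_rows(driver): print(area, description, src)` would print one line per row. Looking up cells relative to each row also avoids the index bookkeeping that an absolute XPath with `tr[%s]` requires.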

Solution 1
0 Accepted 2020-10-03 06:01:57
