
How to get the text of these paragraphs using Selenium and XPath

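I am trying to scrape this website. Each page lists about ten different opportunities, and each one has its own title and details. I want to collect all of this information. I wrote Python code that finds the other required tags and information, but I cannot get the paragraph that contains the description.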

Here is my code.

    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # `driver` (a Selenium WebDriver) and `website_name` are created elsewhere.
    base_url = "https://www.enabel.be/content/enabel-tenders"
    driver.get(base_url)
    # Wait until the list of tenders is visible
    WebDriverWait(driver, 10).until(EC.visibility_of_element_located(
        (By.XPATH, "//*[@id='block-views-tenders-block']/div/div/div[@class='view-content']/div")))

    current_page_tag = driver.find_element(
        By.XPATH, "//*[@id='block-views-tenders-block']/div/div/div[3]/ul/li[2]").text.strip()
    # One div per tender on the current page
    all_divs = driver.find_elements(
        By.XPATH, "//*[@id='block-views-tenders-block']/div/div/div[@class='view-content']/div")

    for each_div in all_divs:
        singleData = {
            # could not detect
            "language": 107,
            # means open
            "status": 0,
            "op_link": "",
            "website": website_name,
            "close_date": "",
            # means not available
            "organization": website_name,
            "description": "",
            "title": "",
            "checksum": "",
            # means not available
            "country": "",
            "published_date": "",
        }

        singleData['title'] = each_div.find_element(
            By.XPATH, ".//span[@class='title-accr no-transform']").text.strip()
        singleData['country'] = each_div.find_element(
            By.XPATH, ".//div[1]/div/div/div[@class ='field-items']/div").text.strip()
        close_date = each_div.find_element(By.XPATH, ".//div//div[1]/div").text.strip()

        # description always returns empty text
        description = each_div.find_element(By.XPATH, ".//div/div[2]/div[3]/div[2]/div/p").text.strip()
        download = each_div.find_elements(By.XPATH, ".//div//div[2]/div[4]/div[2]//a")
        download_file_link = []
        for eachfile in download:
            download_file_link.append(eachfile.get_attribute('href'))

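My code gets the title, country, closing date, and attachments, but not the description. It returns empty text, even though I can see the text on the website.

Can anyone help me identify the problem and a solution? Thanks in advance.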


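Wrap the lookup in try/except so that tenders without a description are handled. Some descriptions contain &nbsp; entities, so you may need to strip those.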

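    from selenium.common.exceptions import NoSuchElementException

    for each_div in all_divs:
        try:
            # innerHTML is returned even when the paragraph is not displayed
            description = each_div.find_element(
                By.XPATH,
                ".//div[contains(text(),'Description')]/parent::div/div[2]//p[1]"
            ).get_attribute('innerHTML')
            print(description)
        except NoSuchElementException:
            # This tender has no description paragraph
            print('none')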

Output

This is the annual publication of information on recipients of funds for the TVET Project. 
none
At the latest 14 calendar days before the final date for receipt of tenders (up to 4th January 2021), tenderers may ask questions about the tender documents and the contract in accordance with Art. 64 of the Law of 17 June 2016. Questions shall be addressed in writing to:
Pour tout besoin d'information complémentaire, veuillez contacter: <a href="mailto:adama.dianda@enabel.be">adama.dianda@enabel.be</a>
none
none
none
Marché relatif &nbsp;à &nbsp;la&nbsp;fourniture, &nbsp;l’installation, &nbsp;la &nbsp;mise &nbsp;en &nbsp;marche &nbsp;et&nbsp;formation des utilisateurs et techniciens chargé de la&nbsp;maintenance &nbsp;des &nbsp;équipements &nbsp;de &nbsp;Laboratoire&nbsp;destinés au CERMES.&nbsp;
Pour tout besoin d'information complémentaire, veuillez contacter: <a href="mailto:adama.dianda@enabel.be">adama.dianda@enabel.be</a>
Tenders should request the price schedule in xls from Ms. Eva Matovu. email: <a href="mailto:eva.matovu@enabel.be">eva.matovu@enabel.be</a>
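
The output above still contains &nbsp; entities and <a> tags because innerHTML returns raw markup. A minimal sketch of one way to clean such a fragment with the standard library follows. The helper name clean_inner_html is hypothetical, and the regex-based tag stripping is an assumption that only suits simple fragments like these, not arbitrary HTML.

```python
import html
import re

# Matches a single HTML tag such as <a href="..."> or </a>
_TAG_RE = re.compile(r"<[^>]+>")


def clean_inner_html(fragment: str) -> str:
    """Turn a small innerHTML fragment into plain, single-spaced text."""
    # Drop tags first so that escaped text like &lt;b&gt; is not mistaken for a tag
    without_tags = _TAG_RE.sub("", fragment)
    # Convert entities: &nbsp; becomes U+00A0, &amp; becomes &
    unescaped = html.unescape(without_tags)
    # str.split() with no arguments also splits on U+00A0, so runs of
    # ordinary and non-breaking spaces collapse to one space
    return " ".join(unescaped.split())


if __name__ == "__main__":
    print(clean_inner_html("Marché relatif &nbsp;à &nbsp;la&nbsp;fourniture"))
```

Calling clean_inner_html(description) right after get_attribute('innerHTML') would keep the rest of the loop unchanged.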

You can use:

    for each_div in all_divs:
        # find_elements returns an empty list when nothing matches,
        # so no try/except is needed here
        paragraphs = each_div.find_elements(
            By.XPATH, ".//div[contains(text(),'Description')]/parent::div/div[2]//p")
        for paragraph in paragraphs:
            # textContent includes text that is in the DOM but not displayed
            print(paragraph.get_attribute('textContent'))

Output

This is the annual publication of information on recipients of funds for the TVET Project.
At the latest 14 calendar days before the final date for receipt of tenders (up to 4th January 2021), tenderers may ask questions about the tender documents and the contract in accordance with Art. 64 of the Law of 17 June 2016. Questions shall be addressed in writing to:
Françoise MUSHIMIYIMANA, National Expert in Contractualization & Administration _National ECA                                    (francoise.mushimiyimana@enabel.be ), with copy to
denise.nsanga@enabel.be
evariste.sibomana@enabel.be

They shall be answered in the order received. The complete overview of questions asked shall be available as of at the latest 7 calendar days before the final date for receipt of tenders at the address mentioned above.
Pour tout besoin d'information complémentaire, veuillez contacter: adama.dianda@enabel.be
Marché relatif  à  la fourniture,  l’installation,  la  mise  en  marche  et formation des utilisateurs et techniciens chargé de la maintenance  des  équipements  de  Laboratoire destinés au CERMES.
Pour tout besoin d'information complémentaire, veuillez contacter: adama.dianda@enabel.be
Tenders should request the price schedule in xls from Ms. Eva Matovu. email: eva.matovu@enabel.be
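
The textContent values above keep the page's original spacing, and a description can span several paragraphs. The following sketch shows one possible way to merge them into a single string for the description field. The helper name join_paragraph_texts is hypothetical, and joining with newlines is an assumption about the desired format.

```python
from typing import Iterable, Optional


def join_paragraph_texts(texts: Iterable[Optional[str]]) -> str:
    """Collapse whitespace in each paragraph and join non-empty ones with newlines."""
    cleaned = []
    for text in texts:
        # get_attribute may return None, so treat that as an empty string
        collapsed = " ".join((text or "").split())
        if collapsed:
            cleaned.append(collapsed)
    return "\n".join(cleaned)


if __name__ == "__main__":
    sample = ["  Françoise   MUSHIMIYIMANA ", None, "", "denise.nsanga@enabel.be\n"]
    print(join_paragraph_texts(sample))
```

Inside the loop, this could be used as singleData['description'] = join_paragraph_texts(p.get_attribute('textContent') for p in paragraphs), where paragraphs is the list returned by find_elements.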

