
How to get the text of these paragraphs using Selenium and XPath

I am trying to scrape this website: https://www.enabel.be/content/enabel-tenders. There are almost ten different opportunities on each page, each with its own title and details, and I want to collect all of this information. I have written Python code that can locate the other required tags and information, but I can't locate the paragraphs that contain the description.

Here is my code:

    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    # driver and website_name are defined earlier in the script

    base_url = "https://www.enabel.be/content/enabel-tenders"
    driver.get(base_url)
    WebDriverWait(driver, 10).until(EC.visibility_of_element_located(
        (By.XPATH, "//*[@id='block-views-tenders-block']/div/div/div[@class='view-content']/div")))

    current_page_tag = driver.find_element(
        By.XPATH, "//*[@id='block-views-tenders-block']/div/div/div[3]/ul/li[2]").text.strip()
    all_divs = driver.find_elements(
        By.XPATH, "//*[@id='block-views-tenders-block']/div/div/div[@class='view-content']/div")

    for each_div in all_divs:
        singleData = {
            "language": 107,               # could not detect
            "status": 0,                   # means open
            "op_link": "",
            "website": website_name,
            "close_date": "",
            "organization": website_name,  # means not available
            "description": "",
            "title": "",
            "checksum": "",
            "country": "",                 # means not available
            "published_date": "",
        }

        singleData['title'] = each_div.find_element(
            By.XPATH, ".//span[@class='title-accr no-transform']").text.strip()
        singleData['country'] = each_div.find_element(
            By.XPATH, ".//div[1]/div/div/div[@class='field-items']/div").text.strip()
        close_date = each_div.find_element(By.XPATH, ".//div//div[1]/div").text.strip()

        # the description always comes back as an empty string
        description = each_div.find_element(
            By.XPATH, ".//div/div[2]/div[3]/div[2]/div/p").text.strip()
        download = each_div.find_elements(By.XPATH, './/div//div[2]/div[4]/div[2]//a')
        download_file_link = []
        for eachfile in download:
            download_file_link.append(eachfile.get_attribute('href'))

My code can get the title, country, deadline, and the attachments, but it can't get the description: it returns an empty string even though the text is visible on the website.

Can anyone help me with the issue and a solution? Thanks in advance.


Use a try/except to catch the description when it's there. Note that Selenium's .text only returns text that is currently rendered, so if the paragraph sits in a collapsed panel it comes back empty; get_attribute('innerHTML') returns the markup regardless of visibility. There are also some &nbsp; entities in the text, so you might need to remove them.

    from selenium.common.exceptions import NoSuchElementException

    for each_div in all_divs:
        try:
            description = each_div.find_element(
                By.XPATH,
                ".//div[contains(text(),'Description')]/parent::div/div[2]//p[1]",
            ).get_attribute('innerHTML')
            print(description)
        except NoSuchElementException:
            # this opportunity has no description paragraph
            print('none')

Outputs

    This is the annual publication of information on recipients of funds for the TVET Project.
    none
    At the latest 14 calendar days before the final date for receipt of tenders (up to 4th January 2021), tenderers may ask questions about the tender documents and the contract in accordance with Art. 64 of the Law of 17 June 2016. Questions shall be addressed in writing to:
    Pour tout besoin d'information complémentaire, veuillez contacter: <a href="mailto:adama.dianda@enabel.be">adama.dianda@enabel.be</a>
    none
    none
    none
    Marché relatif &nbsp;à &nbsp;la&nbsp;fourniture, &nbsp;l’installation, &nbsp;la &nbsp;mise &nbsp;en &nbsp;marche &nbsp;et&nbsp;formation des utilisateurs et techniciens chargé de la&nbsp;maintenance &nbsp;des &nbsp;équipements &nbsp;de &nbsp;Laboratoire&nbsp;destinés au CERMES.&nbsp;
    Pour tout besoin d'information complémentaire, veuillez contacter: <a href="mailto:adama.dianda@enabel.be">adama.dianda@enabel.be</a>
    Tenders should request the price schedule in xls from Ms. Eva Matovu. email: <a href="mailto:eva.matovu@enabel.be">eva.matovu@enabel.be</a>
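
As the output shows, innerHTML still carries the &nbsp; entities and inline anchor tags. A minimal cleanup sketch, assuming you just want plain text (clean_description is a hypothetical helper, not part of the original answer):

    import html
    import re

    def clean_description(raw_html: str) -> str:
        """Turn a description paragraph's innerHTML into plain text."""
        text = re.sub(r"<[^>]+>", " ", raw_html)   # drop inline tags such as <a href=...>
        text = html.unescape(text)                 # "&nbsp;" -> non-breaking space, etc.
        return re.sub(r"\s+", " ", text).strip()   # collapse all whitespace runs

    print(clean_description(
        "Marché relatif &nbsp;à &nbsp;la&nbsp;fourniture, &nbsp;l’installation"))
    # Marché relatif à la fourniture, l’installation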

Alternatively, you could read textContent, which comes back with the entities already decoded and without the inline tags:

    for each_div in all_divs:
        # find_elements returns an empty list instead of raising when there is
        # no match, so no try/except is needed here
        paragraphs = each_div.find_elements(
            By.XPATH, ".//div[contains(text(),'Description')]/parent::div/div[2]//p")
        for desc in paragraphs:
            # textContent decodes entities such as &nbsp; and drops the markup
            print(desc.get_attribute('textContent'))

Outputs

    This is the annual publication of information on recipients of funds for the TVET Project.
    At the latest 14 calendar days before the final date for receipt of tenders (up to 4th January 2021), tenderers may ask questions about the tender documents and the contract in accordance with Art. 64 of the Law of 17 June 2016. Questions shall be addressed in writing to:
    Françoise MUSHIMIYIMANA, National Expert in Contractualization & Administration _National ECA                                    (francoise.mushimiyimana@enabel.be ), with copy to
    denise.nsanga@enabel.be
    evariste.sibomana@enabel.be

    They shall be answered in the order received. The complete overview of questions asked shall be available as of at the latest 7 calendar days before the final date for receipt of tenders at the address mentioned above.
    Pour tout besoin d'information complémentaire, veuillez contacter: adama.dianda@enabel.be
    Marché relatif  à  la fourniture,  l’installation,  la  mise  en  marche  et formation des utilisateurs et techniciens chargé de la maintenance  des  équipements  de  Laboratoire destinés au CERMES.
    Pour tout besoin d'information complémentaire, veuillez contacter: adama.dianda@enabel.be
    Tenders should request the price schedule in xls from Ms. Eva Matovu. email: eva.matovu@enabel.be
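
To store the text instead of printing it, the same lookup can be folded back into the loop from the question. A sketch, assuming the all_divs and singleData setup shown there:

    # inside the question's `for each_div in all_divs:` loop, after singleData is built
    paragraphs = each_div.find_elements(
        By.XPATH, ".//div[contains(text(),'Description')]/parent::div/div[2]//p")
    # textContent is available even while the accordion panel is collapsed,
    # which is why it works where .text returned an empty string
    singleData['description'] = " ".join(
        p.get_attribute('textContent').strip() for p in paragraphs)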
