
Can't get elements with Python and Selenium by Xpath

I am trying to get the "job-title" and the "href" from this webpage with Python and Selenium.

It only returns blanks and no data.

job_card = driver.find_elements_by_xpath('//div[contains(@class,"job-info-wrapper ")]')
    
for job in job_card:
   
                              
    try:
        title = job.find_elements_by_xpath('.//a[contains(@class, "job-title ")]')
    except:
        title = job.find_elements_by_xpath('.//a[contains(@class, "job-title ")]').get_attribute(name="job-title ")
    titles.append(title)
    print(title)
   
    links.append(job.get_attribute(name="a href"))

This is the webpage:

[image not preserved]

What am I doing wrong here?

As per the DOM, the title of the job is the text contained in the a tag.

Use .get_attribute("innerText") or .text to get the title from the job option.

To retrieve the href attribute from the element, use .get_attribute("href").

To find a single element, use find_element instead of find_elements. find_elements returns a list of WebElements, and a list has no get_attribute method.

Try the following:
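The same single-element versus list distinction can be tried without a browser. The sketch below is only an analogy: it uses the standard library's xml.etree.ElementTree, where find() and findall() play roughly the roles of find_element and find_elements, and where .text and .get("href") stand in for Selenium's .text and .get_attribute("href"). The markup and the example.com URLs are made up for illustration and are not the real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup standing in for one page of job cards (not the real site).
HTML = """
<div>
  <div class="job-info-wrapper">
    <a class="job-title" href="https://example.com/job-1">Data Engineer</a>
  </div>
  <div class="job-info-wrapper">
    <a class="job-title" href="https://example.com/job-2">ERP Specialist</a>
  </div>
</div>
"""

root = ET.fromstring(HTML)
cards = root.findall(".//div[@class='job-info-wrapper']")

# findall() returns a plain list, which has no text or attributes of its own.
many = cards[0].findall(".//a[@class='job-title']")
list_has_text = hasattr(many, "text")

# find() returns the single matching element, so its text and href are readable.
results = []
for card in cards:
    anchor = card.find(".//a[@class='job-title']")
    results.append((anchor.text, anchor.get("href")))

print(list_has_text)
print(results)
```

The relative path starting with .// keeps each lookup scoped to the current card, which is the same reason the Selenium XPath inside the loop starts with a dot.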

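from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome()  # assumes a local Chrome driver is available
driver.get("https://www.vietnamworks.com/job-search/all-jobs?filtered=true")

wait = WebDriverWait(driver, 30)

# Close the filter pop-up if it appears; wait.until raises TimeoutException otherwise.
try:
    wait.until(EC.element_to_be_clickable((By.XPATH, "//div[@class='sc-fznWqX dAkvW']//*[name()='svg' and @class='filter-close']"))).click()
except TimeoutException:
    print("No pop-up")

titles = []
links = []
# find_elements(By.XPATH, ...) works in both Selenium 3 and Selenium 4.
job_card = driver.find_elements(By.XPATH, '//div[contains(@class,"job-info-wrapper ")]')

for job in job_card:
    # find_element returns a single WebElement, so get_attribute can be called on it.
    element = job.find_element(By.XPATH, ".//a[contains(@class,'job-title')]")
    title = element.get_attribute("innerText")
    link = element.get_attribute("href")
    print(f"{title} : {link}")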

Outputs

No pop-up
Chuyên Viên Triển Khai Phần Mềm ERP / ERP Specialist(NEW) : https://www.vietnamworks.com/chuyen-vien-trien-khai-phan-mem-erp-erp-specialist-1438267-jd/?source=searchResults&searchType=2&placement=1438268&sortBy=date
[HN] Data Engineer(NEW) : https://www.vietnamworks.com/hn-data-engineer-2-1431499-jd/?source=searchResults&searchType=2&placement=1431500&sortBy=date
Chuyên Viên Pháp Chế(NEW) : https://www.vietnamworks.com/chuyen-vien-phap-che-510-1-1429155-jd/?source=searchResults&searchType=2&placement=1429156&sortBy=date
...

Once you have each job card's link element, just append its href and inner text. The next-page click should also be unindented so that it runs once per page, outside the inner loop over jobs. It also helps to use waits to handle any pop-ups first.

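import pandas as pd
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome()  # assumes a local Chrome driver is available
wait = WebDriverWait(driver, 10)

driver.get('https://www.vietnamworks.com/job-search/all-jobs?filtered=true')

titles = []
links = []

###########################################################################################
# Click search Button
try:
    wait.until(EC.element_to_be_clickable((By.XPATH, '//a[contains(@class, "button searchBar__button")]'))).click()
except TimeoutException:
    pass

try:
    wait.until(EC.element_to_be_clickable((By.XPATH, '//a[contains(@class, "button searchBar__button")]'))).click()
except TimeoutException:
    pass

###########################################################################################
# Loop over up to 20 result pages

for i in range(0, 20):
    # Wait for the job title links on the current page.
    job_card = wait.until(EC.presence_of_all_elements_located((By.XPATH, "//div[contains(@class,'job-info-wrapper ')]//a[@class='job-title priorityJob']")))
    print(len(job_card))
    for job in job_card:
        links.append(job.get_attribute("href"))
        titles.append(job.text)
        print(job.get_attribute("href"), job.text)

    # Go to the next page once per page; wait.until raises TimeoutException
    # (not NoSuchElementException) when the ">" link never becomes clickable.
    try:
        wait.until(EC.element_to_be_clickable((By.XPATH, "//a[@class='page-link' and .='>']"))).click()
    except TimeoutException:
        break

    print("Page: {}".format(i + 2))

df_da = pd.DataFrame()
df_da['Title'] = titles
df_da['Link'] = links

print(df_da)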

Outputs

                                                Title                                               Link
0           QC Engineers (Tester, QA QC, Manual)(NEW)  https://www.vietnamworks.com/qc-engineers-test...
1                 Business Analyst (IT Industry)(NEW)  https://www.vietnamworks.com/business-analyst-...
2    Unity Game Developer (Up to 40,000,000 VNĐ)(NEW)  https://www.vietnamworks.com/unity-game-develo...
3   3D Modeler ( Background Modeler ) - Up to 30,0...  https://www.vietnamworks.com/3d-modeler-backgr...
4   Chuyên Viên Quản Trị Hệ Thống Công Nghệ Thông ...  https://www.vietnamworks.com/chuyen-vien-quan-...
5                              Financial Analyst(NEW)  https://www.vietnamworks.com/financial-analyst...
6   Chuyên Viên Cao Cấp Tuyển Dụng (Nghỉ Thứ 7 Và ...  https://www.vietnamworks.com/chuyen-vien-cao-c...
7                  Chuyên Viên Cao Cấp Tài Chính(NEW)  https://www.vietnamworks.com/chuyen-vien-cao-c...
8                    Trưởng Ban Kiểm Toán Nội Bộ(NEW)  https://www.vietnamworks.com/truong-ban-kiem-t...
9                         Dealer Operation Staff(NEW)  https://www.vietnamworks.com/dealer-operation-...
10          Supervisor - Tiếng Nhật - Phòng Sale(NEW)  https://www.vietnamworks.com/supervisor-tieng-...
11             IT Manager – Back Office Division(NEW)  https://www.vietnamworks.com/it-manager-back-o...
12  Hot Job - Nhân Viên Xuất Nhập Khẩu (Lương Thưở...  https://www.vietnamworks.com/hot-job-nhan-vien...
13  General Accountant for Luxury Brand - Attracti...  https://www.vietnamworks.com/general-accountan...
14   Chuyên Viên Kinh Doanh (Thương Mại Điện Tử)(NEW)  https://www.vietnamworks.com/chuyen-vien-kinh-...
15  Java Developer (Thu Nhập Tương Đương Từ 14 - 2...  https://www.vietnamworks.com/java-developer-th...
16  Logistics Executive (Salary up to 500$ Per mon...  https://www.vietnamworks.com/logistics-executi...
17  Customs Liquidation & Customs Declaration Staf...  https://www.vietnamworks.com/customs-liquidati...
18  Chuyên Viên Kinh Doanh Thiết Bị Y Tế - [Mức Lư...  https://www.vietnamworks.com/chuyen-vien-kinh-...
19                       Trade Operation Officer(NEW)  https://www.vietnamworks.com/trade-operation-o...
20                 Nhân Viên PR - Quản Lý Đô Thị(NEW)  https://www.vietnamworks.com/nhan-vien-pr-quan...
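The loop structure in the code above, where the inner loop collects every job on a page and the next-page step runs once per page at the outer level, can be sketched without a browser. Everything here is a stand-in: PAGES is hypothetical data, scrape is a hypothetical helper, and advancing page_index replaces the click on the ">" link.

```python
# Hypothetical stand-in for a paginated site: each inner list is one page
# of (title, href) pairs. The real code reads these from WebElements.
PAGES = [
    [("Data Engineer", "/job-1"), ("ERP Specialist", "/job-2")],
    [("Financial Analyst", "/job-3")],
]


def scrape(pages, max_pages=20):
    """Collect titles and links page by page, stopping when no next page exists."""
    titles, links = [], []
    page_index = 0
    for _ in range(max_pages):
        # Inner loop: one iteration per job on the current page.
        for title, href in pages[page_index]:
            titles.append(title)
            links.append(href)
        # Outer level: the "next page" step runs once per page, not once per job.
        if page_index + 1 >= len(pages):
            break  # plays the role of the TimeoutException branch
        page_index += 1
    return titles, links


titles, links = scrape(PAGES)
print(titles)
print(links)
```

If the page advance were indented into the inner loop, the sketch would move on after the first job of each page and skip the rest, which is the kind of indentation problem the answer points out.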

You were almost there. You just need two minor modifications:

  • get_attribute() is a method of a WebElement, not of a list. So instead of find_elements* you need to use find_element*.
  • Within get_attribute() you just need to pass the attribute name, for example get_attribute("class").
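The second point, passing the bare attribute name, can be illustrated offline as well. This is an analogy rather than Selenium itself: ElementTree's .get() is used in place of get_attribute(), and the anchor markup and example.com URL are invented for the example. As with get_attribute(), a name that is not an attribute of the element, such as "a href", yields None instead of raising an error, which is one way to end up with blanks.

```python
import xml.etree.ElementTree as ET

# Hypothetical anchor element resembling a job title link (not the real page).
anchor = ET.fromstring(
    '<a class="job-title priorityJob" href="https://example.com/job-1">Data Engineer</a>'
)

href = anchor.get("href")        # exact attribute name: returns the URL
css_class = anchor.get("class")  # exact attribute name: returns the class string
missing = anchor.get("a href")   # not an attribute name: returns None

print(href)
print(css_class)
print(missing)
```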
