繁体   English   中英

为什么我的代码返回重复数据?

[英]Why is my code returning duplicating data?

我正在做一个正在抓取搜索结果的项目。 我正在尝试打印我找到的数据,但第一页的数据是重复的。 它打印第一页的数据两次,页面数据的 rest 打印一次(这是我想要的)。 请让我知道如何防止我的第一页数据两次。 谢谢!

from selenium import webdriver
from time import sleep

page = 0
# SearchTerm = input("Search Term: ")
SearchTerm = "EHS"
# LocationSearch = input("Location: ")
LocationSearch = "Arizona"

NumPages = 4

Data = []

def removeduplicates(listofelements):
    # Create an empty list to store unique elements
    uniquelist = []

    # Iterate over the original list and for each element
    # add it to uniqueList, if its not already there.
    for elem in listofelements:
        if elem not in uniquelist:
            uniquelist.append(elem)

    # Return the list of unique elements
    return uniquelist


Data = []

url = ('https://www.indeed.com/jobs?q=' + SearchTerm + '&l=' + LocationSearch + '&start=0')

driver = webdriver.Chrome("/Users/nzalle/Downloads/chromedriver")
driver.get(url)

for x in range(NumPages + 1):
    driver.get(url)
    url = ('https://www.indeed.com/jobs?q=' + SearchTerm + '&l=' + LocationSearch + '&start=' + str(page))
    page += 10
    # scrape code
    Titles = driver.find_elements_by_xpath('//*[contains(concat( " ", @class, " " ), concat( " ", "jobtitle", " " ))]')
    TitleText = [x.text for x in Titles]

    Data.extend([*TitleText, ","])

    CompanyName = driver.find_elements_by_xpath('//*[contains(concat( " ", @class, " " ), concat( " ", "company", " " ))] | //*[contains(concat( " ", @class, " " ), concat( " ", "company", " " ))]//*[contains(concat( " ", @class, " " ), concat( " ", "turnstileLink", " " ))]')
    CompanyNameText = [x.text for x in CompanyName]

    Data.extend([*CompanyNameText, ",", "\n"])

    sleep(3)

    driver.get(url)

print(*Data)

我在代码中做了一些更改,还 find_elements.Induce WebDriverWait () 并等待visibility_of_all_elements_located ()

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

driver = webdriver.Chrome("/Users/nzalle/Downloads/chromedriver")

page = 0
# SearchTerm = input("Search Term: ")
SearchTerm = "EHS"
# LocationSearch = input("Location: ")
LocationSearch = "Arizona"

NumPages = 4
Data = []
for x in range(NumPages + 1):
    url = ('https://www.indeed.com/jobs?q=' + SearchTerm + '&l=' + LocationSearch + '&start=' + str(page))
    driver.get(url)
    time.sleep(1) #slowdown the loop
    page += 10
    for jobs in WebDriverWait(driver,10).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR,"div[data-tn-component='organicJob']"))):
        title=jobs.find_element_by_xpath("./h2/a").text.strip()
        try:
            name=jobs.find_element_by_xpath(".//span[@class='company']/a").text.strip()
        except:
            name = jobs.find_element_by_xpath(".//span[@class='company']").text.strip()
        Data.append(title)
        Data.append(name)

print(Data)

Output

['EHS and Facilities Manager', 'Howmet Aerospace', 'EHS Coordinator', 'Basalite® Concrete Products LLC', 'EHS Leader', 'Ball Corporation', 'Site EHS Manager (Gatorade)', 'PepsiCo', 'Manager, EHS', 'Bristol-Myers Squibb', 'Safety Manager', 'Allen Industries Inc', 'Senior Site EHS Manager', 'Amazon.com Services LLC', 'TRAFFICE SAFETY ENGINEER SUPERVISOR', 'State of Arizona', 'Environmental, Safety, & Security Manager', 'Holsum Bakery, Inc (0127)', 'Environmental Health & Safety Manager', 'Mark Anthony Brewing Inc.', 'Safety Manager - F-5 Adversary Program - Yuma, AZ', 'Vertex Aerospace LLC', 'Risk and Patient Safety Coordinator Full Time Day Shift Cent...', 'Abrazo Central Campus', 'TRAFFIC SAFETY ENGINEER SUPERVISOR', 'State of Arizona', 'Food Safety and Environmental Health Program Manager', 'State of Arizona', 'EHS Technical Writer', 'Safety Services Company', 'Environmental Health and Safety (EHS) Coordinator', 'Westerwood Global', 'Safety Manager', 'Pioneer Landscape Centers', 'HSE/HSD Instructor', 'Odle Management Group LLC', 'Safety Manager', 'PAC Worldwide', 'Safety Trainer - Heavy Equipment', 'Pioneer Landscape Centers', 'Safety Coordinator Southwest Steel/ SME Industries Inc.', 'SME Steel Industries', 'Risk and Patient Safety Coordinator Full Time Day Shift Cent...', 'Abrazo Central Campus', 'TRAFFIC SAFETY ENGINEER SUPERVISOR', 'State of Arizona', 'Food Safety and Environmental Health Program Manager', 'State of Arizona', 'EHS Technical Writer', 'Safety Services Company', 'Environmental Health and Safety (EHS) Coordinator', 'Westerwood Global', 'Safety Manager', 'Pioneer Landscape Centers', 'HSE/HSD Instructor', 'Odle Management Group LLC', 'Safety Manager', 'PAC Worldwide', 'Safety Trainer - Heavy Equipment', 'Pioneer Landscape Centers', 'Safety Coordinator Southwest Steel/ SME Industries Inc.', 'SME Steel Industries', 'Risk and Patient Safety Coordinator Full Time Day Shift Cent...', 'Abrazo Central Campus', 'TRAFFIC SAFETY ENGINEER SUPERVISOR', 'State of Arizona', 'Food Safety and Environmental Health Program Manager', 'State of Arizona', 'EHS Technical Writer', 'Safety Services Company', 'Environmental Health and Safety (EHS) Coordinator', 'Westerwood Global', 'Safety Manager', 'Pioneer Landscape Centers', 'HSE/HSD Instructor', 'Odle Management Group LLC', 'Safety Manager', 'PAC Worldwide', 'Safety Trainer - Heavy Equipment', 'Pioneer Landscape Centers', 'Safety Coordinator Southwest Steel/ SME Industries Inc.', 'SME Steel Industries']

暂无
暂无

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM