Issue deploying script to AWS Lambda
I'm trying to run a script that uses Selenium, specifically its webdriver:
driver = webdriver.Firefox(executable_path='numpy-test/geckodriver', options=options, service_log_path ='/dev/null')
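The problem is that the function needs geckodriver to run. Geckodriver is included in the zip file I upload to AWS, but I don't know how to make the function find it when it runs on AWS. Locally this isn't a problem: geckodriver is in my project directory, so everything works.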
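A likely reason it works locally but not on Lambda is that the relative path `numpy-test/geckodriver` is resolved against the process's current working directory, not against the script's location. A minimal sketch, assuming geckodriver sits at the root of the uploaded zip next to handler.py, builds an absolute path from the directory of the handler module instead (`bundled_binary_path` is a hypothetical helper name, not part of Selenium):

```python
import os


def bundled_binary_path(name, base_dir=None):
    """Return an absolute path to a binary shipped inside the deployment package.

    Hypothetical helper: `base_dir` defaults to the directory that contains
    this module, so the result does not depend on the current working directory.
    """
    if base_dir is None:
        base_dir = os.path.dirname(os.path.abspath(__file__))
    return os.path.join(base_dir, name)
```

In the handler this would be passed as `executable_path=bundled_binary_path('geckodriver')`; if the zip really keeps a `numpy-test/` folder, pass `os.path.join('numpy-test', 'geckodriver')` as the name instead.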
When I invoke the function through Serverless, I get the following error message:
{"errorMessage": "Message: 'geckodriver' executable needs to be in PATH.\n", "errorType": "WebDriverException", "stackTrace": [["/var/task/handler.py", 66, "main", "print(TatamiClearanceScrape())"], ["/var/task/handler.py", 28, "TatamiClearanceScrape", "driver = webdriver.Firefox(executable_path='numpy-test/geckodriver', options=options, service_log_path ='/dev/null')"], ["/var/task/selenium/webdriver/firefox/webdriver.py", 164, "__init__", "self.service.start()"], ["/var/task/selenium/webdriver/common/service.py", 83, "start", "os.path.basename(self.path), self.start_error_message)"]]}
Error --------------------------------------------------
Invoked function failed
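In Selenium 3, `Service.start()` raises this "needs to be in PATH" message when launching the driver fails with "file not found"; a permissions problem produces a different "may have wrong permissions" message. So the trace suggests that nothing exists at `numpy-test/geckodriver` relative to the working directory on Lambda. A small diagnostic sketch (`describe_executable` is a hypothetical helper) that could be printed from the handler to check what the function actually sees:

```python
import os


def describe_executable(path):
    """Report why a driver binary at `path` might fail to start (hypothetical helper)."""
    abs_path = os.path.abspath(path)  # resolved against the current working directory
    if not os.path.exists(abs_path):
        return "missing: %s (cwd is %s)" % (abs_path, os.getcwd())
    if not os.path.isfile(abs_path):
        return "not a file: %s" % abs_path
    if not os.access(abs_path, os.X_OK):
        return "not executable: %s" % abs_path
    return "ok: %s" % abs_path
```

For example, `print(describe_executable('numpy-test/geckodriver'))` at the top of the handler would show the resolved path in the function's CloudWatch logs.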
Any help would be appreciated.
Edit:
import datetime
import time

import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.firefox.options import Options


def TatamiClearanceScrape():
    options = Options()
    options.add_argument('--headless')
    page_link = 'https://www.tatamifightwear.com/collections/clearance'
    # this is the url that we've already determined is safe and legal to scrape from.
    page_response = requests.get(page_link, timeout=5)
    # here, we fetch the content from the url, using the requests library
    page_content = BeautifulSoup(page_response.content, "html.parser")
    driver = webdriver.Firefox(executable_path='numpy-test/geckodriver', options=options, service_log_path='/dev/null')
    driver.get('https://www.tatamifightwear.com/collections/clearance')
    labtnx = driver.find_element_by_css_selector('a.btn.close')
    labtnx.click()
    time.sleep(10)
    labtn = driver.find_element_by_css_selector('div.padding')
    labtn.click()
    time.sleep(5)
    # wait(driver, 50).until(lambda x: len(driver.find_elements_by_css_selector("div.detailscontainer")) > 30)
    html = driver.page_source
    driver.quit()  # close the browser once the rendered HTML has been captured
    page_content = BeautifulSoup(html, "html.parser")
    # we use the html parser to parse the rendered page and store it in a variable.
    textContent = []
    tags = page_content.findAll("a", class_="product-title")
    product_title = page_content.findAll(attrs={'class': "product-title"})  # collects all product titles from the site
    old_price = page_content.findAll(attrs={'class': "old-price"})
    new_price = page_content.findAll(attrs={'class': "special-price"})
    products = []
    for i in range(len(product_title) - 2):
        # groups all products together in a list of dictionaries, with name, old price and new price
        product = {"Product Name": product_title[i].get_text(strip=True),
                   "Old Price": old_price[i].get_text(strip=True),
                   "New Price": new_price[i].get_text(strip=True),
                   'date': str(datetime.datetime.now())
                   }
        products.append(product)
    return products
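The loop above indexes three parallel lists by position, which raises IndexError if the page has fewer prices than titles. A stdlib-only sketch of the same grouping step, assuming the text has already been extracted (for example `[t.get_text(strip=True) for t in product_title]`); `build_products` and `skip_trailing` are hypothetical names, and dropping the last two titles mirrors `range(len(product_title) - 2)`:

```python
import datetime


def build_products(titles, old_prices, new_prices, skip_trailing=2):
    """Group parallel lists of strings into product dicts (hypothetical helper).

    The last `skip_trailing` titles are dropped, as in the original loop, and
    zip() stops at the shortest list instead of raising IndexError.
    """
    scraped_at = str(datetime.datetime.now())  # one timestamp for the whole scrape
    kept_titles = titles[:max(0, len(titles) - skip_trailing)]
    return [
        {"Product Name": name, "Old Price": old, "New Price": new, "date": scraped_at}
        for name, old, new in zip(kept_titles, old_prices, new_prices)
    ]
```

Using a single timestamp means every product from one run shares the same `date`, which is usually what you want when comparing scrapes over time.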
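You may want to look into AWS Lambda Layers for this. Layers let a Lambda function use libraries without including them in the function's own deployment package. With a layer you don't have to re-upload your dependencies on every code change; instead you create one additional layer that contains all the required packages.
Read more about AWS Lambda Layers in the AWS Lambda documentation.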
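Lambda extracts a layer's contents into `/opt`, and `/opt/bin` is on the function's `PATH`, so a layer whose zip contains `bin/geckodriver` should let Selenium find the driver without an explicit `executable_path`. A minimal sketch for packaging such a layer with the standard `zipfile` module, assuming you already have a Linux build of geckodriver locally (`build_driver_layer` and the file names are hypothetical). `ZipFile.write` records the file's Unix mode bits, so the execute bit is set before zipping:

```python
import os
import stat
import zipfile


def build_driver_layer(driver_path, zip_path, arcname="bin/geckodriver"):
    """Package a driver binary as a Lambda layer zip (hypothetical helper).

    Files under bin/ in a layer end up in /opt/bin at runtime. ZipFile.write
    stores the file's mode bits, so the binary is made executable first.
    """
    mode = os.stat(driver_path).st_mode
    os.chmod(driver_path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(driver_path, arcname=arcname)
    return zip_path
```

After attaching the layer to the function, `shutil.which("geckodriver")` inside the handler should return `/opt/bin/geckodriver`, and `webdriver.Firefox(options=options, service_log_path='/dev/null')` can then omit `executable_path`.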