Selenium scraping JavaScript

I'm planning to build a website that scrapes a lot of daily-updated URLs (JavaScript) from many websites. I did some research, found Selenium, and have already written some code to extract a URL from a website:

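# Selenium 4 API: find_element(By.XPATH, ...) replaces find_element_by_xpath(...)
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

chrome_path = r"C:\Users\hessien\Desktop\chromedriver_win32\chromedriver.exe"
driver = webdriver.Chrome(service=Service(chrome_path))
driver.get("http://example.com")
# Open the search box, type the query, and submit the search form
driver.find_element(By.XPATH, """//*[@id="header"]/div/div[2]/div[3]/ul/li/label/a""").click()
element = driver.find_element(By.XPATH, """//*[@id="s"]""")
element.send_keys("example")
driver.find_element(By.XPATH, """//*[@id="searchform"]/button/span""").click()
# Click through to a result, then click the player element
driver.find_element(By.XPATH, """//*[@id="contenedor"]/div/div[2]/div[1]/div[2]/article/div[2]/div[1]/a""").click()
driver.find_element(By.XPATH, """//*[@id="playex"]/div[1]""").click()
# Read the media URL from the <video> tag's src attribute
elem = driver.find_element(By.XPATH, """//*[@id="mediaplayer_media"]/video""").get_attribute("src")
print(elem)
driver.quit()  # close the browser when done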

But after some searching I found out that Selenium is mainly used as a testing framework, not for scraping and crawling. My question is: can Selenium do the job? If yes, how do I execute the Python code from an HTML button? I'm also using Django. If not, could you recommend anything that can do the task?

If you really want to build a scraper, I recommend Beautiful Soup, a Python library for pulling data out of HTML and XML files. You can integrate the Python script with Django so that it is triggered on a click. The link is below:

https://pypi.python.org/pypi/beautifulsoup4
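
To make the suggestion concrete, here is a minimal sketch of pulling a video URL out of HTML with Beautiful Soup. The sample markup, the `SAMPLE_HTML` name, and the `extract_video_sources` helper are assumptions made up for illustration; only the `mediaplayer_media` id is borrowed from the question. Note that Beautiful Soup only parses HTML it is given and does not run JavaScript, so this works only when the `src` is already present in the markup you pass in.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a downloaded page. The id mirrors
# the XPath in the question; the rest of the structure is an assumption.
SAMPLE_HTML = """
<div id="mediaplayer_media">
  <video src="http://example.com/media/clip.mp4"></video>
</div>
"""


def extract_video_sources(html):
    """Return the src attribute of every <video> tag that has one."""
    # "html.parser" is the parser bundled with Python, so no extra install is needed.
    soup = BeautifulSoup(html, "html.parser")
    # src=True keeps only <video> tags that actually carry a src attribute.
    return [video["src"] for video in soup.find_all("video", src=True)]


if __name__ == "__main__":
    print(extract_video_sources(SAMPLE_HTML))
```

In a Django project, one way to trigger this on a click would be to have the button submit to a view that calls a function like `extract_video_sources` and returns the result; the exact wiring depends on how your project is laid out.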
