Web scraping on the client side

This is probably not the best title for this question.

I have a Node.js application running on my server that currently uses a Python script for web scraping, but I am looking at moving the scraping to the client side, because individual clients see different (potentially unique) versions of the same site.

In an ideal world, I would like to use JavaScript to get the HTML response from a page (what I can see in Chrome by right-clicking and choosing View Source) and then process it in JavaScript.

However, from what I have read online, this does not seem to be possible. I am aware of sites that provide the response (such as anyorigin.com) so that it can be scraped. However, these are not really suitable for me, as I need to scrape what the user sees, and each user can potentially see something different on the site I want to scrape. The Python script I am currently using would do this, but it would require the user to have Python installed in order for me to execute it, and this cannot be guaranteed.

Apologies for the block of text.

Is there any solution to this problem?

After some research and the suggestions I received, I created a Chrome extension using the simple guide on the Chrome Developer site and used a CORS request to get what I needed.
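For readers wondering what such a request can look like, here is a minimal TypeScript sketch. It is not the code from the extension described above. It assumes a Manifest V3 extension whose manifest.json lists the target site under host_permissions, and that the call is made from the extension's background script or an extension page, where cross-origin requests to permitted hosts are allowed. The names fetchPageSource and FetchLike are hypothetical; the fetch function is passed in as an argument so the helper can also be exercised outside a browser.

```typescript
// Minimal shape of the fetch function this sketch needs (hypothetical type).
type FetchLike = (
  url: string,
  init?: { credentials?: "include" | "omit" | "same-origin" }
) => Promise<{ ok: boolean; status: number; text(): Promise<string> }>;

// Hypothetical helper: request a page with the user's cookies attached and
// return the raw HTML body as a string.
async function fetchPageSource(url: string, fetchImpl: FetchLike): Promise<string> {
  // "include" asks the browser to send the user's cookies for the target site,
  // so the response should match what that user sees when logged in.
  const response = await fetchImpl(url, { credentials: "include" });
  if (!response.ok) {
    throw new Error(`Request for ${url} failed with status ${response.status}`);
  }
  return response.text();
}
```

Inside the extension this could be called as fetchPageSource("https://example.com/", fetch). The returned string is the raw response body, which is close to what View Source shows; anything the page adds later with its own scripts will not be in it.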

If anyone finds this question and would like help, I am happy to provide further details/assistance :)

I was recently trying to do something very similar and, unfortunately, as far as I know there is no way to do this on the client side. You may be able to do some trickery and "post" the data you need back to the server, where you can deal with it, but I don't imagine that will be very efficient or straightforward.
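As a rough illustration of the "post it back" idea, the TypeScript sketch below packages HTML captured in the browser and sends it to the server as JSON. Everything named here is an assumption made for the example: ScrapePayload, buildScrapePayload, postToServer, and the endpoint URL are hypothetical, and the server is assumed to accept a JSON POST. The fetch function is passed in so the sketch does not depend on a browser.

```typescript
// Hypothetical payload shape for sending captured HTML to the server.
interface ScrapePayload {
  url: string;        // page the HTML was captured from
  html: string;       // captured markup
  capturedAt: string; // ISO 8601 timestamp
}

type PostInit = { method: "POST"; headers: Record<string, string>; body: string };
type PostLike = (url: string, init: PostInit) => Promise<{ ok: boolean; status: number }>;

// Bundle the captured HTML with its source URL and a timestamp.
function buildScrapePayload(url: string, html: string, now: Date = new Date()): ScrapePayload {
  return { url, html, capturedAt: now.toISOString() };
}

// Send the payload to a server endpoint (assumed to accept JSON) and
// throw if the server does not answer with a success status.
async function postToServer(
  endpoint: string,
  payload: ScrapePayload,
  fetchImpl: PostLike
): Promise<void> {
  const response = await fetchImpl(endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  });
  if (!response.ok) {
    throw new Error(`Upload to ${endpoint} failed with status ${response.status}`);
  }
}
```

In a browser, the html argument could come from document.documentElement.outerHTML, which serializes the page as it currently stands in the DOM rather than the original response, and fetchImpl would be the built-in fetch.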

Though if you do find something, please do share.
