
NodeJs mirror website proxy

How would you write a server that simply mirrors a website when a request is received? For example, hitting http://localhost:5000, where NodeJS is running, would render cnn.com with images and everything. Is this called a passthrough proxy?

I'm not looking for something that requires configuring an actual proxy within your browser settings, but instead something that just serves up essentially a mirror of another site by passing the requests through.

First, let me make sure I understand your question.

You want to have your users browse to http://mynodeproxy.example.com and have that page render in their browser as if it were http://cnn.com. Right?

The answer is: You can't do it the way you think you can. Serving another site's content through a proxy is only workable with one of 2 approaches:

  1. Users configure a real proxy server in their browser settings (this is why all browsers support configuring a proxy server). You could use an existing proxy server or try to write your own with Node and some specialized application logic. But the point is that users don't type your proxy address into the browser's address bar. They type your proxy address into the "proxy server" field of their browser settings and still type "http://cnn.com" into the address bar.

  2. If you control all outgoing traffic from your network, you can do hotel-style tricks like DNS hijacking or routing all traffic through your proxy.
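
For approach 1, a rough sketch of what "write your own with Node" could look like is shown here. When a browser is configured to use an HTTP proxy, it sends the full URL as the request target, and Node exposes that as `req.url`. The names `targetFromProxyRequest` and `createForwardProxy` are made up for this illustration, and the sketch only covers plain `http://` targets: HTTPS goes through the `CONNECT` method, which is not handled here, and a real proxy would also strip hop-by-hop headers.

```javascript
const http = require('http');

// Hypothetical helper: turn the absolute URL a browser sends to a configured
// proxy (e.g. "http://cnn.com/world") into options for http.request.
function targetFromProxyRequest(reqUrl) {
  const u = new URL(reqUrl); // throws if reqUrl is not an absolute URL
  if (u.protocol !== 'http:') {
    throw new Error('only plain http targets are handled in this sketch');
  }
  return {
    hostname: u.hostname,
    port: Number(u.port) || 80, // u.port is '' when the default port is used
    path: u.pathname + u.search,
  };
}

// Builds (but does not start) a plain-HTTP forward proxy server.
function createForwardProxy() {
  return http.createServer((clientReq, clientRes) => {
    let target;
    try {
      target = targetFromProxyRequest(clientReq.url);
    } catch (err) {
      clientRes.writeHead(400, { 'Content-Type': 'text/plain' });
      clientRes.end('Bad proxy request');
      return;
    }
    const upstreamReq = http.request(
      { ...target, method: clientReq.method, headers: clientReq.headers },
      (upstreamRes) => {
        // Relay status, headers and body back to the browser unchanged.
        clientRes.writeHead(upstreamRes.statusCode, upstreamRes.headers);
        upstreamRes.pipe(clientRes);
      }
    );
    upstreamReq.on('error', () => {
      if (!clientRes.headersSent) clientRes.writeHead(502);
      clientRes.end();
    });
    clientReq.pipe(upstreamReq); // forward any request body
  });
}
```

To try it, call `createForwardProxy().listen(8080)` (the port is arbitrary) and enter that host and port in the browser's proxy settings, not in the address bar.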

But this won't work by having your users put your passthrough proxy server address in their browser's address bar, because the HTML your proxy gets from CNN.com is going to have hyperlinks back to other cnn.com resources (other pages on the site, images, fonts, CSS, JS, etc). If those links include the hostname instead of being relative to the containing HTML document, the browser will connect directly to cnn.com to load them, bypassing your proxy.
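
For reference, the kind of naive pass-through mirror the question describes might be sketched as follows, using only Node's built-in `http` module. The helper names `upstreamOptions` and `createMirror` are invented for this example, and the upstream host is passed in as an assumption. The server forwards every request it receives, but it only ever sees the requests the browser chooses to send to it, which is exactly the limitation described above.

```javascript
const http = require('http');

// Hypothetical helper: map an incoming request onto the mirrored site.
// The Host header is rewritten so the upstream server sees its own name.
function upstreamOptions(req, upstreamHost) {
  return {
    hostname: upstreamHost,
    path: req.url,
    method: req.method,
    headers: { ...req.headers, host: upstreamHost },
  };
}

// Builds (but does not start) a server that relays every request to
// upstreamHost over plain HTTP and streams the response back untouched.
function createMirror(upstreamHost) {
  return http.createServer((req, res) => {
    const upstreamReq = http.request(
      upstreamOptions(req, upstreamHost),
      (upstreamRes) => {
        res.writeHead(upstreamRes.statusCode, upstreamRes.headers);
        upstreamRes.pipe(res);
      }
    );
    upstreamReq.on('error', () => {
      if (!res.headersSent) res.writeHead(502);
      res.end();
    });
    req.pipe(upstreamReq); // forward any request body
  });
}
```

Calling `createMirror('cnn.com').listen(5000)` would serve whatever the upstream returns for each path at http://localhost:5000, with the HTML passed through unmodified, absolute links included.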

Now imagine the CNN HTML has a link like <a href="http://cnn.com">View the CNN Home Page</a>. What happens when the user clicks that? That's right, your proxy is bypassed and entirely out of the picture. This is why proxy servers work with explicit browser support.
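
To make the distinction concrete, here is a small sketch that resolves a link the way a browser would and checks whether the resulting request goes back to the mirror. The function name `staysOnMirror` is hypothetical, and the sketch assumes the page was loaded from the mirror's root and contains no `<base>` tag.

```javascript
// Returns true if a browser that loaded the page from mirrorOrigin would
// send the request for href back to the mirror, false if it would go elsewhere.
function staysOnMirror(href, mirrorOrigin) {
  const resolved = new URL(href, mirrorOrigin); // same resolution rule browsers use
  return resolved.origin === new URL(mirrorOrigin).origin;
}

const mirror = 'http://localhost:5000';
console.log(staysOnMirror('/world/index.html', mirror)); // relative: stays on the mirror
console.log(staysOnMirror('http://cnn.com', mirror)); // absolute: goes straight to cnn.com
console.log(staysOnMirror('//cdn.cnn.com/app.js', mirror)); // protocol-relative: also bypasses
```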

Once CNN.com's JavaScript starts doing things like making ajax requests, dynamically adding stuff to the DOM, etc, you will see this is not possible by simply proxying and modifying the initial cnn.com home page HTML. Yes, you could do this for an extremely trivial, contrived example web page, but realistically, for a modern popular site like cnn.com, it's not feasible.
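
As an illustration of why modifying the HTML only gets you so far, the sketch below rewrites absolute `href` and `src` attributes that point at the upstream origin into root-relative paths. The function `rewriteAbsoluteLinks` is a made-up name, and a regular expression is used here purely for brevity; it is not a robust way to process HTML. It fixes the static links, but a URL that a script assembles at runtime never appears in the markup in a form it can match.

```javascript
// Rewrites href/src attributes that start with upstreamOrigin so they become
// root-relative and therefore resolve against whatever host served the page.
function rewriteAbsoluteLinks(html, upstreamOrigin) {
  // Escape regex metacharacters in the origin (e.g. the dots in "cnn.com").
  const escaped = upstreamOrigin.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // Match the origin only when followed by "/" or by the closing quote,
  // so that a longer hostname sharing the same prefix is left alone.
  const pattern = new RegExp(
    `((?:href|src)=["'])${escaped}(?:/|(?=["']))`,
    'g'
  );
  return html.replace(pattern, '$1/');
}

const origin = 'http://cnn.com';

// Static links are caught:
console.log(rewriteAbsoluteLinks('<a href="http://cnn.com">Home</a>', origin));
// → <a href="/">Home</a>

// A URL built by script at runtime is not:
const script = '<script>fetch("http://" + "cnn.com" + "/api")</script>';
console.log(rewriteAbsoluteLinks(script, origin) === script);
// → true
```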

