
How to run untrusted code serverside?

I'm trying to run untrusted JavaScript code on Linux + Node.js with the sandbox module, but it's broken. All I need is to let users write JavaScript programs that print out some text. No other I/O is allowed, and only plain JavaScript is to be used, no other Node modules. If that's not really possible, what other language would you suggest for this kind of task? The minimal feature set I need is some math, regexes, string manipulation, and basic JSON functions. Scripts will run for, let's say, 5 seconds tops, and then the process would be killed. How can I achieve that?

I've recently created a library for sandboxing untrusted code that seems to fit these demands (it executes the code in a restricted process in the case of Node.js, and in a Worker inside a sandboxed iframe in a web browser):

https://github.com/asvd/jailed

It is possible to export a given set of methods from the main application into the sandbox, thus providing any custom API and set of privileges (that feature was actually the reason I decided to write a library from scratch). The math, regexp and string-related features you mention are provided by JavaScript itself; anything additional can be explicitly exported from outside (like a function for communicating with the main application).

All the libraries I've seen mentioned in such questions (vm2, jailed) try to isolate the Node process itself. These kinds of "jails" are constantly broken, and they depend heavily on future upgrades to Node's standard library not exposing another attack vector.
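
To make that fragility concrete, here is a small illustration using Node's built-in vm module, whose documentation itself says it is not a security mechanism. It is a well-known escape pattern shown against the standard library only, not an exploit for vm2 or jailed specifically.

```javascript
const vm = require('node:vm');

// The sandbox object is created by the host, so its prototype chain
// still leads back to the host's Function constructor.
const sandbox = {};

// Untrusted code: walk from the sandbox global to the host's Function
// constructor, then build a function that runs in the host context.
const untrusted = "this.constructor.constructor('return typeof process')()";

const leaked = vm.runInNewContext(untrusted, sandbox);

// The sandbox never defined `process`, yet the script could reach it.
console.log(leaked);
```

One forgotten reference to a host object is enough, which is the point made above.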

An alternative would be to use the v8::Isolate class directly. It is meant to isolate JavaScript in Google Chrome and Node, so you can expect it to be fully maintained and more secure than anything you, I, or a single library maintainer could build. This class can only run "pure" JavaScript: it has the full ECMAScript implementation, but no browser API or Node API.

This is what Cloudflare uses for its Workers product.
Deno, the new runtime developed by Node's creator, aims to sandbox by default using exactly the same thing, exposing parts of the standard library depending on the flags you enable.

In a Node environment, you can use isolated-vm. It's an amazing library that runs the code you want to isolate inside separate v8::Isolate instances.

It provides methods to pass values and functions to the isolate and back. It is not as trivial to use as most of the "jailing" libraries, but it gives you actual sandboxing of the JavaScript code.
As it's "pure" JavaScript, the only escapes are the ones you provide in the form of injected functions.
Also, it gets updated automatically with each Node version, as it uses Node's own v8::Isolate.
One of the main pains is that if you want to inject libraries into your script, you will likely need a package bundler like webpack to bundle everything into a single script that the library can use.

I personally use it in a crawler to run user-provided code that extracts information from web pages, and it works wonders.

The basic idea of a sandbox is that a script needs predefined global variables to do anything, so if you deny it those globals by unsetting them, or replace them with controlled versions, it cannot escape. As long as you don't forget anything.

First, deny require() or replace it with something controlled. Don't forget about process and "global" (a.k.a. "root"). The difficult part is not forgetting anything, which is why it's good to rely on someone else having built a sandbox ;-)
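
A minimal sketch of that idea, assuming Node's built-in vm module is used to supply the controlled globals. The runWithControlledGlobals and print names are made up for this example, and as the caveat above warns, this alone is not a secure sandbox: print is itself a host object that a hostile script could climb out through.

```javascript
const vm = require('node:vm');

// Run `source` with only a hypothetical print() helper exposed to it.
function runWithControlledGlobals(source, timeoutMs = 5000) {
  const output = [];
  const globals = {
    print: (text) => { output.push(String(text)); },
  };
  // Throws if the script runs for longer than timeoutMs.
  vm.runInNewContext(source, globals, { timeout: timeoutMs });
  return output;
}

// require and process were never handed over, so the script cannot see
// them; language built-ins such as JSON are still available.
const lines = runWithControlledGlobals(`
  print(typeof require);
  print(typeof process);
  print(JSON.stringify({ sum: 1 + 1 }));
`);
console.log(lines);
```

The timeout option covers the "5 seconds tops" requirement for synchronous code such as an infinite loop.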

Docker.io is an awesome new kid on the block that uses LXC and cgroups to create sandboxes.

Here is one implementation of an online gist (similar to codepad.org) using Docker and Go.

This just goes to show that one can safely run untrusted code written in many programming languages, including Node.js, inside Docker containers.
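
As a rough sketch of what driving Docker from Node could look like: the image name, the limits, and the helper names buildDockerArgs and runInDocker are all assumptions for illustration and are not taken from the linked implementation. The flags themselves are standard docker run options.

```javascript
const { spawnSync } = require('node:child_process');
const path = require('node:path');

// Build the argument list for `docker run`. The image and the limits
// are illustrative choices, not recommendations.
function buildDockerArgs(scriptPath, image = 'node:20-alpine') {
  const hostPath = path.resolve(scriptPath);
  return [
    'run', '--rm',
    '--network', 'none',     // no network access
    '--memory', '64m',       // memory cap
    '--cpus', '0.5',         // CPU cap
    '--pids-limit', '64',    // cap on the number of processes
    '--read-only',           // read-only root filesystem
    '--volume', `${hostPath}:/sandbox/script.js:ro`,
    image, 'node', '/sandbox/script.js',
  ];
}

// Run the script and stop waiting after timeoutMs. Killing the docker
// client does not necessarily stop the container, so a real setup would
// also need to remove the container itself on timeout.
function runInDocker(scriptPath, timeoutMs = 5000) {
  return spawnSync('docker', buildDockerArgs(scriptPath), {
    timeout: timeoutMs,
    encoding: 'utf8',
  });
}

console.log(['docker', ...buildDockerArgs('user-script.js')].join(' '));
```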

I know it's pretty late to answer this question, but the tool below might add value, as it is not mentioned in the answers and comments above.

I'm trying to implement a similar use case. After going through the web resources, https://www.npmjs.com/package/vm2 seems to handle the sandbox environment (Node.js) pretty well.

It pretty much satisfies the sandboxing requirements, such as restricting access to built-in or external modules, exchanging data with the sandbox, etc.

If you can afford the performance hit, you could run the JS in a throwaway virtual machine with the appropriate CPU and memory limits.

Of course, then you are trusting the security of the VM solution. By using it together with an ordinary JS sandbox, you'd have two layers of security.
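
A much lighter stand-in for the same idea, sketched at the process level rather than with a real virtual machine: run the script in a child Node process with a heap cap and a hard time limit. The helper name runWithLimits is made up for this example, and this is only the resource-limit layer. The child still has the full Node API and inherits the parent's environment, so it is not a sandbox by itself.

```javascript
const { spawnSync } = require('node:child_process');

// Run `source` in a separate Node process with a heap cap and a time
// limit. The process is killed with SIGKILL when the time runs out.
function runWithLimits(source, { timeoutMs = 5000, maxHeapMb = 64 } = {}) {
  const result = spawnSync(
    process.execPath,
    [`--max-old-space-size=${maxHeapMb}`, '-e', source],
    { timeout: timeoutMs, killSignal: 'SIGKILL', encoding: 'utf8' },
  );
  return {
    stdout: result.stdout,
    // spawnSync reports an ETIMEDOUT error when the time limit was hit.
    timedOut: Boolean(result.error && result.error.code === 'ETIMEDOUT'),
  };
}

const quick = runWithLimits('console.log(1 + 1)');
const runaway = runWithLimits('while (true) {}', { timeoutMs: 500 });
console.log(quick.stdout.trim(), quick.timedOut, runaway.timedOut);
```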

For an additional layer, put the sandbox on a different physical machine than your main app.

Ask yourself these questions:

  1. Are you one of the smartest persons on the planet?
  2. Do you turn down job offers by Google, Mozilla and Kaspersky Lab routinely because it would bore you?
  3. Does the "untrusted code" come from people working at the same company as you or from criminals and bored computer kids all over the globe?
  4. Are you sure that node.js has no security holes that could leak through your sandbox?
  5. Can you write perfect 100% error free code?
  6. Do you know everything about JavaScript?

As you already know from your experiments with the sandbox module, writing your own sandbox isn't trivial. The main problem with sandboxes is that you must get everything right. One mistake will ruin your security completely, which is why browser developers fight a constant battle with crackers all over the globe.

That said, simple sandboxes are pretty easy to do. First, you'll need to write your own JavaScript interpreter because you can't use the one from node.js because of eval() and require() (both would allow crackers to escape your sandbox).

The interpreter must make sure that the interpreted code cannot access anything besides the few global symbols that you provide. This means there can't be an eval() function, for example (or you must make sure that this function is only evaluated in the context of your own JavaScript interpreter).

Drawback of this approach: A lot of work and if you make a mistake in your interpreter, the crackers can leave the sandbox.

Another approach is to clean the code and run it with node.js's eval(). You can clean existing code by running a bunch of regexps over it, like /eval\s*[(]/g, to remove malicious code parts.

Drawback of this approach: It's easy to make a mistake that will leave you vulnerable to an attack. For example, there might be a mismatch between what the regexp engine and what node.js think of as "whitespace". Some obscure Unicode whitespace might be accepted by the interpreter but not by the regexp, which would allow an attacker to run eval().
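
To show how little it takes, here is a small sketch of such a filter and a few inputs that slip past it. The helper names and the specific bypasses are my own examples, not ones from the answer above, but they are all plain JavaScript.

```javascript
// Flag any "eval" followed by optional whitespace and "(".
const looksLikeEval = (source) => /eval\s*[(]/.test(source);

// Variant that strips the matched text instead of rejecting the input.
const stripEval = (source) => source.replace(/eval\s*[(]/g, '');

const attempts = [
  'eval("1 + 1")',                      // the case the filter was written for
  'eval/**/("1 + 1")',                  // comment between the name and "("
  '(0, eval)("1 + 1")',                 // indirect call
  'globalThis["ev" + "al"]("1 + 1")',   // name assembled at run time
];

for (const source of attempts) {
  console.log(looksLikeEval(source), source);
}

// Stripping can also re-create the very text it was meant to remove.
console.log(stripEval('evaleval(("1 + 1")'));
```

Only the first attempt is flagged, while the other three still call eval when executed.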

My suggestion: Write a small demo test case that shows how the sandbox module is broken and have it fixed. It will save you a lot of time and effort and if there is a bug in the sandbox, it won't be your fault (well, not entirely at least).

I am facing a similar problem right now and I'm reading only bad things about the sandbox module.

If you don't need anything specific to the node environment, I think the best approach would be to use a headless browser such as PhantomJS or Chimera as the sandbox environment.

A late answer but maybe an interesting idea.

Static code analysis => AST manipulation => Code generating

  1. Static analysis parses the source code into an AST. The AST provides a common data structure that lets us traverse and modify the source code.
  2. Via AST manipulation, we can find all identifier references to sensitive variables in the outer scopes. If needed, we can re-declare and initialize them at the beginning of the function body, so as to overwrite them. That way, all references from the inside to the outside are under control.
  3. Generating code from the AST is easy as well.

For instance, a function is as shown below:

function () {
    a = 1;
    window.b = 1;
    eval('window.c()');
}

Static analysis based on JS code parser enables us to insert variable declaration statements before the original function body:

function () {
    var a, window = {}, eval = function () {}; // variable overwriting
    a = 1;
    window.b = 1;
    eval('window.c()');
}

That's it.

More overwrites should be considered, such as eval(), new Function() and other global objects or APIs. Warnings raised during parsing should also be well organized and reported.
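
A minimal sketch of the overwriting step only, assuming the list of names to shadow is supplied by hand instead of being discovered by walking the AST with the tools listed below. The shadowNames helper is hypothetical. The second half shows why new Function() needs attention as well.

```javascript
// Prepend a `var` declaration that shadows each listed name inside the
// function body. A real tool would find these names through the AST.
function shadowNames(body, names) {
  const declarations = names.map((name) => `${name} = undefined`).join(', ');
  return `var ${declarations};\n${body}`;
}

// Inside the generated function, `process` now refers to the local
// variable, not to the Node.js global.
const guarded = new Function(shadowNames('return typeof process;', ['process']));
console.log(guarded());

// Functions built through the Function constructor are created in the
// global scope, so the local shadowing does not apply to them.
const leaky = new Function(shadowNames(
  'return (function () {}).constructor("return typeof process")();',
  ['process'],
));
console.log(leaky());
```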

Some related work in order:

  • esprima , ECMAScript parsing infrastructure for multipurpose analysis.
  • estraverse , ECMAScript JS AST traversal functions.
  • escope , ECMAScript scope analyzer.
  • escodegen , ECMAScript code generator.

My practice based on the above is function-sandbox .

We ran into the same problem while working on one of our products. We wanted to allow users to provide their own custom (untrusted) code that we would run at specific key events in the product, e.g. when a task is completed. Pretty much a better alternative to webhooks!

What we ended up doing was building a separate service using a combination of AWS Lambda, Rust, v8::Isolate and some other bits to make it not only secure but also really fast. We've also added our own integrations of fetch() and such, as V8 doesn't support Web or Node-specific APIs. This also allowed us to do some neat stuff, like restricting the endpoints a script can talk to and even pre-authenticating requests by injecting a pre-configured Authorization header for specific requests/domains.
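
The endpoint restriction and header injection can be pictured roughly like this. It is only a sketch of the general idea, not the service's actual implementation: makeRestrictedFetch, the host list, the token, and the Bearer scheme are all assumptions made for the example.

```javascript
// Wrap a fetch-like function so sandboxed code can only reach allowed
// hosts, and attach a pre-configured Authorization header per host.
function makeRestrictedFetch(baseFetch, allowedHosts, tokensByHost = {}) {
  return function restrictedFetch(url, init = {}) {
    const { hostname } = new URL(url);
    if (!allowedHosts.includes(hostname)) {
      throw new Error(`Host not allowed: ${hostname}`);
    }
    const headers = { ...(init.headers || {}) };
    const token = tokensByHost[hostname];
    if (token) headers.Authorization = `Bearer ${token}`;
    return baseFetch(url, { ...init, headers });
  };
}

// Usage with a stand-in for the real fetch, so the sketch runs offline.
const calls = [];
const fakeFetch = (url, init) => {
  calls.push({ url, init });
  return Promise.resolve('ok');
};
const sandboxFetch = makeRestrictedFetch(
  fakeFetch,
  ['api.example.com'],
  { 'api.example.com': 'secret-token' },
);

sandboxFetch('https://api.example.com/tasks');
console.log(calls[0].init.headers);
```

The script only ever receives sandboxFetch, so it never sees the token's source and cannot reach hosts outside the list.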

Instead of open-sourcing our work, we opted to offer the service to others as a hosted offering. The service is globally deployed, requires no setup, and is completely stateless by default! You can check it out at https://scriptable.run .

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.
