
How do sites like codepad.org and ideone.com sandbox your program?

Original post: 2010-09-12 17:49:00. Tags: language-agnostic, operating-system, sandbox, system-calls

I need to compile and run user-submitted scripts on my site, similar to what codepad and ideone do. How can I sandbox these programs so that malicious users don't take down my server?

Specifically, I want to lock them inside an empty directory and prevent them from reading or writing anywhere outside of it, from consuming too much memory or CPU, and from doing anything else malicious.

I will need to communicate with these programs via pipes (over stdin/stdout) from outside the sandbox.

3 answers

codepad.org has something based on geordi, which runs everything in a chroot (i.e. restricted to a subtree of the filesystem) with resource restrictions, and uses the ptrace API to restrict the untrusted program's use of system calls. See http://codepad.org/about.

I've previously used Systrace, another utility for restricting system calls.

If the policy is set up properly, the untrusted program would be prevented from breaking anything in the sandbox or accessing anything it shouldn't, so there might be no need to put programs in separate chroots and create and delete them for each run. That would provide another layer of protection, though, which probably wouldn't hurt.

Some time ago I was searching for a sandbox solution to use in an automated assignment evaluation system for CS students. Much like everything else, there is a trade-off between the various properties:

Isolation and access control granularity
Performance and ease of installation/configuration

I eventually decided on a multi-tiered architecture, based on Linux:

Level 0 - Virtualization:
By using one or more virtual machine snapshots for all assignments within a specific time range, it was possible to gain several advantages:
- Clear separation of sensitive from non-sensitive data.
- At the end of the period (e.g. once per day or after each session) the VM is shut down and restarted from the snapshot, thus removing any remnants of malicious or rogue code.
- A first level of computer resource isolation: each VM has limited disk, CPU and memory resources and the host machine is not directly accessible.
- Straightforward network filtering: by having the VM on an internal interface, the firewall on the host can selectively filter the network connections.
  For example, a VM intended for testing students of an introductory programming course could have all incoming and outgoing connections blocked, since students at that level would not have network programming assignments. At higher levels the corresponding VMs could, e.g., have all outgoing connections blocked and allow incoming connections only from within the faculty.
It would also make sense to have a separate VM for the Web-based submission system: one that could upload files to the evaluation VMs, but do little else.
Level 1 - Basic operating-system constraints:
On a Unix OS that would include the traditional access and resource control mechanisms:
- Each sandboxed program could be executed as a separate user, perhaps in a separate chroot jail.
- Strict user permissions, possibly with ACLs.
- ulimit resource limits on processor time and memory usage.
- Execution under nice to reduce priority relative to more critical processes. On Linux you could also use ionice and cpulimit; I am not sure what equivalents exist on other systems.
- Disk quotas.
- Per-user connection filtering.
You would probably want to run the compiler as a slightly more privileged user: more memory and CPU time, access to compiler tools and header files, etc.
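Some of the Level 1 limits can be applied from a supervising script. The following is a minimal Python sketch, assuming a Linux host and an unprivileged supervisor. The names `run_limited`, `_cap` and `_apply_limits`, and all default limit values, are hypothetical choices made for illustration; nothing here is taken from codepad or ideone. It covers only nice, ulimit-style limits and the stdin/stdout pipes. The separate user, chroot jail, quotas and connection filtering would still have to be set up elsewhere.

```python
import os
import resource
import subprocess


def _cap(limit, value):
    """Set soft and hard limits to `value`, never above the current hard limit."""
    _, hard = resource.getrlimit(limit)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)
    resource.setrlimit(limit, (value, value))


def _apply_limits(cpu_seconds, memory_bytes, file_bytes):
    """Runs in the child between fork and exec."""
    os.nice(10)                                # lower priority, like `nice`
    _cap(resource.RLIMIT_CPU, cpu_seconds)     # CPU time, like `ulimit -t`
    _cap(resource.RLIMIT_AS, memory_bytes)     # address space, like `ulimit -v`
    _cap(resource.RLIMIT_FSIZE, file_bytes)    # largest file the child may write


def run_limited(argv, stdin_data, cpu_seconds=2,
                memory_bytes=512 * 1024 * 1024,
                file_bytes=1024 * 1024, wall_seconds=5):
    """Run argv with resource limits, talking to it over stdin/stdout pipes.

    Returns (returncode, stdout_bytes, stderr_bytes). A negative return code
    means the child was killed by a signal, e.g. after exceeding a limit.
    """
    proc = subprocess.Popen(
        argv,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        # Assumption: the supervisor is single-threaded; preexec_fn is not
        # safe to use in a multi-threaded parent.
        preexec_fn=lambda: _apply_limits(cpu_seconds, memory_bytes, file_bytes),
    )
    try:
        out, err = proc.communicate(stdin_data, timeout=wall_seconds)
    except subprocess.TimeoutExpired:
        # The CPU limit does not catch a sleeping process, so also enforce
        # a wall-clock limit from the outside.
        proc.kill()
        out, err = proc.communicate()
    return proc.returncode, out, err
```

A call such as `run_limited(["./a.out"], b"input data")` would return the program's exit status and captured output, where `./a.out` stands for whatever binary the compile step produced.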
Level 2 - Advanced operating-system constraints:
On Linux I consider that to be the use of a Linux Security Module, such as AppArmor or SELinux, to limit access to specific files and/or system calls. Some Linux distributions offer sandboxing security profiles, but it can still be a long and painful process to get something like this working correctly.
Level 3 - User-space sandboxing solutions:
I have successfully used Systrace on a small scale, as mentioned in this older answer of mine. There are several other sandboxing solutions for Linux, such as libsandbox. Such solutions may provide more fine-grained control over the system calls that may be used than LSM-based alternatives, but can have a measurable impact on performance.
Level 4 - Preemptive strikes:
Since you will be compiling the code yourself, rather than executing existing binaries, you have a few additional tools in your hands:
- Restrictions based on code metrics; e.g. a simple "Hello World" program should never be larger than 20-30 lines of code.
- Selective access to system libraries and header files; if you don't want your users to call connect() you might just restrict access to socket.h.
- Static code analysis; disallow assembly code, "weird" string literals (i.e. shellcode) and the use of restricted system functions.
A competent programmer might be able to get around such measures, but as the cost-to-benefit ratio increases they would be far less likely to persist.
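A pre-compile screen along these lines could look like the sketch below. It is a deliberately naive illustration in Python, assuming the submissions are C or C++ source text. The function name `screen_source`, the 30-line limit, and every pattern in `FORBIDDEN_PATTERNS` are hypothetical examples, not a vetted rule set. Pattern matching of this kind is easy to evade, so it should be treated as a speed bump on top of the other levels rather than as a security boundary.

```python
import re

MAX_LINES = 30  # hypothetical limit for a "Hello World"-sized assignment

# Hypothetical patterns; a real deployment would need a much more careful list.
FORBIDDEN_PATTERNS = {
    "inline assembly": re.compile(r"\b(?:asm|__asm__|__asm)\b"),
    "socket header": re.compile(r'#\s*include\s*[<"]sys/socket\.h[>"]'),
    "restricted call": re.compile(r"\b(?:connect|fork|execve|system)\s*\("),
    # Long runs of \xNN escapes are a common way to embed shellcode.
    "escaped byte run": re.compile(r"(?:\\x[0-9a-fA-F]{2}){8,}"),
}


def screen_source(source, max_lines=MAX_LINES):
    """Return a list of human-readable problems; an empty list means no match."""
    problems = []
    line_count = len(source.splitlines())
    if line_count > max_lines:
        problems.append(f"too long: {line_count} lines (limit {max_lines})")
    for label, pattern in FORBIDDEN_PATTERNS.items():
        if pattern.search(source):
            problems.append(label)
    return problems
```

A submission would be rejected, and the attempt logged, whenever `screen_source` returns a non-empty list.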
Level 0-5 - Monitoring and logging:
You should be monitoring the performance of your system and logging all failed attempts. Not only would you be more likely to interrupt an in-progress attack at a system level, but you might also be able to use administrative means to protect your system, such as:
- calling whatever security officials are in charge of such issues.
- finding that persistent little hacker of yours and offering them a job.

The degree of protection that you need and the resources that you are willing to expend to set it up are up to you.

I am the developer of libsandbox mentioned by @thkala, and I do recommend it for use in your project.

Some additional comments on @thkala's answer:

it is fair to classify libsandbox as a user-land tool, but libsandbox does integrate standard OS-level security mechanisms (i.e. chroot, setuid, and resource quota);
restricting access to C/C++ headers, or static analysis of users' code, does NOT prevent system functions like connect() from being called. This is because user code can (1) declare function prototypes by itself without including system headers, or (2) invoke the underlying kernel-land system calls without touching the wrapper functions in libc;
compile-time protection also deserves attention because malicious C/C++ code can exhaust your CPU with infinite template recursion or pre-processing macro expansion;
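One way to bound the compile step is to run the compiler itself under a wall-clock limit and to cap template recursion depth. The Python sketch below assumes g++ is the compiler and a POSIX host. `compile_command` and `run_compiler` are hypothetical helper names and the limits are arbitrary, while `-ftemplate-depth=` is an existing g++ option. This is not a description of how libsandbox handles compilation.

```python
import os
import signal
import subprocess


def compile_command(source_path, output_path, template_depth=64):
    """Build a g++ command line that caps template instantiation depth."""
    return ["g++", f"-ftemplate-depth={template_depth}",
            "-o", output_path, source_path]


def run_compiler(argv, wall_seconds=10):
    """Run the compiler; return (ok, message).

    ok is False when the compiler exits non-zero or exceeds wall_seconds.
    """
    proc = subprocess.Popen(
        argv,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        # New session, so the driver and the helper processes it spawns
        # share one process group that can be killed together.
        start_new_session=True,
    )
    try:
        _, err = proc.communicate(timeout=wall_seconds)
    except subprocess.TimeoutExpired:
        os.killpg(proc.pid, signal.SIGKILL)
        proc.communicate()  # reap the child and drain the pipes
        return False, "compiler timed out"
    return proc.returncode == 0, err.decode(errors="replace")
```

For example, `run_compiler(compile_command("submission.cpp", "submission.out"))` would either report success or return the compiler's error output, with a runaway macro or template expansion cut off after ten seconds. The file names are placeholders.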