How to safely return objects from an OS-level Python sandbox?

I need to be able to run untrusted Python scripts. After much research, it seems like a Python-only sandbox is not secure, at least with CPython (which I need to use).

Therefore, we are planning to use OS-level sandboxing as well (SELinux, AppArmor, etc.).

My question is: how do we safely communicate with the sandbox? The code in the sandbox will need to return Python types such as int and str, as well as NumPy arrays. There may be more types in the future.

The obvious way is to use pickle, but it seems possible that some malicious code in the sandbox could get hold of the output pipe (we were thinking of using 0MQ) and send back something that could result in arbitrary code execution when unpickled outside the sandbox.
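To make that risk concrete, here is a minimal sketch of why unpickling bytes from an untrusted sender is dangerous. The Payload class is hypothetical and deliberately harmless: it asks the unpickler to call eval on a fixed arithmetic expression, where a real attacker could name any importable callable.

```python
import pickle


class Payload:
    """Hypothetical malicious object: tells pickle what to call on load."""

    def __reduce__(self):
        # pickle.loads() will call eval("6 * 7") in the *receiving* process.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())   # what the sandboxed code could send
result = pickle.loads(blob)      # what the trusted side would run
print(result)
```

The receiver never sees a Payload instance at all; it only sees whatever the named callable returned, and by then the call has already happened.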

Are there safer serialization alternatives to pickle that don't have the performance overhead of JSON and the like?

We're using Python 3.3.

It sounds like your only real problem with JSON is the way you're encoding NumPy arrays (and Pandas tables). JSON is not ideal for your use case, not because it's slow at handling NumPy data, but because it's a text-based format, and you have a lot of data that's easier to encode in a non-text-based format.

So, I'll show you a way around all of your problems with JSON below… but I would suggest using a different format.

The two major "binary JSON" formats, BJSON and BSON, aim to provide most of the benefits of JSON (simple, safe, dynamic/schemaless, traversable, etc.), while also making it possible to embed binary data directly. (The fact that they're also binary rather than textual formats isn't really important to you in this case.) I believe the same is true of Smile, but I've never used it.

This means that, in the same way JSON makes it easy to hook in anything you can reduce to strings, floats, lists, and dicts, BJSON and BSON make it easy to hook in anything you can reduce to strings, floats, lists, dicts, and byte strings. So, when I show how to encode/decode NumPy arrays to strings, the same thing works for byte strings, but without all the extra steps at the end.

The downsides of BJSON and BSON are that they're not human-readable, and they aren't nearly as widely supported as JSON.


I have no idea how you're currently encoding your arrays, but from the timings I suspect you're using the tolist method or something similar. That will definitely be slow, and big. And it will even lose information if you're storing anything other than f8 values anywhere (because the only kind of numbers JSON understands are IEEE doubles). The solution is to encode to a string.
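As a small illustration of the information loss, here is a sketch that sends a float32 array through tolist and JSON. The array contents are made up for the example; the point is only that the dtype does not survive and the text is much larger than the raw data.

```python
import json

import numpy as np

a = np.array([0.1, 0.2, 0.3], dtype=np.float32)

text = json.dumps(a.tolist())        # every element becomes a decimal literal
back = np.array(json.loads(text))    # NumPy has to guess the dtype again

print(a.dtype, a.nbytes)             # original dtype and raw size in bytes
print(back.dtype, len(text))         # dtype after the round trip, text size
```

The receiver has no way of knowing that the original was float32 unless the dtype is sent separately, which is what the encoding further down does.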

NumPy has a text format, which will be faster, and not lossy, but still probably slower and bigger than you want.

It also has a binary format (the raw bytes of the array), which is great… but doesn't carry enough information to recover your original array.
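A quick sketch of what goes missing, assuming a current NumPy where the raw-bytes methods are spelled tobytes and frombuffer. The raw dump holds only the element data, so the receiver has to be told the dtype and shape separately.

```python
import numpy as np

a = np.arange(6, dtype=np.int32).reshape(2, 3)

raw = a.tobytes()                            # element data only, no header
flat = np.frombuffer(raw, dtype=np.int32)    # dtype must be supplied by hand
restored = flat.reshape(a.shape)             # and so must the shape

print(len(raw), flat.shape, restored.shape)
```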

So, let's look at what pickle uses, which you can see by calling the __reduce__ method on any array. Basically, it's the type, the shape, the dtype, some flags that tell NumPy how to interpret the raw data, and then the binary-format raw data. You can actually encode the __reduce__ data yourself; in fact, it might be worth doing so. But let's do something a bit simpler for the sake of exposition, with the understanding that it will only work on ndarray, and won't work across machines with different endianness (or rarer cases like sign-magnitude ints or non-IEEE floats).
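For reference, this is roughly how you could inspect that data yourself. It is a sketch that assumes the usual layout of an ndarray's __reduce__ result (a reconstructor, its arguments, and a five-item state tuple); the variable names are mine, not NumPy's.

```python
import numpy as np

a = np.arange(6, dtype=np.int16).reshape(2, 3)

# __reduce__ returns (callable, args_for_callable, state_for___setstate__).
reconstruct, args, state = a.__reduce__()

# Assumed state layout: version, shape, dtype, Fortran-order flag, raw data.
version, shape, dtype, is_fortran, raw = state

print(shape, dtype, is_fortran, len(raw))
```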

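import json
import numpy as np

def numpy_default(obj):
    # json calls this only for objects it cannot serialize itself.
    if isinstance(obj, np.ndarray):
        # tobytes() is the current name of what older NumPy called tostring().
        return {'_npdata': obj.tobytes(),
                '_npdtype': obj.dtype.name,
                '_npshape': obj.shape}
    raise TypeError('%r is not JSON serializable' % type(obj))

def dumps(obj):
    return json.dumps(obj, default=numpy_default)

def numpy_hook(obj):
    # obj is a plain dict, so a missing key raises KeyError.
    try:
        data = obj['_npdata']
    except KeyError:
        return obj
    # frombuffer() replaces the binary mode of the older fromstring().
    return np.frombuffer(data, obj['_npdtype']).reshape(obj['_npshape'])

def loads(obj):
    return json.loads(obj, object_hook=numpy_hook)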

The only problem is that the raw array data comes back as a bytes object, which Python 3's json doesn't know how to deal with.

This is where you can stop if you're using something like BJSON or BSON. But with JSON, you need strings.

You can fix that easily, if hackily, by "decoding" the bytes with any encoding that maps every single byte to a character, like Latin-1: append .decode('latin-1') to the raw bytes stored under '_npdata' in numpy_default, and change data = obj['_npdata'] to data = obj['_npdata'].encode('latin-1') in numpy_hook. That wastes a bit of space by UTF-8-encoding the fake Latin-1 strings, but that's not too bad.
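Putting the pieces together, a self-contained sketch of the Latin-1 variant might look like the following. It assumes a current NumPy (tobytes and frombuffer rather than the older tostring and fromstring), and the sample dict at the bottom is made up for the demonstration.

```python
import json

import numpy as np


def numpy_default(obj):
    """Called by json only for objects it cannot serialize itself."""
    if isinstance(obj, np.ndarray):
        return {'_npdata': obj.tobytes().decode('latin-1'),  # bytes -> str
                '_npdtype': obj.dtype.name,
                '_npshape': list(obj.shape)}
    raise TypeError('%r is not JSON serializable' % type(obj))


def numpy_hook(obj):
    """Called by json for every decoded dict; rebuilds tagged arrays."""
    if '_npdata' not in obj:
        return obj
    data = obj['_npdata'].encode('latin-1')                  # str -> bytes
    return np.frombuffer(data, dtype=obj['_npdtype']).reshape(obj['_npshape'])


def dumps(obj):
    return json.dumps(obj, default=numpy_default)


def loads(text):
    return json.loads(text, object_hook=numpy_hook)


original = {'label': 'demo', 'values': np.linspace(0.0, 1.0, 5).reshape(5, 1)}
restored = loads(dumps(original))
print(restored['values'].dtype, restored['values'].shape)
```

Note that frombuffer returns a read-only view over the decoded bytes; call .copy() on the result if the receiver needs to modify the array.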

Unfortunately, Python will encode every non-ASCII character with a Unicode escape sequence. You can turn that off by setting ensure_ascii=False on the dump and strict=False on the load, but it will still encode control characters, mostly to 6-byte sequences. This doubles the size of random data, and it can do much worse: for example, an all-zero array will be 6x larger!
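The overhead is easy to measure with the standard json module alone. This sketch encodes 100 zero bytes and 100 0xFF bytes as fake Latin-1 strings and compares the resulting sizes; the variable names are only for illustration.

```python
import json

zeros = bytes(100).decode('latin-1')            # 100 NUL control characters
high = (b'\xff' * 100).decode('latin-1')        # 100 non-ASCII characters

# Control characters are escaped even with ensure_ascii=False.
zeros_text = json.dumps(zeros, ensure_ascii=False)

# Non-ASCII characters are escaped by default...
high_escaped = json.dumps(high)
# ...but pass through with ensure_ascii=False, costing 2 bytes each in UTF-8.
high_raw = json.dumps(high, ensure_ascii=False)

print(len(zeros_text), len(high_escaped), len(high_raw.encode('utf-8')))
```

Each size includes the two surrounding quote characters, so 100 zero bytes turn into 600 bytes of escapes.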

There used to be a trick to get around this problem, but in 3.3, it doesn't work. The best thing you can do is to fork or monkey-patch the json package so it lets you pass control characters through when given ensure_ascii=False, which you can do like this:

import json, re
json.encoder.ESCAPE = re.compile(r'[\\"]')  # still escape " and \, nothing else

This is pretty hacky, but it works.
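For completeness, here is a sketch of the same hack wrapped so that the patch is undone afterwards. Two things in it are my assumptions rather than part of the original trick: the pattern keeps the backslash escaped as well as the quote, so that arbitrary data survives the round trip, and on newer CPython versions, where a C string encoder ignores ESCAPE, it temporarily routes through json.encoder.py_encode_basestring. The helper name dumps_raw is hypothetical.

```python
import json
import json.encoder
import re


def dumps_raw(obj):
    # Hypothetical helper: json.dumps that escapes only the quote and the
    # backslash inside strings, leaving control characters untouched.
    saved = (json.encoder.ESCAPE, json.encoder.encode_basestring)
    json.encoder.ESCAPE = re.compile(r'[\\"]')
    # Assumption: force the pure-Python string encoder, which consults ESCAPE.
    json.encoder.encode_basestring = json.encoder.py_encode_basestring
    try:
        return json.dumps(obj, ensure_ascii=False)
    finally:
        # Undo the monkey-patch so other users of json are unaffected.
        json.encoder.ESCAPE, json.encoder.encode_basestring = saved


payload = bytes(range(256)).decode('latin-1')   # every possible byte value
text = dumps_raw({'_npdata': payload})

# strict=False lets the decoder accept raw control characters in strings.
restored = json.loads(text, strict=False)['_npdata']
print(restored == payload)
```

The output is not valid JSON for strict parsers, so this only makes sense when you control both ends of the pipe.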


Anyway, hopefully that's enough to get you started.
