
Dealing with ctypes and ASCII strings when porting Python 2 code to Python 3

I got fed up last night and started porting PyVISA to Python 3 (progress here: https://github.com/thevorpalblade/pyvisa ).

I've gotten it to the point where everything works, as long as I pass device addresses (well, any string really) as an ASCII byte string rather than the default unicode string. For example, HP = vida.instrument(b"GPIB::16") works, whereas HP = vida.instrument("GPIB::16") does not, raising a ValueError.

Ideally, the end user should not have to care about string encoding. Any suggestions as to how I should approach this? Something in the ctypes type definitions, perhaps?

As it stands, the relevant ctypes type definition is:

ViString = _ctypes.c_char_p

ctypes, like most things in Python 3, intentionally doesn't convert automatically between unicode and bytes. In most use cases, automatic conversion would just invite the same kind of mojibake or UnicodeEncodeError disasters that people switched to Python 3 to avoid.
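A quick way to see this behavior, as a minimal sketch using only the standard ctypes module: c_char_p takes bytes but refuses a str rather than guessing an encoding. The variable names are illustrative.

```python
import ctypes

# bytes are accepted by c_char_p unchanged.
as_bytes = ctypes.c_char_p(b"GPIB::16")
print(as_bytes.value)

# A str is refused; ctypes will not pick an encoding on your behalf.
try:
    ctypes.c_char_p("GPIB::16")
    str_rejected = False
except TypeError as exc:
    str_rejected = True
    print("rejected:", exc)
```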

However, when you know you're only dealing with pure ASCII, that's another story. You still have to be explicit, but you can factor that explicitness out into a wrapper.


As explained in "Specifying the required argument types (function prototypes)" in the ctypes documentation, in addition to a standard ctypes type, you can pass any class that has a from_param classmethod. That method normally returns an instance of some type (usually the same type) with an _as_parameter_ attribute, but it can also just return a native ctypes-type value instead.

class Asciifier(object):
    @classmethod
    def from_param(cls, value):
        if isinstance(value, bytes):
            return value
        else:
            return value.encode('ascii')

This may not be the exact rule you want. For example, it will fail on a bytearray (just as c_char_p will) even though that could be converted quietly to bytes; on the other hand, you wouldn't want to implicitly convert an int to bytes. Anyway, whatever rule you decide on should be easy to code.
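As one possible rule, here is a sketch of a variant that also accepts bytearray and rejects anything that is not string-like with an explicit TypeError. The class name LenientAsciifier is made up for illustration and is not part of ctypes or PyVISA.

```python
class LenientAsciifier:
    """Hypothetical variant: also accepts bytearray, rejects everything else."""

    @classmethod
    def from_param(cls, value):
        if isinstance(value, bytes):
            return value
        if isinstance(value, bytearray):
            return bytes(value)           # copy into an immutable bytes object
        if isinstance(value, str):
            return value.encode("ascii")  # raises UnicodeEncodeError if not ASCII
        raise TypeError(
            "expected str, bytes or bytearray, got " + type(value).__name__
        )


print(LenientAsciifier.from_param(bytearray(b"GPIB::16")))
print(LenientAsciifier.from_param("GPIB::16"))
```

An int now fails with a TypeError that names the offending type instead of an AttributeError about a missing encode method.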


Here's an example (on OS X; you'll obviously have to change how libc is loaded for Linux, Windows, etc., but you presumably know how to do that):

>>> from ctypes import CDLL, c_int
>>> libc = CDLL('libSystem.dylib')
>>> libc.atoi.argtypes = [Asciifier]
>>> libc.atoi.restype = c_int
>>> libc.atoi(b'123')
123
>>> libc.atoi('123')
123
>>> libc.atoi('１２３') # Unicode fullwidth digits
ArgumentError: argument 1: <class 'UnicodeEncodeError'>: 'ascii' codec can't encode character '\uff10' in position 0: ordinal not in range(128)
>>> libc.atoi(123)
ArgumentError: argument 1: <class 'AttributeError'>: 'int' object has no attribute 'encode'

Obviously you can catch the exception and raise a different one if those aren't clear enough for your use case.
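A sketch of what that might look like: translate the UnicodeEncodeError inside from_param, then observe that ctypes still wraps it in ArgumentError at the call site. FriendlyAsciifier is a hypothetical name, and loading libc with CDLL(None) is an assumption that holds on POSIX systems only.

```python
import ctypes


class FriendlyAsciifier:
    """Hypothetical argtype that raises ValueError with a readable message."""

    @classmethod
    def from_param(cls, value):
        if isinstance(value, bytes):
            return value
        try:
            return value.encode("ascii")
        except UnicodeEncodeError as exc:
            raise ValueError(f"{value!a} contains non-ASCII characters") from exc


# Assumption: POSIX, where CDLL(None) exposes the symbols of the C library.
libc = ctypes.CDLL(None)
libc.atoi.argtypes = [FriendlyAsciifier]
libc.atoi.restype = ctypes.c_int

print(libc.atoi("123"))

try:
    libc.atoi("\uff11\uff12\uff13")  # fullwidth digits
except ctypes.ArgumentError as exc:
    # ctypes wraps whatever from_param raised in ArgumentError.
    message = str(exc)
    print(message)
```

If callers should see the ValueError itself rather than ArgumentError, the translation has to happen in a Python wrapper around the foreign function instead.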

You can similarly write a Utf8ifier, or an Encodifier(encoding, errors=None) class factory, or whatever else you need for some particular library, and stick it in the argtypes the same way.
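One way such a class factory could look, as a sketch only: the signature follows the one mentioned above, while the inner class name and the AsciiReplacer example are illustrative assumptions.

```python
def Encodifier(encoding, errors=None):
    """Build a class whose from_param encodes str with the given codec."""

    class _Encodifier:
        @classmethod
        def from_param(cls, value):
            if isinstance(value, bytes):
                return value
            # str.encode does not accept errors=None, so fall back to "strict".
            return value.encode(encoding, errors or "strict")

    return _Encodifier


Utf8ifier = Encodifier("utf-8")
AsciiReplacer = Encodifier("ascii", errors="replace")

print(Utf8ifier.from_param("caf\u00e9"))
print(AsciiReplacer.from_param("caf\u00e9"))
```

Each call to Encodifier returns a distinct class, so different functions of the same library can use different encodings in their argtypes.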


If you also want to auto-decode return types, see "Return types" and errcheck in the ctypes documentation.
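A minimal sketch of the errcheck approach, using getenv from the C library as a stand-in for a function that returns a C string. The decode_ascii helper and the environment variable names are made up for the example, and CDLL(None) assumes a POSIX system.

```python
import ctypes
import os


def decode_ascii(result, func, args):
    """errcheck hook: turn the bytes produced by restype=c_char_p into str."""
    if result is None:          # NULL pointer from C
        return None
    return result.decode("ascii")


# Assumption: POSIX, where CDLL(None) exposes the symbols of the C library.
libc = ctypes.CDLL(None)
libc.getenv.argtypes = [ctypes.c_char_p]
libc.getenv.restype = ctypes.c_char_p
libc.getenv.errcheck = decode_ascii

os.environ["CTYPES_DEMO_VAR"] = "GPIB::16"   # hypothetical variable name
print(libc.getenv(b"CTYPES_DEMO_VAR"))
print(libc.getenv(b"CTYPES_DEMO_MISSING"))
```

Whatever the errcheck callable returns becomes the result of the foreign function call, so callers receive str (or None) instead of bytes.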


One last thing: when you're sure the data are supposed to be UTF-8, but you want to handle the case where they aren't in the same way Python 2.x would (by preserving them as-is), you can even do that in 3.x. Use the aforementioned Utf8ifier as your argtype together with a decoder errcheck, and use errors='surrogateescape' in both. See here for a complete example.
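The combination might look roughly like this. It is a sketch, not the linked example: getenv stands in for a real library function, the variable name is invented, and both CDLL(None) and os.environb assume a POSIX system.

```python
import ctypes
import os


class Utf8ifier:
    @classmethod
    def from_param(cls, value):
        if isinstance(value, bytes):
            return value
        # Lone surrogates produced by surrogateescape turn back into raw bytes.
        return value.encode("utf-8", "surrogateescape")


def decode_utf8(result, func, args):
    """errcheck hook: decode as UTF-8, smuggling invalid bytes as surrogates."""
    if result is None:
        return None
    return result.decode("utf-8", "surrogateescape")


# Assumption: POSIX, where CDLL(None) exposes the symbols of the C library.
libc = ctypes.CDLL(None)
libc.getenv.argtypes = [Utf8ifier]
libc.getenv.restype = ctypes.c_char_p
libc.getenv.errcheck = decode_utf8

os.environb[b"CTYPES_DEMO_RAW"] = b"caf\xe9"   # Latin-1 bytes, not valid UTF-8
text = libc.getenv("CTYPES_DEMO_RAW")
round_tripped = Utf8ifier.from_param(text)
print(ascii(text), round_tripped)
```

The invalid byte survives the trip through str unchanged, which is the same outcome Python 2.x produced by never decoding in the first place.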
