Cython: Converting a unicode string to a wchar array

Question

I am using Cython to interface with an external C API that accepts unicode strings in UCS2 format (an array of wchar). (I understand the limitations of UCS2 vis-a-vis UTF-16, but it's a third-party API.)

Cython version: 0.15.1
Python version: 2.6 (narrow unicode build)
OS: FreeBSD

The Cython user guide deals extensively with converting unicode to byte strings, but I couldn't figure out how to convert to a 16-bit array. I realize I first need to encode to UTF-16 (and I assume for now that code points beyond the BMP don't occur). What do I do next? Please help.

Thanks in advance.

Answer 1

This is straightforward on Python 3. One solution:

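# cython: language_level=3

from cpython.mem cimport PyMem_Free
from libc.stddef cimport wchar_t

cdef extern from "Python.h":
    # Returns a newly allocated, NUL-terminated buffer, or NULL with an exception set.
    wchar_t* PyUnicode_AsWideCharString(object, Py_ssize_t *) except NULL

cdef extern from "wchar.h":
    int wprintf(const wchar_t *, ...)

my_string = u"Foobar\n"
cdef Py_ssize_t length
cdef wchar_t *my_wchars = PyUnicode_AsWideCharString(my_string, &length)

wprintf(my_wchars)
print("Length:", <long>length)
# length excludes the terminator, so the NUL sits at index length.
print("Null End:", my_wchars[length] == 0)

# The buffer was allocated by Python and must be released with PyMem_Free.
PyMem_Free(my_wchars)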
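
One caveat for a UCS2 API: wchar_t is not 16 bits on every platform (it is typically 32 bits on Linux and the BSDs), so a wchar_t buffer is not necessarily an array of 16-bit units. Below is a minimal sketch, in plain Python 3 that is also valid Cython, of producing explicit 16-bit code units from the UTF-16 encoding instead. The helper name to_utf16_units is hypothetical and not part of Cython or CPython, and the sketch assumes the API wants native byte order and a terminating NUL.

```python
import sys
from array import array


def to_utf16_units(text):
    """Return the UTF-16 code units of text as 16-bit ints, NUL-terminated.

    Hypothetical helper for illustration; not part of Cython or CPython.
    """
    units = array("H")  # unsigned short
    if units.itemsize != 2:
        raise RuntimeError("unsigned short is not 16 bits on this platform")
    # The -le/-be codecs emit no byte-order mark; match the machine's byte order.
    codec = "utf-16-le" if sys.byteorder == "little" else "utf-16-be"
    units.frombytes(text.encode(codec))
    units.append(0)  # assumed: the C API expects a terminating NUL
    return units


units = to_utf16_units(u"Foobar\n")
print(len(units), units[-1])
```

Characters outside the BMP come out as surrogate pairs, which a strict UCS2 consumer may not handle, so input should be checked if that matters.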

A less satisfactory Python 2 method follows. It may rely on undefined or broken behaviour, so I wouldn't trust it too readily:

# cython: language_level=2

from cpython.ref cimport PyObject
from libc.stddef cimport wchar_t
from libc.stdio  cimport fflush, stdout
from libc.stdlib cimport malloc, free

cdef extern from "Python.h":
    ctypedef PyObject PyUnicodeObject
    Py_ssize_t PyUnicode_AsWideChar(PyUnicodeObject *o, wchar_t *w, Py_ssize_t size)

my_string = u"Foobar\n"
cdef Py_ssize_t length = len(my_string.encode("UTF-16")) // 2 # cheating: the BOM adds one unit, which leaves room for the NUL
cdef wchar_t *my_wchars = <wchar_t *>malloc(length * sizeof(wchar_t))
cdef Py_ssize_t number_written = PyUnicode_AsWideChar(<PyUnicodeObject *>my_string, my_wchars, length)

# wprintf breaks things for some reason
print [my_wchars[i] for i in range(length)]
print "Length:", <long>length
print "Number Written:", <long>number_written
print "Null End:", my_wchars[7] == 0

free(my_wchars)
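
The length computation marked "cheating" works through a side effect: the plain "UTF-16" codec prepends a 2-byte byte-order mark, so halving the encoded length gives one more unit than the string has, and that extra slot holds the terminating NUL. A small sketch of the arithmetic in plain Python, assuming only BMP characters as the question does:

```python
text = u"Foobar\n"

# "utf-16" writes a byte-order mark first; "utf-16-le" does not.
with_bom = len(text.encode("utf-16")) // 2
without_bom = len(text.encode("utf-16-le")) // 2

# Explicit version of the same buffer size: code units plus one slot for NUL.
buffer_len = without_bom + 1
print(with_bom, without_bom, buffer_len)  # prints: 8 7 8
```

Writing the size as code units plus one states the intent directly and does not depend on the BOM.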
