How to memset a unicode string in Python 2.7

I have a unicode string f. I want to memset it to 0, so that print f displays null characters (\0).

I am using ctypes.memset to achieve this:

>     >>> f
>     u'abc'
>     >>> print ("%s" % type(f))
>     <type 'unicode'>
>     >>> import ctypes
>     >>> ctypes.memset(id(f)+50,0,6)
>     4363962530
>     >>> f
>     u'abc'
>     >>> print f
>     abc

Why did the memory not get zeroed in the case of a unicode string? The same approach works perfectly for a str object.

Thanks for help.

First, this is almost certainly a very bad idea. Python expects strings to be immutable. There's a reason that even the C API won't let you change their contents after they're flagged ready. If you're just doing this to play around with the interpreter's implementation, that can be fun and instructive, but if you're doing it for any real-life purpose, you're probably doing something wrong.

In particular, if you're doing it for "security", what you almost certainly really want to do is to not create a unicode in the first place, but instead create, say, a bytearray with the UTF-16 or UTF-32 encoding of your string, which can be zeroed out in a way that's safe, portable, and a lot easier.
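As a rough sketch of that bytearray approach (not a vetted security recipe): the helper name zero_out and the io.BytesIO source below are made up for illustration, with the BytesIO standing in for wherever the sensitive data really comes from, such as a socket or a file.

```python
import io


def zero_out(buf):
    """Overwrite every byte of a mutable bytearray in place."""
    for i in range(len(buf)):
        buf[i] = 0


# Hypothetical source standing in for a socket, pipe, or file.
source = io.BytesIO(u"abc".encode("utf-16-le"))

buf = bytearray(6)          # mutable, fixed-size buffer that we control
n = source.readinto(buf)    # fills buf in place instead of returning a new immutable string

# ... use buf here, taking care not to copy it into a str or unicode ...

zero_out(buf)               # every byte of buf is now 0
```

The point of readinto is that the data only ever lives in a buffer you are allowed to mutate, so no ctypes tricks are needed to wipe it.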


Anyway, there's no reason to expect that two completely different types should store their buffers at the same offset.


In CPython 2.x, a str is a PyStringObject:

typedef struct {
    PyObject_VAR_HEAD
    long ob_shash;
    int ob_sstate;
    char ob_sval[1];
} PyStringObject;

That ob_sval is the buffer; the offset should be 36 on 64-bit builds (where long is 8 bytes) and, by the same arithmetic, 20 on 32-bit builds.

In a comment, you say:

> I read it somewhere and also the offset for a string type is 37 in my system which is what sys.getsizeof('') shows:
>
>     >>> sys.getsizeof('')
>     37

The offset for a string buffer is actually 36, not 37. And the fact that it's even that close is just a coincidence of the way str is implemented. (Hopefully you can understand why by looking at the struct definition—if not, you definitely shouldn't be writing code like this.) There's no reason to expect the same trick to work for some other type without looking at its implementation.
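If you want to check the arithmetic rather than take my word for it, here is a small sketch that mirrors the struct in ctypes and asks for the field offset. The class name PyStringObjectLayout is made up, and the model assumes a non-debug build (debug builds add extra header fields); it only describes the layout and never touches a real str object.

```python
import ctypes


class PyStringObjectLayout(ctypes.Structure):
    """Illustrative model of CPython 2.x's PyStringObject (non-debug build assumed)."""
    _fields_ = [
        ("ob_refcnt", ctypes.c_ssize_t),   # PyObject_VAR_HEAD: reference count
        ("ob_type", ctypes.c_void_p),      # PyObject_VAR_HEAD: type pointer
        ("ob_size", ctypes.c_ssize_t),     # PyObject_VAR_HEAD: number of items
        ("ob_shash", ctypes.c_long),
        ("ob_sstate", ctypes.c_int),
        ("ob_sval", ctypes.c_char * 1),    # first byte of the inline buffer
    ]


# Where the character buffer starts inside the object.
buffer_offset = PyStringObjectLayout.ob_sval.offset

# An empty str still stores one trailing NUL byte in ob_sval,
# so its reported size is one byte more than the buffer offset.
empty_str_size = buffer_offset + 1
```

On a 64-bit build with an 8-byte long, buffer_offset comes out as 36 and empty_str_size as 37, which is the number sys.getsizeof('') reports.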


A unicode is a PyUnicodeObject:

typedef struct {
    PyObject_HEAD
    Py_ssize_t length;          /* Length of raw Unicode data in buffer */
    Py_UNICODE *str;            /* Raw Unicode buffer */
    long hash;                  /* Hash value; -1 if not set */
    PyObject *defenc;           /* (Default) Encoded version as Python
                                   string, or NULL; this is used for
                                   implementing the buffer protocol */
} PyUnicodeObject;

Its buffer is not even inside the object itself; that str member is a pointer to the buffer (which is not guaranteed to be right after the struct). The pointer's offset should be 24 on 64-bit builds and, by the same arithmetic, 12 on 32-bit builds. So, to do the equivalent, you'd need to read the pointer there, then follow it to find the location to memset.
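To show the "read the pointer, then follow it" step without poking at a live interpreter object, here is a sketch on a stand-in structure. UnicodeLayout and fake are made-up names, the buffer is an ordinary ctypes array, and 16-bit code units (a narrow build) are assumed.

```python
import ctypes


class UnicodeLayout(ctypes.Structure):
    """Stand-in mirroring the field order of CPython 2.x's PyUnicodeObject."""
    _fields_ = [
        ("ob_refcnt", ctypes.c_ssize_t),            # PyObject_HEAD
        ("ob_type", ctypes.c_void_p),               # PyObject_HEAD
        ("length", ctypes.c_ssize_t),
        ("str", ctypes.POINTER(ctypes.c_uint16)),   # narrow build assumed
        ("hash", ctypes.c_long),
        ("defenc", ctypes.c_void_p),
    ]


str_offset = UnicodeLayout.str.offset

# Build a fake object whose buffer lives in a separate allocation.
chars = (ctypes.c_uint16 * 3)(97, 98, 99)
fake = UnicodeLayout(length=3,
                     str=ctypes.cast(chars, ctypes.POINTER(ctypes.c_uint16)))

# Step 1: read the pointer stored at str_offset inside the struct.
buffer_address = ctypes.c_void_p.from_address(
    ctypes.addressof(fake) + str_offset).value

# Step 2: follow that pointer and zero the buffer it points to.
ctypes.memset(buffer_address, 0, fake.length * ctypes.sizeof(ctypes.c_uint16))
```

After this, chars holds only zeros, while the struct itself is untouched. Doing the same to a real unicode object means using id(obj) in place of ctypes.addressof(fake), with all the risks already mentioned.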

If you're using a narrow-Unicode build, it should look like this (here g is a unicode string, u'abc'):

>>> ctypes.POINTER(ctypes.c_uint16 * len(g)).from_address(id(g)+24).contents[:]
[97, 98, 99]

That's the ctypes translation of treating ((char *)g)+24 as a uint16_t **, dereferencing it to get the buffer pointer p, and reading the array that starts at p and ends at p+len(g), which is what you'd have to do if you were writing C code and didn't have access to the unicodeobject.h header.

(In the test I just quoted, g is at 0x10a598090, while its str points to 0x10a3b09e0, so the buffer is not immediately after the struct, or anywhere near it; it's about 2MB before it.)

For a wide-Unicode build, the same thing with c_uint32.
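If you don't know which kind of build you have, sys.maxunicode tells you. A minimal sketch, with code_unit as a made-up name; note that this distinction only exists on Python 2 and on Python 3 before 3.3, since later versions use a different internal layout altogether.

```python
import ctypes
import sys

# Narrow builds report 0xFFFF here; wide builds report 0x10FFFF.
is_wide_build = sys.maxunicode > 0xFFFF

# Element type to use when reading the raw buffer on a Python 2 build.
code_unit = ctypes.c_uint32 if is_wide_build else ctypes.c_uint16
```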

So, that should show you what you want to memset.

And you should also see a serious implication for your attempt at "security" here. (If I have to point it out, that's yet another indication that you should not be writing this code.)
