
Converting a tuple to a numpy array corrupts the data

Can anybody explain what's going on here?

import numpy as np

test1 = ((154L, u'D2'), (155L, u'D2'), (156L, u'D2'))
print np.asarray(test1)

gives

[[u'15' u'D2']
 [u'15' u'D2']
 [u'15' u'D2']]

but with

test2 = ((154L, u'SG2'), (155L, u'SG2'), (156L, u'SG1'))
print np.asarray(test2)

we obtain

[[u'154' u'SG2']
 [u'155' u'SG2']
 [u'156' u'SG1']]

What happened to the long integers in test1?

As far as I understand it, this has to do with how the unicode string length is chosen. As your output shows, in the first case every item is truncated to 2 characters. The integers are longs when you pass them in, but a numpy array holds a single dtype, so numpy.asarray() converts everything to unicode strings. The length it picks is that of the longest unicode string in the input: 2 characters in the first case, 3 in the second. The long integers are then converted to unicode strings of that same length, and any digits that do not fit are dropped. (I have no idea why the integers are not taken into account when the length is chosen. Does anyone with more unicode experience know whether this is intended or a bug?)
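The truncation described above can be reproduced in isolation. The following is a minimal sketch, assuming Python 3 and a recent NumPy (so the integers are written without the L suffix); the names rows, as_text, narrow and inferred are made up for illustration. The items are converted to text first so that the example depends only on how a fixed-width unicode dtype stores strings, not on how a particular NumPy version casts integers.

```python
import numpy as np

rows = ((154, "D2"), (155, "D2"), (156, "D2"))

# Turn every item into text up front, so only string storage is being tested.
as_text = [[str(item) for item in row] for row in rows]

# Force a 2-character unicode dtype: anything longer is cut off silently.
narrow = np.asarray(as_text, dtype="<U2")

# Let NumPy choose the width from the strings it is given.
inferred = np.asarray(as_text)

print(narrow.dtype, narrow[0, 0])      # the first item is stored as '15'
print(inferred.dtype, inferred[0, 0])  # the first item is stored as '154'
```

Checking the .dtype of the result is a quick way to see which string width was chosen before trusting the contents.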

Edit: I found a solution: specify the dtype explicitly as unicode, with a length large enough for the longest item.

import numpy as np

test3 = ((154L, u'SG'), (15L, u'SG3'), (1564L, u'SG'))

# '<U4': unicode strings of up to 4 characters, enough for u'1564'
print(np.asarray(test3, dtype='<U4'))

[[u'154' u'SG']
 [u'15' u'SG3']
 [u'1564' u'SG']]

In this case dtype='<U4' means unicode strings of at most 4 characters, which is long enough for the longest item (u'1564'), so the array comes out right.
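Hard-coding the length only works while no item is longer than 4 characters. As a hedged sketch (again assuming Python 3 and a recent NumPy, with illustrative variable names that are not from the original post), the width can be computed from the data, or the conversion to strings can be avoided altogether with dtype=object:

```python
import numpy as np

test3 = ((154, "SG"), (15, "SG3"), (1564, "SG"))

# Option 1: convert every item to text, then size the unicode dtype to fit.
as_text = [[str(item) for item in row] for row in test3]
width = max(len(item) for row in as_text for item in row)
text_array = np.asarray(as_text, dtype="<U{}".format(width))

# Option 2: keep the original Python objects, so the integers stay integers.
object_array = np.asarray(test3, dtype=object)

print(width)                   # 4, the length of '1564'
print(text_array)
print(object_array[2, 0] + 1)  # 1565, arithmetic still works on the integer
```

The object array gives up NumPy's compact fixed-width storage, so it is mainly useful when the integer column has to stay numeric.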


 