
Why is there a difference between 32-bit and 64-bit numpy/pandas?

I'm using numpy/pandas on a 64-bit Fedora box. In production the code was pushed to a 32-bit CentOS box and hit an error with json.dumps: it was throwing a TypeError that the key 0 (its repr) is not serializable.

I tried testing on 64-bit CentOS and it runs absolutely fine, but on 32-bit (CentOS 6.8, to be precise) it throws an error. I was wondering if anyone has hit this issue before.

Below is the session on 64-bit Fedora:

Python 2.6.6 (r266:84292, Jun 30 2016, 09:54:10) 
[GCC 5.3.1 20160406 (Red Hat 5.3.1-6)] on linux4
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd

>>> a = pd.DataFrame([{'a':1}])
>>> a
   a
0  1
>>> a.to_dict()
{'a': {0: 1}}
>>> import json
>>> json.dumps(a.to_dict())
'{"a": {"0": 1}}'

Below is the script and traceback on 32-bit CentOS:

import json
import pandas as pd

a = pd.DataFrame( [ {'a': 1} ] )
json.dumps(a.to_dict())

Traceback (most recent call last):
  File "sample.py", line 5, in <module>
    json.dumps(a.to_dict())
  File "/usr/lib/python2.6/json/__init__.py", line 230, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib/python2.6/json/encoder.py", line 367, in encode
    chunks = list(self.iterencode(o))
  File "/usr/lib/python2.6/json/encoder.py", line 309, in _iterencode
    for chunk in self._iterencode_dict(o, markers):
  File "/usr/lib/python2.6/json/encoder.py", line 275, in _iterencode_dict
    for chunk in self._iterencode(value, markers):
  File "/usr/lib/python2.6/json/encoder.py", line 309, in _iterencode
    for chunk in self._iterencode_dict(o, markers):
  File "/usr/lib/python2.6/json/encoder.py", line 268, in _iterencode_dict
    raise TypeError("key {0!r} is not a string".format(key))
TypeError: key 0 is not a string

What is the usual workaround for this issue? I cannot use a custom JSON encoder, because the library I'm using to push this data expects a dictionary, and it internally uses the json module to serialize it and push it over the wire.
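To make that constraint concrete, here is a minimal sketch (push_over_wire is a hypothetical stand-in, not the real library): the client serializes the dictionary itself, so there is no hook for a custom encoder and the dict has to be JSON-safe before it is passed in.

import json

def push_over_wire(payload):
    # Hypothetical stand-in for the third-party client described above: it
    # takes a plain dict and serializes it with the stock json module, so
    # there is no place to plug in a custom JSONEncoder.
    body = json.dumps(payload)
    # ... send body over the network ...
    return body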

Update: the Python version is 2.6.6 on both machines and pandas is 0.16.1 on both.

I believe this happens because the index holds numpy.intNN values whose size differs from the native Python int, and these are not automatically converted from one to the other, so json refuses to use them as dictionary keys.

For example, on my 64-bit Python 2.7 with NumPy:

>>> isinstance(numpy.int64(5), int)
True
>>> isinstance(numpy.int32(5), int)
False

Then:

>>> json.dumps({numpy.int64(5): '5'})
'{"5": "5"}'
>>> json.dumps({numpy.int32(5): '5'})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/json/__init__.py", line 243, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib/python2.7/json/encoder.py", line 207, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python2.7/json/encoder.py", line 270, in iterencode
    return _iterencode(o, 0)
TypeError: keys must be a string
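It is worth checking which integer type the index (and therefore the keys of to_dict()) actually holds on the failing box. The snippet below is only a diagnostic sketch, assuming a DataFrame built like the one in the question:

import pandas as pd

df = pd.DataFrame([{'a': 1}, {'a': 2}])
keys = list(df.to_dict()['a'])      # the index values become the dict keys
print(df.index.dtype)               # which numpy dtype backs the index
print(type(keys[0]))                # often a numpy scalar, not a plain int
print(isinstance(keys[0], int))     # False here is exactly what json rejects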

On my 64-bit machine I can reproduce the error by casting the index to numpy.int32 (perhaps that is the type your index ends up with on the 32-bit box):

>>> df = pd.DataFrame( [ {'a': 1}, {'a': 2} ] )
>>> df.index = df.index.astype(numpy.int32)  # the type your index may have had
>>> json.dumps(df.to_dict())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/json/__init__.py", line 243, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib/python2.7/json/encoder.py", line 207, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python2.7/json/encoder.py", line 270, in iterencode
    return _iterencode(o, 0)
TypeError: keys must be a string

So the workaround is to change the index type to numpy.int64 or to plain Python int:

>>> df.index = df.index.astype(numpy.int64)
>>> json.dumps(df.to_dict())
'{"a": {"0": 1, "1": 2}}'

>>> df.index = df.index.astype(int)
>>> json.dumps(df.to_dict())
'{"a": {"0": 1, "1": 2}}'
