
np.int64 is a smaller container than np.int…?

I'm getting surprising behavior while trying to convert a microsecond-precision date string to an integer:

import numpy as np

n = 20181231235959383171
int_ = np.int(n)      # Works
int64_ = np.int64(n)  # "OverflowError: int too big to convert"

Any idea why?

Edit - Thank you all, this is informative; however, please see my actual problem: Dataframe column won't convert from integer string to an actual integer

An np.int can be arbitrarily large, like a Python integer.

An np.int64 can only range from -2^63 to 2^63 - 1. Your number happens to fall outside this range.
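If you want to check those bounds directly, np.iinfo reports them (a quick sanity check, assuming the usual 64-bit signed layout):

import numpy as np

n = 20181231235959383171

info = np.iinfo(np.int64)   # limits of a 64-bit signed integer
print(info.min, info.max)   # -9223372036854775808 9223372036854775807
print(n > info.max)         # True, hence the OverflowError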

When used as a dtype, np.int is equivalent to np.int_ (an architecture-dependent size), which is probably np.int64 on your machine. So np.array([n], dtype=np.int) will fail. Outside of dtype, np.int behaves as a plain Python int. Numpy is basically helping you do as much of the calculation in C-land as possible, to speed things up and conserve memory; but (AFAIK) integers larger than 64 bits do not exist in standard C (though newer GCC supports them on some architectures). So you are stuck using either Python integers, slow but of unlimited size, or C integers, fast but not big enough for this.
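A minimal sketch of that dtype distinction (np.int_ is used here instead of the np.int alias, which has since been deprecated and removed from recent NumPy releases):

import numpy as np

n = 20181231235959383171

# As a dtype, elements are fixed-width C integers, so building the array overflows:
try:
    np.array([n], dtype=np.int_)
except OverflowError as exc:
    print(exc)

# Plain Python ints (what np.int used to alias) have no such limit:
print(n * 2)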

There are two obvious ways to stuff a large integer into a numpy array (a short sketch of both follows the list):

  • You can use the Python type, signified by dtype=object: np.array([n], dtype=object) will work, but you are getting no speedup or memory benefits from numpy.

  • You can split the microsecond timestamp into whole seconds (n // 1000000) and the microsecond remainder (n % 1000000), stored as two separate columns.
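A short sketch of both options (using the single timestamp from the question as the only value):

import numpy as np

n = 20181231235959383171

# Option 1: object dtype stores Python ints, so no overflow, but no numpy speed/memory win.
as_object = np.array([n], dtype=object)
print(as_object)   # [20181231235959383171]

# Option 2: split into whole seconds and the microsecond remainder; both fit in int64.
seconds = np.array([n // 1000000], dtype=np.int64)   # [20181231235959]
micros = np.array([n % 1000000], dtype=np.int64)     # [383171]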
