
np.int64 is a smaller container than np.int…?

I'm getting surprising behavior while trying to convert a microsecond-precision date string to an integer:

import numpy as np

n = 20181231235959383171
int_ = np.int(n)      # Works
int64_ = np.int64(n)  # "OverflowError: int too big to convert"

Any idea why?

Edit - Thank you all, this is informative; however, please see my actual problem: Dataframe column won't convert from integer string to an actual integer

An np.int can be arbitrarily large, like a Python integer.

An np.int64 can only range from -2^63 to 2^63 - 1. Your number happens to fall outside this range.
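If you want to check those bounds directly, np.iinfo reports them (a quick sanity check, assuming the usual 64-bit signed layout):

import numpy as np

n = 20181231235959383171

info = np.iinfo(np.int64)   # limits of a 64-bit signed integer
print(info.min, info.max)   # -9223372036854775808 9223372036854775807
print(n > info.max)         # True, hence the OverflowError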

When used as a dtype, np.int is equivalent to np.int_ (an architecture-dependent size), which is probably np.int64 on your machine. So np.array([n], dtype=np.int) will fail. Outside of dtype, np.int behaves as a plain Python int. Numpy is basically helping you do as much of the calculation in C-land as possible, to speed things up and conserve memory; but (AFAIK) integers larger than 64 bits do not exist in standard C (though newer GCC supports them on some architectures). So you are stuck using either Python integers, slow but of unlimited size, or C integers, fast but not big enough for this.
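A minimal sketch of that dtype distinction (np.int_ is used here instead of the np.int alias, which has since been deprecated and removed from recent NumPy releases):

import numpy as np

n = 20181231235959383171

# As a dtype, elements are fixed-width C integers, so building the array overflows:
try:
    np.array([n], dtype=np.int_)
except OverflowError as exc:
    print(exc)

# Plain Python ints (what np.int used to alias) have no such limit:
print(n * 2)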

There are two obvious ways to stuff a large integer into a numpy array (a short sketch of both follows the list):

  • You can use the Python type, signified by dtype=object: np.array([n], dtype=object) will work, but you are getting no speedup or memory benefits from numpy.

  • You can split the microsecond timestamp into whole seconds (n // 1000000) and the microsecond remainder (n % 1000000), stored as two separate columns.
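A short sketch of both options (using the single timestamp from the question as the only value):

import numpy as np

n = 20181231235959383171

# Option 1: object dtype stores Python ints, so no overflow, but no numpy speed/memory win.
as_object = np.array([n], dtype=object)
print(as_object)   # [20181231235959383171]

# Option 2: split into whole seconds and the microsecond remainder; both fit in int64.
seconds = np.array([n // 1000000], dtype=np.int64)   # [20181231235959]
micros = np.array([n % 1000000], dtype=np.int64)     # [383171]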
