I have an Excel file with the following three large integers (they are actually IDs). Excel stores them in scientific notation, and when I read the file with pandas I lose precision, because the integers are too large to fit in int64.
Example data (1.xlsx):
76307016609101000000000000000000
86412903902869300000000000000000
35575701294198100000000000000000
import numpy as np
import pandas as pd

A = pd.read_excel("1.xlsx", engine="openpyxl", header=None, dtype=np.float64)
# No matter which dtype is used, the result is wrong
print(int(A.iloc[0, 0]))  # 76307016609101001211632066494464 (wrong)
I don't know whether numpy/pandas supports an integer type wider than int64. Thank you very much!
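For context, the precision loss can be reproduced without Excel or pandas. The sketch below uses only plain Python and takes the first ID from the example data. It relies on two facts: int64 tops out at 2**63 - 1, and float64 carries a 53-bit significand (roughly 15 to 17 significant decimal digits). Excel itself keeps only 15 significant digits for numbers, which is probably why the sample values end in a long run of zeros, so digits beyond that may already be gone before pandas reads the file.

```python
# First ID from the example data, held as an exact Python int.
n = 76307016609101000000000000000000

INT64_MAX = 2**63 - 1

# The value needs far more bits than int64 offers.
print(n > INT64_MAX)
print(n.bit_length())

# Going through float64 (what dtype=np.float64 does) rounds the value
# to the nearest representable double, so the round trip is not exact.
round_trip = int(float(n))
print(round_trip == n)
print(round_trip)
```

Python's built-in int has arbitrary precision, which is why keeping the values as Python ints (or as strings) avoids the problem.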
If you can get Excel to export the raw numbers (for example as CSV), you can work around this by reading them as text and converting them to Python ints, which have arbitrary precision:
import pandas as pd
import io
txt='''\
line,Num,text
1,12345678901234435346789012345678901234567890,"large"
2,66464644666669999999999999999999999999999999999999999999999999999999999999991,"larger"
3,1,"unity"
4,-9999999999999999999999999999999999999999,"larger in a different way"
'''
df=pd.read_csv(io.StringIO(txt), converters={'Num':int})
print(df)
results in
line Num \
0 1 12345678901234435346789012345678901234567890
1 2 6646464466666999999999999999999999999999999999...
2 3 1
3 4 -9999999999999999999999999999999999999999
text
0 large
1 larger
2 unity
3 larger in a different way
and you can still do arithmetic on them:
n = df["Num"][1]
print(n, n + 1)
yields
66464644666669999999999999999999999999999999999999999999999999999999999999991
66464644666669999999999999999999999999999999999999999999999999999999999999992
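As a further sketch of the same idea, the snippet below checks what pandas actually stores when a converter is used. The column name `id` and the three values are made up for illustration and are not the original data. The assumption is that the column ends up holding ordinary Python ints in an object-dtype column, so sums stay exact, at the cost of losing NumPy's fast vectorized integer arithmetic.

```python
import io

import pandas as pd

# Hypothetical CSV export with IDs too large for int64.
csv_text = """id
76307016609101000000000000000001
86412903902869300000000000000002
35575701294198100000000000000003
"""

# converters={"id": int} parses each cell with Python's int().
df = pd.read_csv(io.StringIO(csv_text), converters={"id": int})

# The column holds Python ints rather than a fixed-width NumPy integer.
print(df["id"].dtype)
print(all(isinstance(v, int) for v in df["id"]))

# Python's built-in sum keeps full precision.
total = sum(df["id"])
print(total)
```

If the values are only used as identifiers and never for arithmetic, keeping them as strings (for example with `dtype=str`) is a simpler option.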