如何使用 pandas 保持字符值的浮点精度？

Question

I have a data frame like as shown below我有一个如下所示的数据框

df = pd.DataFrame({'source_code':['A250.00','C791.0','716.90','493.90','143.21','134.52'],
                   'source_description':['test1', 'test1','test2','test3','test4,'test5'],
                   'key_id':[np.nan,np.nan,np.nan,np.nan,np.nan,np.nan]})

hash_file = pd.DataFrame({'source_id':['A250','C791','716.9','493.9','143.21','134.52'],
                          'source_code':['test1','test2','test3','test4','test5'],
                          'hash_id':[911,512,713,814,616,717]})
id_file =  hash_file.set_index(['source_id','source_code'])['hash_id']

I would like to update the values of the key_id column by comparing the source_code , source_description columns with source_id and source_code columns.我想通过将source_code 、 source_description列与source_id和source_code列进行比较来更新key_id列的值。

So, I tried the below based on this post所以，我根据这篇文章尝试了以下内容

df['key_id'] = df.set_index(['source_code','source_description']).index.map(id_file)

While this works fine in normal scenarios, but for specific scenarios when there is a mismatch like 250 and 250.00 or 791.0 and 791 etc, it doesn't work and produces incorrect output like below虽然这在正常情况下工作正常，但对于特定情况下，当存在250和250.00或791.0和791等不匹配时，它不起作用并产生不正确的 output 如下所示

So, I tried converting them to strings but it doesn't work still所以，我尝试将它们转换为字符串，但它仍然不起作用

I expect my output to be like below我希望我的 output 如下所示

Answer 1

If possible, convert values to floats:如果可能，将值转换为浮点数：

df['source_code'] = df['source_code'].astype(float)
hash_file['source_id'] = hash_file['source_id'].astype(float)

id_file =  hash_file.set_index(['source_id','source_code'])['hash_id']

df['key_id'] = df.set_index(['source_code','source_description']).index.map(id_file)


print (df)
   source_code source_description  key_id
0       250.00              test1     911
1       791.00              test1     512
2       716.90              test2     713
3       493.90              test3     814
4       143.21              test4     616
5       134.52              test5     717

But there should be problem with float precision, one possible trick is multiple values some scalar like 1000 and then convert to integers:但是浮点精度应该有问题，一种可能的技巧是多个值，例如1000的标量，然后转换为整数：

df['source_code'] = df['source_code'].astype(float).mul(100).astype(int)
hash_file['source_id'] = hash_file['source_id'].astype(float).mul(100).astype(int)

id_file =  hash_file.set_index(['source_id','source_code'])['hash_id']

df['key_id'] = df.set_index(['source_code','source_description']).index.map(id_file)


print (df)
   source_code source_description  key_id
0        25000              test1     911
1        79100              test1     512
2        71690              test2     713
3        49390              test3     814
4        14321              test4     616
5        13452              test5     717

EDIT:编辑：

If problem is only last 0 or last .0 use:如果问题只是最后一个0或最后一个.0使用：

df['source_code'] = df['source_code'].str.replace('[\.]*[0]+$','', regex=True)
print (df)
  source_code source_description  key_id
0        A250              test1     NaN
1        C791              test1     NaN
2       716.9              test2     NaN
3       493.9              test3     NaN
4      143.21              test4     NaN
5      134.52              test5     NaN

id_file =  hash_file.set_index(['source_id','source_code'])['hash_id']

df['key_id'] = df.set_index(['source_code','source_description']).index.map(id_file)

print (df)
  source_code source_description  key_id
0        A250              test1     911
1        C791              test1     512
2       716.9              test2     713
3       493.9              test3     814
4      143.21              test4     616
5      134.52              test5     717

Better (I hope) regex for remove last .0 if exist:更好的（我希望）正则表达式删除最后一个.0 （如果存在）：

import re

#https://stackoverflow.com/a/44111202/2901002
rgx = re.compile(r'(?:(\.)|(\.\d*?[1-9]\d*?))0+(?=\b|[^0-9])')
df['source_code'] = df['source_code'].str.replace(rgx, r'\2', regex=True)

如何使用 pandas 保持字符值的浮点精度？

问题描述

1 个解决方案

解决方案1
1 已采纳 2021-03-29 04:52:53

如何使用 pandas 保持字符值的浮点精度？

问题描述

1 个解决方案

解决方案1 1 已采纳 2021-03-29 04:52:53

解决方案1
1 已采纳 2021-03-29 04:52:53