
Using Map to approximate match

mapping = {1.0001: 0.0009,
 1.0005: 0.0015,
 1.000666667: 0.0023,
 1.0008: 0.0032,
 1.001: 0.004,
 2.01: 0.0048,
 2.38001428571: 0.0056}


import pandas as pd

# Use a list, not a set: sets are unordered and recent pandas versions reject them
df1 = pd.DataFrame([1.03, 2.0, 2.4], columns=['Price'])

Say we want to add a new column called 'Margin' by approximately matching each value in the Price column to the closest key in the dict. How can I do this?

For example, df1['Margin'] = df1['Price'].map(mapping) only works for exact matches, not approximate ones.

You can achieve it this way:

# For each key x in mapping, pick the price in df1 closest to x and pair it with the margin y
df_res = pd.DataFrame([[df1.Price.iloc[(df1.Price - x).abs().argmin()], y]
                       for x, y in mapping.items()], columns=['Price', 'Margin'])

Yielding:

#> df_res
   Price  Margin
0   1.03  0.0009
1   1.03  0.0015
2   1.03  0.0023
3   1.03  0.0032
4   1.03  0.0040
5   2.00  0.0048
6   2.40  0.0056

In short, for each key in mapping you find the element of df1.Price with the smallest absolute difference, then collect the pairs in a new DataFrame rather than a dict, because several keys map to the same price and a dict cannot hold repeated keys.
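The same smallest-absolute-difference idea can also be applied in the direction the question asks for, giving one margin per price. The following is a minimal sketch using NumPy broadcasting; the names `keys`, `margins`, `diff` and `nearest` are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

mapping = {1.0001: 0.0009,
           1.0005: 0.0015,
           1.000666667: 0.0023,
           1.0008: 0.0032,
           1.001: 0.004,
           2.01: 0.0048,
           2.38001428571: 0.0056}

df1 = pd.DataFrame([1.03, 2.0, 2.4], columns=['Price'])

keys = np.array(list(mapping.keys()))
margins = np.array(list(mapping.values()))

# Absolute difference between every price and every key, shape (n_prices, n_keys)
diff = np.abs(df1['Price'].to_numpy()[:, None] - keys[None, :])

# Position of the closest key for each price
nearest = diff.argmin(axis=1)

df1['Margin'] = margins[nearest]
print(df1)
```

This builds an n_prices by n_keys matrix, so it suits small to medium lookups; for very large inputs a sorted search would use less memory.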

One can use numpy to make this easy:

import numpy as np
import pandas as pd

# create DataFrame as example
# (a list rather than a set, so the row order is defined)
df1 = pd.DataFrame([1.03, 2.0, 2.4], columns=['Price'])

# Get the mapping values from a dictionary
mapping = {1.0001: 0.0009,
 1.0005: 0.0015,
 1.000666667: 0.0023,
 1.0008: 0.0032,
 1.001: 0.004,
 2.01: 0.0048,
 2.38001428571: 0.0056}

# Convert the mapping to two NumPy arrays. The keys are used to find the closest key.
keys = np.array(list(mapping.keys()))
values = np.array(list(mapping.values()))
# or, equivalently, in one line:
keys, values = np.array(list(zip(*mapping.items())))

# How it works for a single value (1.03):
# 1. compute the absolute difference between every key and the value
# 2. take the position of the smallest difference (argmin)
# 3. use that position to look up the margin in the values array
values[np.argmin(np.abs(keys - 1.03))]

# create lambda function so pandas can apply the function on all rows.
get_margin = lambda x: values[np.argmin(np.abs(keys - x))]
# apply the function
df1['margin'] = df1['Price'].apply(get_margin)

df1
   Price  margin
0   1.03  0.0040
1   2.00  0.0048
2   2.40  0.0056


# If you would rather not declare the keys and values arrays (I see no reason not to, but as an alternative):
get_margin = lambda x: list(mapping.values())[np.argmin(np.abs(np.array(list(mapping.keys())) - x))]
# apply the function
df1['margin'] = df1['Price'].apply(get_margin)
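As an alternative to apply, pandas has a built-in nearest-key join, `pd.merge_asof` with `direction='nearest'`. The sketch below assumes both frames are sorted by their key column, which `merge_asof` requires; the `lookup` frame and its `Key` column are names introduced here for illustration.

```python
import pandas as pd

mapping = {1.0001: 0.0009,
           1.0005: 0.0015,
           1.000666667: 0.0023,
           1.0008: 0.0032,
           1.001: 0.004,
           2.01: 0.0048,
           2.38001428571: 0.0056}

df1 = pd.DataFrame([1.03, 2.0, 2.4], columns=['Price'])

# Turn the dict into a lookup table sorted by key
lookup = pd.DataFrame({'Key': list(mapping.keys()),
                       'Margin': list(mapping.values())}).sort_values('Key')

# For each Price, join the row of lookup whose Key is closest
result = pd.merge_asof(df1.sort_values('Price'), lookup,
                       left_on='Price', right_on='Key',
                       direction='nearest')
print(result)
```

The result keeps the matched `Key` next to each `Price`, which makes it easy to check how far off each approximate match is.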

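If you want to interpolate between the known margins rather than pick the nearest one, here's a way to do it. Note that interpolate() with its default method='linear' treats the rows as equally spaced; pass method='index' if the result should be weighted by the Price values.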

import pandas as pd

d = {1.0001: 0.0009,
 1.0005: 0.0015,
 1.000666667: 0.0023,
 1.0008: 0.0032,
 1.001: 0.004,
 2.01: 0.0048,
 2.38001428571: 0.0056}
    
# Use a list, not a set, so the row order is defined
df1 = pd.DataFrame([1.03, 2.0, 2.4], columns=['Price'])
df2 = pd.DataFrame(list(d.items()), columns=['Price', 'Margin'])

print(df1)

df3 = pd.merge(df1,df2, how='outer').set_index('Price').sort_index().interpolate()
print(df3)
print(df3.loc[df1['Price']])

Output:

         Margin
Price          
1.03   0.004267
2.00   0.004533
2.40   0.005600

You could pass other methods (e.g. 'polynomial') to interpolate, but many of them will not fill in prices that lie outside the range of known margins, such as 2.40, which is larger than every price with a known margin.
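For interpolation weighted by the actual Price values, `np.interp` is one option. This is a sketch rather than part of the original answer; `xp` and `fp` are illustrative names for the sorted keys and their margins. `np.interp` clamps to the first and last known margin for prices outside the key range.

```python
import numpy as np
import pandas as pd

d = {1.0001: 0.0009,
     1.0005: 0.0015,
     1.000666667: 0.0023,
     1.0008: 0.0032,
     1.001: 0.004,
     2.01: 0.0048,
     2.38001428571: 0.0056}

df1 = pd.DataFrame([1.03, 2.0, 2.4], columns=['Price'])

# np.interp needs the known x-values in increasing order
items = sorted(d.items())
xp = np.array([k for k, _ in items])
fp = np.array([v for _, v in items])

# Piecewise-linear interpolation in Price; values beyond the last key get the last margin
df1['Margin'] = np.interp(df1['Price'], xp, fp)
print(df1)
```

Here 1.03 and 2.00 fall between the keys 1.001 and 2.01, so their margins lie between 0.004 and 0.0048 in proportion to their distance from each key, while 2.40 is past the last key and gets 0.0056.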
