简体   繁体   中英

CSV: How to find a closest match/closest value from the list of lists (list that contains lists)?

I have a code that reads a column in a CSV file with 3 columns: Zone, Offnet calls, and Traffic.

在此处输入图像描述

Sample data:

 Zone  Offnet calls  Traffic
zone0             0        0
zone1           421    30167
zone2           667    23172
zone3         12146   215033
zone4          7163   126514
zone5          5211   130045
zone6          1374    75357
zone7          3702   257846
zone8          1129    77677
zone9          2679    79331

I need "Offnet calls" and "Traffic" to create a list. For example, row 2 will be [421, 30167] and make a search of best match/closest value from the list which contains lists of same parameters. It will more clear by seeing the code:


tp_usp15 = [10, 200]
tp_usp23 = [15, 250]
tp_usp27 = [20, 300]
list_usp = [tp_usp15,tp_usp23, tp_usp27]

tp_bsnspls_s = [1,30]
tp_bsnspls_steel = [13,250]
tp_bsnspls_chrome = [18,350]
list_bsnspls = [tp_bsnspls_s,tp_bsnspls_steel,tp_bsnspls_chrome]

tp_bsnsrshn10 = [10,200]
tp_bsnsrshn15 = [15,300]
tp_bsnsrshn20 = [20,400]
list_bsnsrshn = [tp_bsnsrshn10,tp_bsnsrshn15,tp_bsnsrshn20]

common_list = list_usp + list_bsnspls + list_bsnsrshn

For example, from this list provided in the code, the closest value/best match for row 2 = [421, 30167] is [20, 400] = tp_bsnsrshn20. And I need a code that will make the same operation for all the values in the CSV file. The closest value/best match needs to be recorded to the next column (a new column called "Best match" should be created right next to the "traffic" column). I have a code that works for inputs. 2 user inputs create a list and the search is done from the list of lists.

client_traffic = int(input("Enter the expected monthly traffic: "))
client_offnet = int(input("Enter monthly offnet calls: "))
list_client = [client_payment, client_offnet]

from functools import partial
def distance_squared(x, y):
    return (x[0] - y[0])**2 + (x[1] - y[1])**2
best_match_overall = min(common_list, key=partial(distance_squared, list_client))
name_best_match_overall = [k for k,v in locals().items() if v == best_match_overall][0]

How to apply this code to work for the whole CSV file. By the way, it gives also the name of value. I guess it should not be hard for advanced users to create some loop that will work by the same concept that I have provided in the last code but for the whole file. I am really struggling at this point. Thanks in advance, guys!

Input data:

>>> df
    Zone  Offnet calls  Traffic
0  zone0             0        0
1  zone1           421    30167
2  zone2           667    23172
3  zone3         12146   215033
4  zone4          7163   126514
5  zone5          5211   130045
6  zone6          1374    75357
7  zone7          3702   257846
8  zone8          1129    77677
9  zone9          2679    79331

Build your reference list as dataframe:

ref = {'tp_usp15': [10, 200],
       'tp_usp23': [15, 250],
       'tp_usp27': [20, 300],
       'tp_bsnspls_s': [1, 30],
       'tp_bsnspls_steel': [13, 250],
       'tp_bsnspls_chrome': [18, 350],
       'tp_bsnsrshn10': [10, 200],
       'tp_bsnsrshn15': [15, 300],
       'tp_bsnsrshn20': [20, 400]}

df1 = pd.DataFrame(ref, index=['crit1', 'crit2']).T.rename_axis('Name')
df1['Best Match'] = list(map(list, df1.values))
>>> df1
                   crit1  crit2 Best Match
Name
tp_usp15              10    200  [10, 200]
tp_usp23              15    250  [15, 250]
tp_usp27              20    300  [20, 300]
tp_bsnspls_s           1     30    [1, 30]
tp_bsnspls_steel      13    250  [13, 250]
tp_bsnspls_chrome     18    350  [18, 350]
tp_bsnsrshn10         10    200  [10, 200]
tp_bsnsrshn15         15    300  [15, 300]
tp_bsnsrshn20         20    400  [20, 400]

Creates the cartesian product from df and df1 and compute the squared distance:

cx = pd.merge(df.reset_index(), df1.reset_index(), how='cross')
x0, x1, y0, y1 = cx[['Offnet calls', 'Traffic', 'crit1', 'crit2']].values.T
cx['distance'] =  (x0 - y0)**2 + (x1 - y1)**2

Keep the closest value of each df row:

cols = ['index', 'Zone', 'Offnet calls', 'Traffic', 'Best Match', 'Name']
out = cx.loc[cx.groupby('index')['distance'].idxmin(), cols] \
        .set_index('index').rename_axis(None)

Output result:

>>> out
    Zone  Offnet calls  Traffic Best Match           Name
0  zone0             0        0    [1, 30]   tp_bsnspls_s
1  zone1           421    30167  [20, 400]  tp_bsnsrshn20
2  zone2           667    23172  [20, 400]  tp_bsnsrshn20
3  zone3         12146   215033  [20, 400]  tp_bsnsrshn20
4  zone4          7163   126514  [20, 400]  tp_bsnsrshn20
5  zone5          5211   130045  [20, 400]  tp_bsnsrshn20
6  zone6          1374    75357  [20, 400]  tp_bsnsrshn20
7  zone7          3702   257846  [20, 400]  tp_bsnsrshn20
8  zone8          1129    77677  [20, 400]  tp_bsnsrshn20
9  zone9          2679    79331  [20, 400]  tp_bsnsrshn20

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM