Python: returning the average between two values in a dictionary

I have these two functions:

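from collections import Counter

import numpy as np

def find_nearest(array, value):
    # Return the element of array that is closest to value.
    idx = (np.abs(array - value)).argmin()
    return array[idx]

def df_to_count_dict(df):
    count_dict = Counter(df.values)

    # Collect the integer keys missing between 1 and the largest key.
    holder = []
    for i in range(1, max(count_dict)):
        if i in count_dict:
            continue
        holder.append(i)

    # Give each missing key the count of its nearest key.
    for i in holder:
        j = find_nearest(np.array(list(count_dict.keys())), i)
        count_dict.update({i: count_dict[j]})

    return count_dict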

It takes a pandas Series and uses Counter from the collections module to return a dictionary of counts. It also fills in keys that are missing from the dictionary, giving each one the value of the closest existing key.

Now I want to amend this function so that it returns the same object, count_dict, but gives each key missing from the dictionary the average of the values of the two existing keys it falls between.

This is best explained by an example:

Take

test = pd.Series([1,2,3,3,7,7,7,8])

Without the function above we get:

Counter(test.values)
Out[459]: Counter({1: 1, 2: 1, 3: 2, 7: 3, 8: 1})

Using the function we get

df_to_count_dict(test)
Out[458]: Counter({1: 1, 2: 1, 3: 2, 4: 2, 5: 2, 6: 2, 7: 3, 8: 1})

As you can see, it has added keys 4, 5 and 6 with value 2, since 2 is the value of the closest key (the closest key is 3).

What I would like it to return is the AVERAGE of the values of the closest lower key and the closest upper key. Here the closest lower key is 3, which has value 2, and the closest upper key is 7, which has value 3, so I want the final result to look something like:

df_to_count_dict(test)
Out[458]: Counter({1: 1, 2: 1, 3: 2, 4: 2.5, 5: 2.5, 6: 2.5, 7: 3, 8: 1})

I hope someone can help.

This looks a lot like schoolwork, so you should figure it out yourself. But here is a hint: the query you are being asked to develop finds the mean of the predecessor's count and the successor's count. The predecessor is the largest key smaller than or equal to the input, and the successor is the smallest key larger than the input.
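One way to read that hint, as a minimal sketch rather than a definitive answer: sort the existing keys, walk over consecutive pairs, and give every integer strictly between a pair the mean of the two counts. The function name fill_gaps_with_mean is hypothetical, and the sketch assumes integer keys. Unlike the original df_to_count_dict, it only fills gaps between existing keys and does not add keys below the smallest one.

```python
from collections import Counter


def fill_gaps_with_mean(values):
    """Count values, then give each missing integer key the mean of its neighbours' counts.

    Hypothetical helper for illustration; assumes the values are integers.
    """
    counts = Counter(values)
    keys = sorted(counts)

    # For each pair of adjacent existing keys, `lo` is the predecessor and
    # `hi` is the successor of every integer strictly between them.
    for lo, hi in zip(keys, keys[1:]):
        mean = (counts[lo] + counts[hi]) / 2
        for missing in range(lo + 1, hi):
            counts[missing] = mean

    return counts


result = fill_gaps_with_mean([1, 2, 3, 3, 7, 7, 7, 8])
print(sorted(result.items()))
```

For the example series, keys 4, 5 and 6 sit between 3 (count 2) and 7 (count 3), so each receives 2.5.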

If you need O(log(n)) complexity, you might look at binary search trees. bintrees is a good package: https://pypi.python.org/pypi/bintrees/2.0.4
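If the set of keys does not change between lookups, a sorted list plus the standard-library bisect module gives the same O(log(n)) predecessor/successor lookup without a tree. This is a swapped-in alternative to bintrees, not its API, and the name neighbour_mean is hypothetical. The sketch assumes the queried key lies between the smallest and largest existing keys and raises KeyError otherwise.

```python
from bisect import bisect_left
from collections import Counter


def neighbour_mean(sorted_keys, counts, key):
    """Return counts[key] if present, else the mean of the neighbouring keys' counts.

    Hypothetical helper; `sorted_keys` must be the keys of `counts` in ascending order.
    """
    pos = bisect_left(sorted_keys, key)  # binary search, O(log n)

    # Exact hit: the key already exists, so no averaging is needed.
    if pos < len(sorted_keys) and sorted_keys[pos] == key:
        return counts[key]

    # No predecessor or no successor: this sketch refuses to guess.
    if pos == 0 or pos == len(sorted_keys):
        raise KeyError(key)

    lo, hi = sorted_keys[pos - 1], sorted_keys[pos]
    return (counts[lo] + counts[hi]) / 2


counts = Counter([1, 2, 3, 3, 7, 7, 7, 8])
keys = sorted(counts)
print(neighbour_mean(keys, counts, 5))
```

A balanced tree is mainly worth the extra dependency when keys are inserted or deleted between queries, since keeping a plain list sorted costs O(n) per insertion.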
