Fastest pythonic way to loop over dictionary to create new Pandas column

Question

I have a dictionary "c" with 30,000 keys and around 600,000 unique values (around 20 unique values per key).

I want to create a new pandas Series, 'DOC_PORTL_ID'. For each row, I take the value in the 'image_keys' column, find the dictionary key whose list of values contains it, and return that key. So I wrote a function like this:

def find_match(row, c):
    for key, val in c.items():
        for item in val:
            if item == row['image_keys']:
                return key

and then I use .apply to create my new column like:

df_image_keys['DOC_PORTL_ID'] = df_image_keys.apply(lambda x: find_match(x, c), axis =1)

This takes a long time. I am wondering if I can improve my code snippet to make it faster.

I googled a lot and was not able to find the best way of doing this. Any help would be appreciated.

Answer 1

You're using your dictionary as a reverse lookup. And frankly, you haven't given us enough information about the dictionary. Are the 600,000 values unique? If not, you're only returning the first one you find. Is that expected?

Assuming they are unique, build the reverse mapping once and then use map:

reverse_dict = {val: key for key, values in c.items() for val in values}

df_image_keys['DOC_PORTL_ID'] = df_image_keys['image_keys'].map(reverse_dict)

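This finds the same matches as your own function, but it does a single dictionary lookup per row instead of rescanning the whole dictionary each time. If those values are not unique, you'll have to provide a better explanation of what you expect to happen.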
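One possible reading of the non-unique case is sketched below: keep the first key found for each value, which is what the original find_match loop returns. The toy data (doc_a, doc_b, img1 and so on) is hypothetical and only stands in for the real c and df_image_keys. Note that the dict comprehension above would keep the last key for a repeated value, not the first.

```python
import pandas as pd

# Hypothetical toy data standing in for the real `c` and `df_image_keys`.
c = {
    "doc_a": ["img1", "img2"],
    "doc_b": ["img3", "img2"],  # "img2" is repeated on purpose
}
df_image_keys = pd.DataFrame({"image_keys": ["img1", "img2", "img3", "img9"]})

# The comprehension overwrites earlier entries, so a repeated value
# ends up mapped to the LAST key that contains it.
reverse_dict = {val: key for key, values in c.items() for val in values}

# setdefault only writes when the value is not present yet, so a repeated
# value stays mapped to the FIRST key, like the original nested loop.
first_match = {}
for key, values in c.items():
    for val in values:
        first_match.setdefault(val, key)

# One hash lookup per row; values with no match become NaN.
df_image_keys["DOC_PORTL_ID"] = df_image_keys["image_keys"].map(first_match)
print(df_image_keys)
```

If every value really is unique, reverse_dict and first_match are identical and either one can be passed to map.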
