
Fastest pythonic way to loop over dictionary to create new Pandas column

I have a dictionary "c" with 30,000 keys and around 600,000 unique values (around 20 unique values per key).

I want to create a new pandas Series, 'DOC_PORTL_ID', by taking the value in column 'image_keys' for each row, finding the dictionary key whose values contain it, and returning that key. So I wrote a function like this:

def find_match(row, c):
    for key, val in c.items():
        for item in val:
            if item == row['image_keys']:
                return key

and then I use .apply to create my new column like:

df_image_keys['DOC_PORTL_ID'] = df_image_keys.apply(lambda x: find_match(x, c), axis=1)

This takes a long time. I am wondering if I can improve my code to make it faster.

I googled a lot and was not able to find the best way of doing this. Any help would be appreciated.

You're using your dictionary as a reverse lookup. And frankly, you haven't given us enough information about the dictionary. Are the 600,000 values unique? If not, you're only returning the first one you find. Is that expected?
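
One way to answer the uniqueness question is to count how often each value appears across all the lists. This is a minimal sketch, assuming `c` maps each key to a list of hashable values; the toy dictionary and the helper name `find_shared_values` are made up for illustration.

```python
from collections import Counter
from itertools import chain


def find_shared_values(c):
    """Return the values that occur more than once across all lists in c."""
    counts = Counter(chain.from_iterable(c.values()))
    return {val for val, n in counts.items() if n > 1}


# Hypothetical toy data standing in for the real dictionary.
c = {"doc_a": ["img1", "img2"], "doc_b": ["img3", "img2"]}
shared = find_shared_values(c)
print(shared)
```

An empty result means every value belongs to exactly one key, so a reverse lookup is unambiguous.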


Assuming they are unique:

reverse_dict = {val: key for key, values in c.items() for val in values}

df_image_keys['DOC_PORTL_ID'] = df_image_keys['image_keys'].map(reverse_dict)

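If the values are unique, this gives the same matches as your own function, but the lookup table is built once, so each row costs a single hash lookup instead of a scan through the whole dictionary. If those values are not unique, you'll have to provide a better explanation of what you expect to happen.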
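
To see the approach end to end, here is a small self-contained sketch on made-up data (the toy `c` and the DataFrame contents are assumptions, not the asker's data). Note one difference in behaviour: values missing from the reverse dictionary come back as NaN from `Series.map`, where the original function returned `None`.

```python
import pandas as pd

# Hypothetical toy data standing in for the real dictionary and DataFrame.
c = {"doc_a": ["img1", "img2"], "doc_b": ["img3"]}
df_image_keys = pd.DataFrame({"image_keys": ["img3", "img1", "img9"]})

# Build the value -> key lookup once.
# Assumption: each value appears under exactly one key.
reverse_dict = {val: key for key, values in c.items() for val in values}

# Series.map does one dictionary lookup per row; unmatched values become NaN.
df_image_keys["DOC_PORTL_ID"] = df_image_keys["image_keys"].map(reverse_dict)
print(df_image_keys)
```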
