Quicker way to iterate over unique values in pandas?

I have some pandas code I'm trying to run over a big dataset, and despite using apply it looks like it's essentially iterating row by row and running slowly... advice would be welcome!

I'm trying to group up my data. Each row has a non-unique event ID, and each event ID can cover multiple events. If any one of those events is of a specific type, I want every row with that ID to carry a flag - eg, this type of event happened somewhere in this ID. Then I want to export my data-frame with just the unique IDs, with that flag showing whether the event occurred in that ID.
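
For concreteness, here is a toy sketch of the shape of the data (made-up values; the URN and Event_type column names match the code below):

import pandas as pd

# Toy frame: URN is the non-unique event ID, Event_type is 1 when
# the event of interest occurred on that row, else 0
df = pd.DataFrame({
    "URN":        [101, 101, 102, 103, 103, 103],
    "Event_type": [0,   1,   0,   0,   0,   0],
})
# Desired export: one row per URN, flagged True if any of its rows
# had the event:
#   URN  Event_type
#   101        True
#   102       False
#   103       False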

This is the code I'm using:

import swifter  # registers the .swifter accessor on pandas Series

# keep one row per unique URN (the event ID column)
no_duplicates = df.drop_duplicates(subset=["URN"])

def add_to_clean(URN):
    # pull every row belonging to this URN and check whether any of
    # its events had the flagged Event_type
    single_df = df[df["URN"] == URN].copy()
    return single_df["Event_type"].sum() > 0

no_duplicates["Event_type"] = no_duplicates["URN"].swifter.apply(add_to_clean)

While I've tried to use apply rather than an explicit loop, it still seems to be iterating over the whole data-frame and taking ages. Any ideas as to how to make this more efficient?

If you need a new column filled with aggregated values, use GroupBy.transform instead of apply + join; transform works on a single column at a time, here Event_type:

no_duplicates["Event_type"] = no_duplicates.groupby("URN").Event_type.transform('sum') > 0 
