Quicker way to iterate over unique values in pandas?

I have some pandas code I'm trying to run over a big dataset, and despite using apply it looks like it's essentially iterating row by row and running slowly... advice would be welcome!

I'm trying to group up my data. Each row has a non-unique event ID, and each event ID can cover multiple events. If any one of those events is of a specific type, I want every row with that ID to carry a flag - eg, this type of event happened somewhere in this ID. Then I want to export my data-frame with just the unique IDs, with that flag showing whether the event occurred in that ID.
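
For concreteness, here is a toy sketch of the shape of the data (made-up values; the URN and Event_type column names match the code below):

import pandas as pd

# Toy frame: URN is the non-unique event ID, Event_type is 1 when
# the event of interest occurred on that row, else 0
df = pd.DataFrame({
    "URN":        [101, 101, 102, 103, 103, 103],
    "Event_type": [0,   1,   0,   0,   0,   0],
})
# Desired export: one row per URN, flagged True if any of its rows
# had the event:
#   URN  Event_type
#   101        True
#   102       False
#   103       False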

This is the code I'm using:

import swifter  # registers the .swifter accessor on pandas Series

# keep one row per unique URN (the event ID column)
no_duplicates = df.drop_duplicates(subset=["URN"])

def add_to_clean(URN):
    # pull every row belonging to this URN and check whether any of
    # its events had the flagged Event_type
    single_df = df[df["URN"] == URN].copy()
    return single_df["Event_type"].sum() > 0

no_duplicates["Event_type"] = no_duplicates["URN"].swifter.apply(add_to_clean)

While I've tried to use apply rather than an explicit loop, it still seems to be iterating over the whole data-frame and taking ages. Any ideas as to how to make this more efficient?

If you need a new column filled with aggregated values, use GroupBy.transform instead of apply + join; transform works on a single column at a time, here Event_type:

no_duplicates["Event_type"] = no_duplicates.groupby("URN").Event_type.transform('sum') > 0 
