
Quicker way to iterate over unique values in pandas?

I have some pandas code I'm trying to run over a big data set, and despite using apply it looks like it's essentially iterating and running slowly... advice would be welcome!

I'm trying to group up my data. Each row has a non-unique event ID, and each event ID can contain multiple events. If any one of those events is a specific type, I want every row with that ID to have a specific flag - e.g., this type of event happened in this ID. Then I want to export my data-frame with just the IDs, with that flag showing whether the event occurred in that ID.

This is the code I'm using:

no_duplicates = df.drop_duplicates(subset=["URN"])

def add_to_clean(URN):
    # Filter to every row for this URN and check whether any event matched
    single_df = df[df["URN"] == URN].copy()
    return single_df["Event_type"].sum() > 0

no_duplicates["Event_type"] = no_duplicates["URN"].swifter.apply(add_to_clean)

While I've tried to use apply rather than a loop, it still seems to be iterating over the whole data-frame and taking ages. Any ideas on how to make this more efficient?

If you need a new column filled with aggregated values, use GroupBy.transform instead of apply + join; note that transform works on one column at a time, here Event_type:

# transform must run on the full frame, before dropping duplicates,
# so each URN group still contains all of its events
df["Event_type"] = df.groupby("URN").Event_type.transform('sum') > 0
no_duplicates = df.drop_duplicates(subset=["URN"])
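As a minimal sketch of the transform approach (the toy data-frame and the `Event_flag` column name are made up for illustration; the question only gives the `URN` and `Event_type` column names, and here `Event_type` is assumed to be 1 for the event type of interest and 0 otherwise):

```python
import pandas as pd

# Toy data: URN is the non-unique event ID; Event_type marks the
# event type of interest (assumed 0/1 encoding).
df = pd.DataFrame({
    "URN": [1, 1, 2, 2, 3],
    "Event_type": [0, 1, 0, 0, 1],
})

# transform('sum') broadcasts each group's sum back to every row,
# so all rows sharing a URN get the same flag in one vectorized pass.
df["Event_flag"] = df.groupby("URN")["Event_type"].transform("sum") > 0

# One row per URN, keeping only the ID and the flag.
result = df.drop_duplicates(subset=["URN"])[["URN", "Event_flag"]]
print(result)
```

Unlike the per-row apply, this computes every group's sum in a single grouped operation instead of re-filtering the whole frame once per unique URN.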
