
Function in pandas dataframe that replicates dplyr group_by(multiple variables) function in R

Consider this case:

Python pandas equvilant to R groupby mutate

In dplyr:

df = df%>% group_by(a,b) %>%  

means the dataframe is grouped first by column a, then by b.

In my case, I am trying to group my data first by the group_name column, then by user_name, then by type_of_work. There are more than three columns (which is why I got confused), but I need the data grouped by these three headers, in that order. I already have an algorithm to work with the columns after this stage; I only need a way to create a dataframe grouped according to these three columns.

It is important in my case that the sequence is preserved, as with the dplyr function.

Do we have anything similar for a pandas dataframe?

grouped = df.groupby(['a', 'b'])
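Applied to the three columns named in the question, this might look roughly like the sketch below. The sample rows and the hours column are made up for illustration; only the three key column names come from the question. Passing sort=False is one possible reading of "the sequence is preserved": it keeps groups in order of first appearance instead of sorting them by key.

```python
import pandas as pd

# Made-up sample data; only the three key column names come from the question.
df = pd.DataFrame({
    "group_name": ["g2", "g1", "g2", "g1", "g2"],
    "user_name": ["bob", "alice", "bob", "alice", "carol"],
    "type_of_work": ["review", "code", "review", "test", "code"],
    "hours": [1.0, 2.0, 3.0, 4.0, 5.0],  # hypothetical value column
})

keys = ["group_name", "user_name", "type_of_work"]

# Group by the three columns in the listed order. As with dplyr's group_by,
# the result is a grouped object, not a reordered dataframe.
grouped = df.groupby(keys)

# Aggregating yields one row per key combination, sorted by the keys by default.
totals = grouped["hours"].sum().reset_index()

# sort=False keeps the groups in order of first appearance instead.
totals_unsorted = df.groupby(keys, sort=False)["hours"].sum().reset_index()

print(totals)
print(totals_unsorted)
```

The order of the names in the list determines the nesting of the keys, so ['group_name', 'user_name', 'type_of_work'] corresponds to group_by(group_name, user_name, type_of_work) in dplyr.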

Read more on the "split-apply-combine" strategy in the pandas docs to see how pandas deals with these issues compared to R.

From your comment, it seems you want to assign the grouped frames. You can either use the groupby object through its API, e.g. grouped.mean(), or iterate over the groupby object; each iteration gives you the group name and the group itself.
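Both options can be sketched as follows. The dataframe and its value column are hypothetical, and collecting the sub-frames in a dict is just one way to "assign" them.

```python
import pandas as pd

# Hypothetical data with two key columns and one numeric column.
df = pd.DataFrame({
    "a": ["x", "x", "y", "y"],
    "b": [1, 2, 1, 1],
    "value": [10.0, 20.0, 30.0, 50.0],
})

grouped = df.groupby(["a", "b"])

# Option 1: aggregate through the groupby API.
means = grouped["value"].mean()

# Option 2: iterate. Each step yields the key tuple (name) and the
# sub-dataframe (group) holding the rows for that key.
frames = {}
for name, group in grouped:
    frames[name] = group

print(means)
print(list(frames))
```

With several grouping columns, name is a tuple with one entry per column, in the same order as the list passed to groupby.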
