
Pandas: how to get the rows that have the maximum value_count on a column, grouped by another column, as a dataframe

I have three columns in a pandas dataframe: Date, Hour and Content. For each day, I want to get the hour with the most content. I am using messages.groupby(["Date", "Hour"]).Content.count().groupby(level=0).tail(1), but I don't know what groupby(level=0) is doing here. It outputs the following:

Date        Hour
2018-04-12  23       4
2018-04-13  21      43
2018-04-14  9        1
2018-04-15  23      29
2018-04-16  17       1
                    ..
2020-04-23  20       1
2020-04-24  22       1
2020-04-25  20       1
2020-04-26  23      32
2020-04-27  23       3
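On the question of what groupby(level=0) does: on a Series with a MultiIndex, it groups by the first index level (here Date), and tail(1) then keeps the last row of each group. Because the counts come out sorted by Date and then Hour, that is the latest hour of each day, not necessarily the busiest one. A minimal sketch on small hypothetical data (the messages values below are invented for illustration) contrasting tail(1) with a per-day idxmax:

```python
import pandas as pd

# Hypothetical sample data standing in for the question's `messages` frame
messages = pd.DataFrame({
    "Date": ["2018-04-12"] * 3 + ["2018-04-13"] * 3,
    "Hour": [9, 9, 23, 21, 21, 22],
    "Content": ["a", "b", "c", "d", "e", "f"],
})

# Count messages per (Date, Hour); the result is a Series with a two-level MultiIndex
counts = messages.groupby(["Date", "Hour"]).Content.count()

# level=0 groups by the first index level (Date); tail(1) keeps the LAST hour of each day
last_hour = counts.groupby(level=0).tail(1)

# idxmax returns, per day, the full (Date, Hour) label of the largest count
busiest_labels = counts.groupby(level=0).idxmax()
busiest = counts.loc[list(busiest_labels)]

print(last_hour)
print(busiest)
```

With this data, tail(1) picks hours 23 and 22, while idxmax picks hours 9 and 21, each with a count of 2.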

This is a pandas Series object, and the Date and Hour columns I want are in its MultiIndex. If I try to convert the MultiIndex to a dataframe using pd.DataFrame(most_active.index), where most_active is the output of the previous code, it creates a dataframe of tuples as below:

                    0
0    (2018-04-12, 23)
1    (2018-04-13, 21)
2     (2018-04-14, 9)
3    (2018-04-15, 23)
4    (2018-04-16, 17)
..                ...
701  (2020-04-23, 20)
702  (2020-04-24, 22)
703  (2020-04-25, 20)
704  (2020-04-26, 23)
705  (2020-04-27, 23)

But I need Date and Hour as two separate columns. What is the best way to do this?
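A minimal sketch of two ways to turn the (Date, Hour) MultiIndex into ordinary columns, again on invented sample data: Series.reset_index() moves every index level into a column (its name= argument labels the counts column, "Count" being an arbitrary choice here), and MultiIndex.to_frame(index=False) builds a frame from the index levels alone.

```python
import pandas as pd

# Hypothetical sample data standing in for the question's `messages` frame
messages = pd.DataFrame({
    "Date": ["2018-04-12"] * 3 + ["2018-04-13"] * 3,
    "Hour": [9, 9, 23, 21, 21, 22],
    "Content": ["a", "b", "c", "d", "e", "f"],
})

# The question's Series: counts indexed by a (Date, Hour) MultiIndex
most_active = messages.groupby(["Date", "Hour"]).Content.count().groupby(level=0).tail(1)

# Option 1: move both index levels into ordinary columns; name= labels the values column
as_columns = most_active.reset_index(name="Count")

# Option 2: keep only the index levels, one column per level
dates_and_hours = most_active.index.to_frame(index=False)

print(as_columns)
print(dates_and_hours)
```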

Edit: I had misunderstood your question.

First, count the content per date and hour, just like you did:

df = messages.groupby(["Date", "Hour"], as_index=False).Content.count()

Here, I kept the group keys as ordinary columns by passing as_index=False.
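A small check of what as_index=False changes, using hypothetical sample data standing in for messages:

```python
import pandas as pd

# Hypothetical sample data standing in for the question's `messages` frame
messages = pd.DataFrame({
    "Date": ["2018-04-12"] * 3 + ["2018-04-13"] * 3,
    "Hour": [9, 9, 23, 21, 21, 22],
    "Content": ["a", "b", "c", "d", "e", "f"],
})

# With as_index=False the group keys stay as columns and the result gets a fresh 0..n-1 index
df = messages.groupby(["Date", "Hour"], as_index=False).Content.count()

# Without it, Date and Hour end up in a MultiIndex instead
counts = messages.groupby(["Date", "Hour"]).Content.count()

print(df)
```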

Then you can run the code below, taken from my original answer:

Assuming your index labels are unique (if not, just run df.reset_index(inplace=True) first), you can use the idxmax method on a groupby. It returns the index label of the row with the largest value in each group, and you can then pass those labels to .loc to select the matching rows from the dataframe.

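For example, group by Date alone so that each day is one group (grouping by both Date and Hour would return every row, since each date-hour pair is already a single row after counting):

df.loc[df.groupby('Date')['Content'].idxmax()]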
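A minimal sketch of the idxmax approach on hypothetical sample data, including the pitfall of grouping by both keys:

```python
import pandas as pd

# Hypothetical sample data standing in for the question's `messages` frame
messages = pd.DataFrame({
    "Date": ["2018-04-12"] * 3 + ["2018-04-13"] * 3,
    "Hour": [9, 9, 23, 21, 21, 22],
    "Content": ["a", "b", "c", "d", "e", "f"],
})
df = messages.groupby(["Date", "Hour"], as_index=False).Content.count()

# One group per day: idxmax gives the row label of that day's largest count
busiest = df.loc[df.groupby("Date")["Content"].idxmax()]

# Grouping by both keys makes every (Date, Hour) row its own group, so nothing is filtered out
every_row = df.loc[df.groupby(["Date", "Hour"])["Content"].idxmax()]

print(busiest)
```

If two hours tie for the maximum on the same day, idxmax returns the label of the first one, which here is the earlier hour because the counts are sorted by Date and then Hour.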

As an alternative (without using groupby), you can first sort the values in descending order, then drop duplicate dates so that only the top row of each day is kept:

df.sort_values('Content', ascending=False).drop_duplicates(subset='Date')
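The same result via sorting, sketched on hypothetical sample data. The kind="stable" argument is an addition here: with the default quicksort, which of several tied hours survives is not guaranteed.

```python
import pandas as pd

# Hypothetical sample data standing in for the question's `messages` frame
messages = pd.DataFrame({
    "Date": ["2018-04-12"] * 3 + ["2018-04-13"] * 3,
    "Hour": [9, 9, 23, 21, 21, 22],
    "Content": ["a", "b", "c", "d", "e", "f"],
})
df = messages.groupby(["Date", "Hour"], as_index=False).Content.count()

# Largest counts first; kind="stable" keeps tied rows in their original (hour) order
ranked = df.sort_values("Content", ascending=False, kind="stable")

# The first row kept for each Date is that day's busiest hour; re-sort by Date for readability
busiest = ranked.drop_duplicates(subset="Date").sort_values("Date")

print(busiest)
```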

Finally, if you want Date and Hour as a MultiIndex again, use the set_index() method:

df.set_index(['Date','Hour'])
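A short sketch of this last step on hypothetical sample data. Note that set_index returns a new dataframe rather than modifying df in place, and reset_index reverses it if you later need the columns back:

```python
import pandas as pd

# Hypothetical sample data standing in for the question's `messages` frame
messages = pd.DataFrame({
    "Date": ["2018-04-12"] * 3 + ["2018-04-13"] * 3,
    "Hour": [9, 9, 23, 21, 21, 22],
    "Content": ["a", "b", "c", "d", "e", "f"],
})
df = messages.groupby(["Date", "Hour"], as_index=False).Content.count()
busiest = df.loc[df.groupby("Date")["Content"].idxmax()]

# set_index returns a new frame; Date and Hour move from columns into a MultiIndex
indexed = busiest.set_index(["Date", "Hour"])

# reset_index undoes it, turning the index levels back into ordinary columns
flat = indexed.reset_index()

print(indexed)
```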

