How to find the count of unique pairs of values (across different rows and columns) in a dataframe and visualize it in Python?

Question

So, I have the following sample truncated dataset (sales data):

----------------------

Product         Hour

PRODUCT_75       12
PRODUCT_75       11
PRODUCT_75       12
PRODUCT_75       12
PRODUCT_63       10
PRODUCT_63       5
PRODUCT_63       5
PRODUCT_12       1
PRODUCT_120      7
PRODUCT_120      5
PRODUCT_120      5
----------------------

Now, I need two things:

(a) A way to find the count of unique pairs of data items and, from that, display which product sold the most at a particular hour of the day. For example, PRODUCT_75 has a count of 3 for hour 12, so, supposing that is the highest-selling product at that hour, I have to return that product name. Similarly, I have to do this for every hour from 0 to 23 (all of which occur in my dataset). For that I need an intermediate dataframe like:

--------------------------------

Product         Hour    Count

PRODUCT_75       12       3
PRODUCT_75       11       1
PRODUCT_75       12       3
PRODUCT_75       12       3
PRODUCT_63       10       2
PRODUCT_63       10       2
PRODUCT_63       5        2
PRODUCT_63       5        2
PRODUCT_12       1        1
PRODUCT_120      7        1
PRODUCT_120      5        3
PRODUCT_120      5        3
PRODUCT_120      5        3
--------------------------------
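A minimal sketch of building this intermediate frame with pandas, assuming the 13 rows shown in the table above (typed in by hand here). `groupby(...).transform("size")` computes the size of each (Product, Hour) group and broadcasts it back to every row, so the frame keeps one row per sale instead of collapsing to one row per pair.

```python
import pandas as pd

# Sample data copied from the intermediate table above (13 rows).
df = pd.DataFrame({
    "Product": ["PRODUCT_75"] * 4 + ["PRODUCT_63"] * 4 + ["PRODUCT_12"] + ["PRODUCT_120"] * 4,
    "Hour": [12, 11, 12, 12, 10, 10, 5, 5, 1, 7, 5, 5, 5],
})

# Size of each (Product, Hour) group, repeated on every row of that group.
df["Count"] = df.groupby(["Product", "Hour"])["Hour"].transform("size")

print(df)
```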

And, as explained above, display the product with the highest count for every hour of the day (0 to 23).

(b) Secondly, is there a way to visualize the distribution of these highest-selling products at other hours? For example, if PRODUCT_123 is the highest-selling product at hour 5, I need to visualize its distribution (how much it sold) at the other hours.

For the above dataset, I need output like this:

Maximum-Selling Products on an Hourly Basis:

---------------------------
Hour     Product      Count
1        PRODUCT_12   1
5        PRODUCT_120  3
7        PRODUCT_120  1
10       PRODUCT_63   2
11       PRODUCT_75   1
12       PRODUCT_75   3
---------------------------
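One way to produce this table, sketched under the assumption that the data are the 13 rows of the intermediate frame above: count each (Hour, Product) pair, then let `groupby("Hour")["Count"].idxmax()` pick the row label of the largest count within each hour. `idxmax` keeps only the first row when two products tie, and because `groupby` sorts products alphabetically inside each hour, the alphabetically first product wins a tie.

```python
import pandas as pd

# Sample data copied from the intermediate table (13 rows).
df = pd.DataFrame({
    "Product": ["PRODUCT_75"] * 4 + ["PRODUCT_63"] * 4 + ["PRODUCT_12"] + ["PRODUCT_120"] * 4,
    "Hour": [12, 11, 12, 12, 10, 10, 5, 5, 1, 7, 5, 5, 5],
})

# One row per (Hour, Product) pair with its number of sales.
counts = df.groupby(["Hour", "Product"]).size().reset_index(name="Count")

# For every hour, idxmax returns the row label holding the largest Count;
# .loc then pulls those rows out of `counts`.
best = counts.loc[counts.groupby("Hour")["Count"].idxmax()].reset_index(drop=True)

print(best)
```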

Now, for part (a), I've already used the following code:

res = reshaped.groupby(['Product', 'Hour']).size()

where reshaped is the dataframe with these columns.

It does return the count of each unique pair, but I don't know how to proceed from here. I'd be grateful if anyone could guide me.
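A sketch of how to continue from the `res` line in the question. `reshaped` is the asker's variable name; its contents here are a stand-in built from the 13-row intermediate table. `size()` returns a Series indexed by (Product, Hour), so `reset_index(name="Count")` turns it back into a frame; comparing each row with its hour's maximum (`transform("max")`) then keeps every product tied for first place, which `idxmax` would not.

```python
import pandas as pd

# Stand-in for the asker's `reshaped` frame (rows from the intermediate table).
reshaped = pd.DataFrame({
    "Product": ["PRODUCT_75"] * 4 + ["PRODUCT_63"] * 4 + ["PRODUCT_12"] + ["PRODUCT_120"] * 4,
    "Hour": [12, 11, 12, 12, 10, 10, 5, 5, 1, 7, 5, 5, 5],
})

# The asker's line: a Series of pair counts with a (Product, Hour) MultiIndex.
res = reshaped.groupby(["Product", "Hour"]).size()

# Back to a flat frame with an explicit Count column.
counts = res.reset_index(name="Count")

# Each row's hourly maximum; rows equal to it are that hour's top seller(s).
hour_max = counts.groupby("Hour")["Count"].transform("max")
top = (
    counts[counts["Count"] == hour_max]
    .sort_values("Hour", kind="stable")
    .reset_index(drop=True)
)

print(top)
```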

Answer 1

The following code plots histograms of the highest-selling products at a given hour (there may be more than one):

import pandas as pd
import matplotlib.pyplot as plt

# Sample data: the 13 rows of the intermediate table in the question
df = pd.DataFrame([['PRODUCT_75', 12], ['PRODUCT_75', 11], ['PRODUCT_75', 12], ['PRODUCT_75', 12],
                   ['PRODUCT_63', 10], ['PRODUCT_63', 10], ['PRODUCT_63', 5], ['PRODUCT_63', 5],
                   ['PRODUCT_12', 1], ['PRODUCT_120', 7], ['PRODUCT_120', 5], ['PRODUCT_120', 5],
                   ['PRODUCT_120', 5]],
                  columns=['Product', 'Hour'])

# Number of sales for each (Product, Hour) pair
res = df.groupby(['Product', 'Hour']).size().reset_index(name='count')

def histogram(df, product):
    # Histogram of the hours at which `product` was sold
    df[df['Product'] == product]['Hour'].hist()
    plt.suptitle(str(product))
    plt.show()

def highest_selling(res, hour):
    # Filter to the requested hour first, then compare against that hour's own
    # maximum, so every product tied for the top count is returned
    at_hour = res[res['Hour'] == hour]
    return at_hour.loc[at_hour['count'] == at_hour['count'].max(), 'Product'].to_list()

highest_selling_product = highest_selling(res, 5)

for product in highest_selling_product:
    histogram(df, product)

which results in the following plot:

[Plot not preserved: histogram of the sales hours of PRODUCT_120, the top seller at hour 5]
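For part (b), a sketch of an alternative that puts every hour's top seller on one chart, assuming the same 13-row sample. `pd.crosstab` counts sales per (Hour, Product), `reindex` adds the missing hours 0 to 23 with zeros so gaps are visible, and pandas' `DataFrame.plot(kind="bar")` draws grouped bars. The axis labels and title are illustrative choices, not part of the original answer.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Sample data copied from the intermediate table (13 rows).
df = pd.DataFrame({
    "Product": ["PRODUCT_75"] * 4 + ["PRODUCT_63"] * 4 + ["PRODUCT_12"] + ["PRODUCT_120"] * 4,
    "Hour": [12, 11, 12, 12, 10, 10, 5, 5, 1, 7, 5, 5, 5],
})

# Top seller for each hour (first product wins a tie), deduplicated.
counts = df.groupby(["Hour", "Product"]).size().reset_index(name="Count")
top_products = counts.loc[counts.groupby("Hour")["Count"].idxmax(), "Product"].unique()

# Rows: hours 0-23; columns: top products; values: units sold (0 = no sale).
table = pd.crosstab(df["Hour"], df["Product"]).reindex(
    index=range(24), columns=top_products, fill_value=0
)

# One group of bars per hour, one bar per top-selling product.
ax = table.plot(kind="bar", figsize=(12, 4), width=0.8)
ax.set_xlabel("Hour of day")
ax.set_ylabel("Units sold")
ax.set_title("Hourly sales of each hour's top-selling product")
plt.tight_layout()
plt.show()
```

A grouped bar chart keeps hours on a common axis, which makes it easy to compare, for example, how PRODUCT_120 and PRODUCT_63 split the sales at hour 5.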


Solution 1: 0, 2022-06-29 08:52:38
