

How to find the count of unique pair values (across different rows and columns) of a DataFrame and visualize it in Python?

So, I have the following truncated sample of a sales dataset:

----------------------

Product         Hour

PRODUCT_75       12
PRODUCT_75       11
PRODUCT_75       12
PRODUCT_75       12
PRODUCT_63       10
PRODUCT_63       5
PRODUCT_63       5
PRODUCT_12       1
PRODUCT_120      7
PRODUCT_120      5
PRODUCT_120      5
----------------------

Now, I need two things:

(a) A way to find the count of each unique (Product, Hour) pair and, from that, display the highest-selling product at each hour of the day. For example, PRODUCT_75 has a count of 3 for hour 12; supposing that is the highest-selling product at that hour, I have to return that product name. I have to do this for every hour present in my dataset (0 to 23). For that I need an intermediate dataframe like:

  --------------------------------
    
    Product         Hour    Count
    
    PRODUCT_75       12       3
    PRODUCT_75       11       1
    PRODUCT_75       12       3
    PRODUCT_75       12       3
    PRODUCT_63       10       2
    PRODUCT_63       10       2
    PRODUCT_63       5        2
    PRODUCT_63       5        2
    PRODUCT_12       1        1
    PRODUCT_120      7        1
    PRODUCT_120      5        3
    PRODUCT_120      5        3
    PRODUCT_120      5        3
    --------------------------------
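A minimal sketch of how the Count column of that intermediate table could be produced, assuming pandas and the same thirteen sample rows shown in the table. groupby(...).transform('size') broadcasts each group's size back onto every row, so all original rows are kept, which is exactly the shape of the table.

```python
import pandas as pd

# Same sample rows as the intermediate table (one row per sale)
df = pd.DataFrame(
    [['PRODUCT_75', 12], ['PRODUCT_75', 11], ['PRODUCT_75', 12], ['PRODUCT_75', 12],
     ['PRODUCT_63', 10], ['PRODUCT_63', 10], ['PRODUCT_63', 5], ['PRODUCT_63', 5],
     ['PRODUCT_12', 1], ['PRODUCT_120', 7], ['PRODUCT_120', 5], ['PRODUCT_120', 5],
     ['PRODUCT_120', 5]],
    columns=['Product', 'Hour'])

# transform('size') returns one value per original row: the size of that row's
# (Product, Hour) group, so the result aligns with df and can be stored as a column
df['Count'] = df.groupby(['Product', 'Hour'])['Hour'].transform('size')
print(df)
```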

And, as explained above, display the product with the highest count for every hour of the day (0 to 23).
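Given a frame with a Count column like the one above, the per-hour winner can be read off directly: keep one row per (Product, Hour) pair, sort by hour and by descending count, and keep the first row of each hour. This is a minimal sketch assuming pandas and the same sample rows; when two products tie within an hour, only one of them is kept.

```python
import pandas as pd

# Same sample rows as the intermediate table
df = pd.DataFrame(
    [['PRODUCT_75', 12], ['PRODUCT_75', 11], ['PRODUCT_75', 12], ['PRODUCT_75', 12],
     ['PRODUCT_63', 10], ['PRODUCT_63', 10], ['PRODUCT_63', 5], ['PRODUCT_63', 5],
     ['PRODUCT_12', 1], ['PRODUCT_120', 7], ['PRODUCT_120', 5], ['PRODUCT_120', 5],
     ['PRODUCT_120', 5]],
    columns=['Product', 'Hour'])
df['Count'] = df.groupby(['Product', 'Hour'])['Hour'].transform('size')

# One row per (Product, Hour) pair
pairs = df.drop_duplicates(['Product', 'Hour'])
# Hours ascending; within an hour, the largest Count first
ranked = pairs.sort_values(['Hour', 'Count'], ascending=[True, False])
# First row of each hour is its best seller (ties keep only one product)
top = ranked.drop_duplicates('Hour')[['Hour', 'Product', 'Count']].reset_index(drop=True)
print(top)
```

On these rows the result has the same Hour, Product and Count values as the hourly table in the question.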

(b) Secondly, is there a way to visualize how these highest-selling products sell at the other hours? For example, if PRODUCT_123 is the highest-selling product at hour 5, I need to visualize its distribution (how much it sold) across the other hours.
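For part (b), one option is a product-by-hour table of sale counts that covers all 24 hours, plotted as grouped bars for the best sellers. This is a minimal sketch assuming pandas (and matplotlib for the plot); the helper names hourly_distribution, top_products and plot_hourly_distribution are hypothetical, and the data are the same thirteen sample rows as the intermediate table.

```python
import pandas as pd

df = pd.DataFrame(
    [['PRODUCT_75', 12], ['PRODUCT_75', 11], ['PRODUCT_75', 12], ['PRODUCT_75', 12],
     ['PRODUCT_63', 10], ['PRODUCT_63', 10], ['PRODUCT_63', 5], ['PRODUCT_63', 5],
     ['PRODUCT_12', 1], ['PRODUCT_120', 7], ['PRODUCT_120', 5], ['PRODUCT_120', 5],
     ['PRODUCT_120', 5]],
    columns=['Product', 'Hour'])

def hourly_distribution(df):
    """Products x hours 0..23, each cell holding the number of sales."""
    table = pd.crosstab(df['Product'], df['Hour'])
    # Hours with no sales at all are missing from crosstab; add them as zeros
    return table.reindex(columns=range(24), fill_value=0)

def top_products(df):
    """Products that are the best seller in at least one hour (one per hour)."""
    counts = df.groupby(['Hour', 'Product']).size().reset_index(name='Count')
    best_rows = counts.groupby('Hour')['Count'].idxmax()
    return counts.loc[best_rows, 'Product'].unique().tolist()

def plot_hourly_distribution(dist, products):
    # Imported here so the table-building functions also run without matplotlib
    import matplotlib.pyplot as plt
    # Hours on the x axis, one bar per product in each hour
    ax = dist.loc[products].T.plot(kind='bar', figsize=(12, 4))
    ax.set_xlabel('Hour of day')
    ax.set_ylabel('Units sold')
    plt.show()
    return ax

dist = hourly_distribution(df)
products = top_products(df)
```

Calling plot_hourly_distribution(dist, products) then shows, for every hour from 0 to 23, how much each best seller sold; passing a single product such as ['PRODUCT_120'] shows just that product's distribution.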

For the above dataset, I need output like this:

Max. Sold Products on an Hourly Basis:

---------------------------
Hour     Product      Count
1        PRODUCT_12   1
5        PRODUCT_120  3
7        PRODUCT_120  1
10       PRODUCT_63   2
11       PRODUCT_75   1
12       PRODUCT_75   3
---------------------------

Now, for part (a), I have already used the following code:

res = reshaped.groupby(['Product', 'Hour']).size()

where reshaped is the DataFrame with these columns.

It does return the count of each unique pair, but I don't know how to proceed from here. I'd be grateful for any guidance.
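One way to continue from that res Series is to group it again by its Hour level and call idxmax, which returns the (Product, Hour) label of the largest count in each hour. This is a minimal sketch assuming pandas; reshaped is rebuilt here from the sample rows as a stand-in for the real frame, and idxmax keeps only the first product when several tie.

```python
import pandas as pd

# Stand-in for the question's `reshaped` frame (same sample rows)
reshaped = pd.DataFrame(
    [['PRODUCT_75', 12], ['PRODUCT_75', 11], ['PRODUCT_75', 12], ['PRODUCT_75', 12],
     ['PRODUCT_63', 10], ['PRODUCT_63', 10], ['PRODUCT_63', 5], ['PRODUCT_63', 5],
     ['PRODUCT_12', 1], ['PRODUCT_120', 7], ['PRODUCT_120', 5], ['PRODUCT_120', 5],
     ['PRODUCT_120', 5]],
    columns=['Product', 'Hour'])

# The question's step: a Series of counts indexed by (Product, Hour)
res = reshaped.groupby(['Product', 'Hour']).size()

# For each hour, idxmax gives the (Product, Hour) label with the largest count
best_labels = res.groupby(level='Hour').idxmax()

# Look those labels up to get their counts, then turn the index into columns
best = res.loc[best_labels.tolist()].reset_index(name='Count')
best = best[['Hour', 'Product', 'Count']]
print(best)
```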

The following code draws histograms of the highest-selling products for a given hour (there may be more than one):

import pandas as pd
import matplotlib.pyplot as plt

# Sample sales data: one row per sale (product, hour of the day)
df = pd.DataFrame(
    [['PRODUCT_75', 12], ['PRODUCT_75', 11], ['PRODUCT_75', 12], ['PRODUCT_75', 12],
     ['PRODUCT_63', 10], ['PRODUCT_63', 10], ['PRODUCT_63', 5], ['PRODUCT_63', 5],
     ['PRODUCT_12', 1], ['PRODUCT_120', 7], ['PRODUCT_120', 5], ['PRODUCT_120', 5],
     ['PRODUCT_120', 5]],
    columns=['Product', 'Hour'])

# Count of each unique (Product, Hour) pair
res = df.groupby(['Product', 'Hour']).size().reset_index(name='count')

def histogram(df, product):
    # Distribution of this product's sales over the 24 hours of the day
    df.loc[df['Product'] == product, 'Hour'].hist(bins=range(0, 25))
    plt.suptitle(str(product))
    plt.show()

def highest_selling(res, hour):
    # Compare with the maximum count within this hour, not across all hours
    at_hour = res[res['Hour'] == hour]
    return at_hour.loc[at_hour['count'] == at_hour['count'].max(), 'Product'].to_list()

highest_selling_product = highest_selling(res, 5)

# There may be several top products (ties), so draw one histogram each
for product in highest_selling_product:
    histogram(df, product)
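The highest_selling function handles one hour per call. A sketch that finds the best sellers for every hour at once, assuming pandas and the same sample rows, compares each pair's count with its own hour's maximum via groupby(...).transform('max'); unlike idxmax, this keeps every product that ties for the top spot.

```python
import pandas as pd

df = pd.DataFrame(
    [['PRODUCT_75', 12], ['PRODUCT_75', 11], ['PRODUCT_75', 12], ['PRODUCT_75', 12],
     ['PRODUCT_63', 10], ['PRODUCT_63', 10], ['PRODUCT_63', 5], ['PRODUCT_63', 5],
     ['PRODUCT_12', 1], ['PRODUCT_120', 7], ['PRODUCT_120', 5], ['PRODUCT_120', 5],
     ['PRODUCT_120', 5]],
    columns=['Product', 'Hour'])

# Count per (Product, Hour) pair, with a 'count' column
res = df.groupby(['Product', 'Hour']).size().reset_index(name='count')

# The maximum count of each row's hour, broadcast back onto the row
hour_max = res.groupby('Hour')['count'].transform('max')

# Keep every product that reaches its hour's maximum (ties give several rows)
winners = res[res['count'] == hour_max].sort_values(['Hour', 'Product'])
print(winners[['Hour', 'Product', 'count']].to_string(index=False))
```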

which results in the following plot: [figure: histogram of PRODUCT_120 sales by hour]
