
How to generate multiple density plots or one overlay plot with multiple densities with seaborn in a long dataframe?

I've been trying to make an overlay graph or a multi-plot grid from filtered data in a pandas dataframe, but I've only been able to generate the plots separately. The code that generates the separate plots is the following:

# Get the class counts for all objects
class_counts = get_class_counts(clean_df, 0.4)

# Select the top 5 most common objects
top_5_class_counts = class_counts.head(5)

# Create a new dataframe
df_filtered = df[['image', 'class_name']]

# Merge the class counts dataframe with the dataframe containing the image_file and class_name columns
merged_df = df_filtered.merge(top_5_class_counts, on='class_name')

# Group the data by the class_name column
grouped_df = merged_df.groupby('class_name')

# Iterate over the groups and draw one density plot per class
for name, group in grouped_df:
    # Count the number of times each image appears in the group and store the results
    image_counts = group.groupby(['image', 'class_name']).size().reset_index(name='count')

    # Plot a kernel density estimate of the count column using seaborn's displot function
    sns.displot(image_counts, x='count', kind='kde', multiple='stack')
    plt.show()

Any help would be appreciated.

A snippet of the merged dataframe:

image            class_name  class_id  count
berl_000000.png  person      0         1462
berl_000002.png  person      0         1462
berl_000002.png  person      0         1462
berl_000003.png  person      0         1462
berl_000003.png  person      0         1462
zur_000119.png   truck       7         189
zur_000116.png   truck       7         189

Edit: Thanks for editing your question. You can do the work from your for loop without looping at all, building a single new table that you can then plot from:

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

##################
#Create fake data#
##################
num_classes = 8
counts_per_class = 1000
num_images = 100

class_names = np.repeat([f'class_{i+1}' for i in range(num_classes)],counts_per_class)
images = [f'image_{c+1}.png' for c in np.random.randint(0,num_images,num_classes*counts_per_class)]

#this table has a row for each class/image and can have duplicate rows like:
#this is just like your merged_df I think
#   class_name         image
#     class_1       image_10.png
#     class_1       image_11.png
#     class_1       image_11.png
#     class_1       image_12.png
#        ...            ...
df = pd.DataFrame({
    'class_name':class_names,
    'image':images,
})

#Do the work you're doing in your loop all at once w/out a loop
#this table has a unique row per class/image and a new count column
#   class_name         image  count
#      class_1  image_10.png      1
#      class_1  image_11.png      2
#      class_1  image_12.png      1
#        ...            ...
count_df = df.groupby(['class_name','image']).size().reset_index(name='count')

###################################
#Make the overlay displot with hue#
###################################
sns.displot(
    x = 'count',
    hue = 'class_name',
    kind = 'kde',
    data = count_df,
)

plt.show()
plt.close()


You can achieve the same plot with the for loop if you want, but the workflow would be: (1) create a new empty table, (2) loop through the classes, appending each per-class table to the new table, and (3) plot once after the loop.
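The loop-based workflow could look roughly like the sketch below. It assumes a dataframe with `class_name` and `image` columns like the one in the question. The function name `counts_via_loop` is hypothetical, and the per-class tables are collected in a list and combined with `pd.concat` rather than appended to a dataframe one at a time.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt


def counts_via_loop(df):
    """Build the per-class/image count table with an explicit loop."""
    per_class_tables = []  # (1) start with an empty collection
    for name, group in df.groupby("class_name"):
        # (2) count how often each image appears within this class
        image_counts = (
            group.groupby(["image", "class_name"]).size().reset_index(name="count")
        )
        per_class_tables.append(image_counts)
    # Combine the per-class tables into one long table
    return pd.concat(per_class_tables, ignore_index=True)


if __name__ == "__main__":
    # Fake data standing in for merged_df (an assumption, not the real dataset)
    rng = np.random.default_rng(0)
    fake_df = pd.DataFrame({
        "class_name": np.repeat(["class_1", "class_2", "class_3"], 500),
        "image": [f"image_{c + 1}.png" for c in rng.integers(0, 50, 1500)],
    })
    count_df = counts_via_loop(fake_df)

    # (3) plot once, after the loop, with one density curve per class
    sns.displot(data=count_df, x="count", hue="class_name", kind="kde")
    plt.show()
    plt.close("all")
```

The resulting table holds the same rows as the single `groupby(['class_name', 'image'])` call, so the loop is mainly useful if extra per-class processing is needed along the way.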


 