Pandas pivot heatmap filter most frequent values
Basically, my final result should be a heatmap of the x most preferred destinations by the x most common origin countries (like the R question "How to create heatmap only for 50 highest value"). Let's say x=2 to align with the small toy dataframe below:
import pandas as pd
df = pd.DataFrame({'destination_1': ['Germany', 'France', 'UK', 'India', 'China'],
'destination_2': ['China', 'Vietnam', 'Namibia', 'India', 'UK'],
'destination_3' : ['France', 'Italy', 'Namibia', 'China', 'UK'],
'origin' : ['Germany', 'US', 'UK', 'China', 'UK']})
The destination count should be based on mentions across all three destination variables. To account for this, I melt and pivot the data.
df1 = df.melt(id_vars= ['origin'],
value_vars= ['destination_1', 'destination_2', 'destination_3'], var_name='columns')
df_heatmap = df1.pivot_table(index='origin',columns='value',aggfunc='count')
df_heatmap is basically already a heatmap, so visualizing it is no problem. The only problem for me is that I don't see where or how to put a filter that keeps only the x most common origins and destinations.
It would surely be better to filter the pivot table itself to get the true "totals", but here's a way that at least gets the x:x pivot table dimension. Basically, I use lists of the top value counts in both dimensions to filter the dataframe before pivoting it.
df1 = df.melt(id_vars= ['origin'],
value_vars= ['destination_1', 'destination_2', 'destination_3'],
var_name='columns')
most = df1['origin'].value_counts()[:2].index.tolist()
most2 = df1['value'].value_counts()[:2].index.tolist()
filt = (df1['origin'].isin(most) & df1['value'].isin(most2))
df2 = df1[filt]
df_heatmap = df2.pivot_table(index='origin',columns='value',aggfunc='count', margins = True, margins_name='Total')
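For the "true totals" variant mentioned above, one possible sketch is to count everything first and only then select the top rows and columns. This is not part of the code above; it swaps pivot_table for pd.crosstab to get flat column labels, and the names x, long_df, full, top_origins and top_dests are made up for illustration. Ties (several origins with the same count) are broken by whatever order nlargest returns, which may differ from value_counts.

```python
import pandas as pd

df = pd.DataFrame({
    'destination_1': ['Germany', 'France', 'UK', 'India', 'China'],
    'destination_2': ['China', 'Vietnam', 'Namibia', 'India', 'UK'],
    'destination_3': ['France', 'Italy', 'Namibia', 'China', 'UK'],
    'origin': ['Germany', 'US', 'UK', 'China', 'UK'],
})

x = 2  # assumed: how many origins and destinations to keep

# Long format: one row per (origin, mentioned destination) pair.
long_df = df.melt(id_vars=['origin'],
                  value_vars=['destination_1', 'destination_2', 'destination_3'],
                  var_name='columns')

# Count every origin/destination pair, with totals over the unfiltered data.
full = pd.crosstab(long_df['origin'], long_df['value'],
                   margins=True, margins_name='Total')

# Rank by the totals, leaving the 'Total' entry itself out of the ranking.
top_origins = full['Total'].drop('Total').nlargest(x).index.tolist()
top_dests = full.loc['Total'].drop('Total').nlargest(x).index.tolist()

# Keep only the top rows/columns; the totals still refer to all the data.
df_heatmap = full.loc[top_origins + ['Total'], top_dests + ['Total']]
print(df_heatmap)
```

Here the 'Total' row and column describe the whole dataset, so they will generally not equal the sums of the cells that remain visible. Drop 'Total' from both selections before plotting if only the x:x block should appear in the heatmap.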