
How to reproduce the Unsub histogram in Altair?

Unsub's histogram draws each journal title as an individual box, stacks the boxes on top of each other, and orders them by cost per use along the x-axis.

https://i.ibb.co/2FvXhFp/unsub.jpg (can't post images due to my new account)

I want to reproduce this in Altair, but I can't figure out how to "break up" the histogram's bars.

https://i.ibb.co/jzX4N2r/altair.jpg

    hist = alt.Chart(df[filt_to_100]).mark_bar().encode(
        alt.X('cpu:Q', bin=alt.Bin(maxbins=100)),
        y='count()'
    ).interactive()
    hist

I'm sure it has something to do with y='count()', but I can't find a way to make it show individual points. I also tried switching to mark_circle(), but that doesn't look right either.

You can replicate it to an extent via the detail encoding. However, a white line is not drawn for every observation. I'm not sure whether that is because the boxes are being aligned to fit nicely with the axis (I tried setting nice=False, to no avail); maybe someone more knowledgeable can fill in.

import altair as alt
from vega_datasets import data

source = data.cars()

alt.Chart(source.reset_index(), height=200).mark_bar().encode(
    alt.X("Horsepower", bin=alt.Bin(maxbins=50)),
    alt.Y('count()', axis=alt.Axis(grid=False)),
    alt.Detail('index')
).configure_view(strokeWidth=0)


Another approach with a similar result is to reshape your dataframe so that it has a column counting up from 0 within each bin, and then plot it using mark_rect:

import altair as alt
from vega_datasets import data

source = data.cars()

# Add index from zero to max count (bin height) for each group
source2 = source.groupby('Horsepower', as_index=False).apply(lambda x: x.reset_index(drop = True)).reset_index()

alt.Chart(source2, height=200, width=800).mark_rect(size=8, strokeWidth=1, stroke='white').encode(
    alt.X('Horsepower:O'),
    alt.Y('level_1:O', title='Count', scale=alt.Scale(reverse=True)),
)
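The per-group counter can also be built without apply. This is only a sketch of the same idea, not a change to the chart: groupby(...).cumcount() numbers the rows inside each group starting at 0, which is the role level_1 plays above. The toy Horsepower values and the column name level are made up for illustration.

```python
import pandas as pd

# Toy stand-in for the cars data (values are made up for illustration)
source = pd.DataFrame({"Horsepower": [90, 90, 90, 110, 110, 150]})

# Position of each row within its Horsepower group: 0, 1, 2, ...
# "level" is a hypothetical column name playing the role of level_1.
source2 = source.assign(level=source.groupby("Horsepower").cumcount())

print(source2)
```

With a frame built this way, the y encoding would refer to level:O instead of level_1:O.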

If you want the x-axis to look more like the histogram's, you could use pd.cut to split horsepower into intervals. Altair can't use the interval datatype directly, though, so you would need to assign a number to each bin.
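A minimal sketch of that binning step, under some assumptions: the bin edges (width 25, starting at 50), the toy Horsepower values, and the column names bin_left and level are all made up for illustration. Passing numeric labels to pd.cut avoids Interval objects altogether, so each row ends up with the left edge of its bin as a plain integer.

```python
import pandas as pd

# Toy stand-in for the cars data (values are made up for illustration)
source = pd.DataFrame({"Horsepower": [52, 58, 61, 75, 79, 95, 98, 99, 140]})

# Assumed bin edges: 50, 75, 100, 125, 150
edges = list(range(50, 175, 25))

# right=False gives [left, right) intervals; labelling each bin with its
# left edge yields plain numbers instead of Interval objects.
source["bin_left"] = pd.cut(
    source["Horsepower"], bins=edges, right=False, labels=edges[:-1]
).astype(int)

# Position of each row within its bin, i.e. the height at which its box sits
source["level"] = source.groupby("bin_left").cumcount()

print(source)
```

The resulting frame can be plotted with the same mark_rect chart as above, using bin_left:O on the x-axis and level:O on the y-axis. Note that an ordinal axis only shows bins that contain data, so empty bins will not leave a gap.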

