Altair/Vega-Lite bar chart: filter top K bars from aggregated field
I'm visualizing a dataset that has, for instance, a categorical field. I want to create a bar chart that shows the different categories of that field with their cardinality, sorted in ascending or descending order. This can be achieved simply with Altair:
import pandas as pd
import altair as alt

data = {0: {'Name': 'Mary', 'Sport': 'Tennis'},
        1: {'Name': 'Cal', 'Sport': 'Tennis'},
        2: {'Name': 'John', 'Sport': 'Tennis'},
        3: {'Name': 'Jane', 'Sport': 'Tennis'},
        4: {'Name': 'Bob', 'Sport': 'Golf'},
        5: {'Name': 'Jerry', 'Sport': 'Golf'},
        6: {'Name': 'Gustavo', 'Sport': 'Golf'},
        7: {'Name': 'Walter', 'Sport': 'Swimming'},
        8: {'Name': 'Jessy', 'Sport': 'Swimming'},
        9: {'Name': 'Patric', 'Sport': 'Running'},
        10: {'Name': 'John', 'Sport': 'Shooting'}}
df = pd.DataFrame(data).T

bars = alt.Chart(df).mark_bar().encode(
    x=alt.X('count():Q', axis=alt.Axis(format='.0d', tickCount=4)),
    y=alt.Y('Sport:N',
            sort=alt.SortField(op='count', field='Sport:N', order='descending'))
)
bars
Now suppose I'm interested only in the three most numerous categories. It seemed reasonable to use "transform_window" and "transform_filter" to filter the data, but I was unable to find a way to do so. I also tried to adapt the Vega-Lite Top K example, without success (my "best" attempt is shown below).
bars.transform_window(window=[alt.WindowFieldDef(op='count',
                                                 field='Sport:N',
                                                 **{'as': 'cardinality'})],
                      frame=[None, None])
bars.transform_window(window=[alt.WindowFieldDef(op='rank',
                                                 field='cardinality',
                                                 **{'as': 'rank'})],
                      frame=[None, None],
                      sort=[alt.WindowSortField(field='rank',
                                                order='descending')])
bars.transform_filter( ..... what??? .....)
I would probably do this by first using an aggregate transform to compute the number of people in each group, and then proceeding along the lines of the top-K example you linked to:
alt.Chart(df).mark_bar().encode(
    x='count:Q',
    y=alt.Y('Sport:N',
            sort=alt.SortField(field='count', order='descending', op='sum')
    ),
).transform_aggregate(
    count='count()',
    groupby=['Sport']
).transform_window(
    window=[{'op': 'rank', 'as': 'rank'}],
    sort=[{'field': 'count', 'order': 'descending'}]
).transform_filter('datum.rank <= 3')
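To make the three steps above easier to follow, here is a plain-Python sketch of what the aggregate, window, and filter transforms compute on the question's data. It is not Altair code: `top_k_by_rank` is a hypothetical helper written only for illustration, and the `sports` list simply reproduces the Sport column from the question.

```python
from collections import Counter

# Same Sport values as the rows of the DataFrame in the question.
sports = ['Tennis'] * 4 + ['Golf'] * 3 + ['Swimming'] * 2 + ['Running', 'Shooting']


def top_k_by_rank(values, k):
    """Mimic aggregate -> rank window -> filter for a list of category labels."""
    # Step 1 (transform_aggregate): one row per category with its count.
    counts = Counter(values)
    # Step 2 (transform_window with op='rank', sorted by count descending):
    # rank 1 goes to the largest count; tied counts share a rank and the
    # next rank skips ahead.
    ranks = {
        cat: 1 + sum(other > n for other in counts.values())
        for cat, n in counts.items()
    }
    # Step 3 (transform_filter): keep only rows whose rank is at most k.
    kept = [(cat, n) for cat, n in counts.items() if ranks[cat] <= k]
    return sorted(kept, key=lambda row: -row[1])


top3 = top_k_by_rank(sports, 3)
print(top3)  # [('Tennis', 4), ('Golf', 3), ('Swimming', 2)]
```

Because `rank` gives tied counts the same rank, a threshold that lands on a tie can keep more than K bars: here `top_k_by_rank(sports, 4)` returns five rows, since Running and Shooting both have one entry. If exactly K rows are needed, Vega-Lite's `row_number` window operation is an alternative to `rank`.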
Note that in Altair version 2.2 (which has not yet been released as I write this) alt.SortField will be renamed to alt.EncodingSortField, because of a change in the underlying Vega-Lite schema.
(Side note: the Altair API for sorting and window transforms is pretty clunky at the moment, but we are thinking hard about how to improve that.)