Altair/Vega-Lite tick chart: filter top K strips from aggregated field
I'm visualizing a dataset that has, for instance, a categorical field and a temporal field. I want to create a strip chart that shows the temporal distribution of the different categories, sorted in ascending or descending order by their cardinality. This can be achieved simply with altair:
import pandas as pd
import altair as alt

data = {0: {'Name': 'Mary', 'Sport': 'Tennis', 'competition': '2018/06/01'},
        1: {'Name': 'Cal', 'Sport': 'Tennis', 'competition': '2018/06/05'},
        2: {'Name': 'John', 'Sport': 'Tennis', 'competition': '2018/05/28'},
        3: {'Name': 'Jane', 'Sport': 'Tennis', 'competition': '2018/05/20'},
        4: {'Name': 'Bob', 'Sport': 'Golf', 'competition': '2018/03/01'},
        5: {'Name': 'Jerry', 'Sport': 'Golf', 'competition': '2018/03/03'},
        6: {'Name': 'Gustavo', 'Sport': 'Golf', 'competition': '2018/02/28'},
        7: {'Name': 'Walter', 'Sport': 'Swimming', 'competition': '2018/01/01'},
        8: {'Name': 'Jessy', 'Sport': 'Swimming', 'competition': '2018/01/03'},
        9: {'Name': 'Patric', 'Sport': 'Running', 'competition': '2018/02/01'},
        10: {'Name': 'John', 'Sport': 'Shooting', 'competition': '2018/04/01'}}
df = pd.DataFrame(data).T

alt.Chart(df).mark_tick().encode(
    x='yearmonthdate(competition):T',
    y=alt.Y('Sport:N',
            sort=alt.SortField(field='count(Sport:N)', order='ascending', op='sum')
            ),
)
Now suppose I'm interested only in the three most numerous categories. Following the accepted solution for "Altair/Vega-Lite bar chart: filter top K bars from aggregated field", this time the plot doesn't show up:
alt.Chart(df).mark_tick().encode(
    x='yearmonthdate(competition):T',
    y=alt.Y('Sport:N',
            sort=alt.SortField(field='count', order='ascending', op='sum')
            ),
).transform_aggregate(
    count='count()',
    groupby=['Sport']
).transform_window(
    window=[{'op': 'rank', 'as': 'rank'}],
    sort=[{'field': 'count', 'order': 'descending'}]
).transform_filter('datum.rank <= 3')
Notice that even the order of the y-axis labels isn't as expected.
After reading (and understanding) the documentation in more depth, I think I can state that what I asked for is currently (June 2018) not feasible with altair / Vega-Lite. Here is my explanation.
Performing an aggregate transform on the data is equivalent to adding a GROUP BY clause to a SQL query, so we can no longer associate an encoding channel with any "original" data field in its "unaggregated" form: when I try to refer to competition in the x channel, it is therefore undefined.
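To make the GROUP BY analogy concrete, here is a minimal pandas sketch. pandas only stands in for the Vega-Lite transform here; it is an analogy, not what Altair runs internally, and the three-row frame is a reduced copy of the data above. After aggregating by Sport, only the grouping key and the aggregate remain:

```python
import pandas as pd

# A reduced version of the data from the question.
df = pd.DataFrame({
    'Sport': ['Tennis', 'Tennis', 'Golf'],
    'competition': ['2018/06/01', '2018/06/05', '2018/03/01'],
})

# Rough analogue of transform_aggregate(count='count()', groupby=['Sport']).
agg = df.groupby('Sport').size().reset_index(name='count')

# Only the key and the aggregate survive; 'competition' is gone.
print(list(agg.columns))  # ['Sport', 'count']
```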
I could try a "self-join" using the lookup transform but, even in this case, the final result isn't what I was looking for: this is equivalent to a left join, so I get just one value for each aggregated class.
alt.Chart(df).mark_tick().encode(
    x=alt.X(field='competition', type='temporal', timeUnit='yearmonthdate'),
    y=alt.Y('Sport:N',
            sort=alt.SortField(field='count', order='ascending', op='sum')
            ),
).transform_aggregate(
    countX='count()',
    groupby=['Sport']
).transform_window(
    window=[{'op': 'rank', 'as': 'rank'}],
    sort=[{'field': 'countX', 'order': 'descending'}]
).transform_filter('datum.rank <= 3').transform_lookup(
    lookup='Sport',
    from_=alt.LookupData(data=df, key='Sport',
                         fields=['competition'])
)
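The "one value per class" outcome can be illustrated with a small pandas analogy. This is a sketch under an assumption: the lookup is modelled as a left merge against a table that holds a single row per key, and keeping the first row per key is an arbitrary choice made for the illustration, not a statement about which row Vega-Lite actually picks.

```python
import pandas as pd

# A reduced version of the data from the question.
df = pd.DataFrame({
    'Sport': ['Tennis', 'Tennis', 'Tennis', 'Golf', 'Golf'],
    'competition': ['2018/06/01', '2018/06/05', '2018/05/28',
                    '2018/03/01', '2018/03/03'],
})

# Aggregated table: one row per Sport.
agg = df.groupby('Sport').size().reset_index(name='countX')

# Assumed model of the lookup: the secondary table is indexed by key,
# so each key resolves to exactly one row (here, arbitrarily the first).
lookup_table = df.drop_duplicates('Sport')
joined = agg.merge(lookup_table, on='Sport', how='left')

# Five original rows, but only one 'competition' value per Sport after the join.
print(len(df), len(joined))  # 5 2
```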
I discovered that what is needed to achieve the result I want is currently supported in Vega but not in Vega-Lite or Altair: it is the JoinAggregate transform, which "extends" the original data with the result of one or more aggregations.
For the following input data:
[
  {"foo": 1, "bar": 1},
  {"foo": 1, "bar": 2},
  {"foo": null, "bar": 3}
]
The join aggregate transform:
{
  "type": "joinaggregate",
  "fields": ["foo", "bar", "bar"],
  "ops": ["valid", "sum", "median"],
  "as": ["v", "s", "m"]
}
produces the output:
[
  {"foo": 1, "bar": 1, "v": 2, "s": 6, "m": 2},
  {"foo": 1, "bar": 2, "v": 2, "s": 6, "m": 2},
  {"foo": null, "bar": 3, "v": 2, "s": 6, "m": 2}
]
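Where the transform is not available from Altair, a similar effect can be approximated by preprocessing in pandas before charting. The sketch below is one possible workaround, not a Vega-Lite feature: groupby(...).transform('count') plays the role of the join aggregate, and the helper top_k_categories and the column names n and rank are hypothetical names introduced for illustration.

```python
import pandas as pd

def top_k_categories(df, field, k):
    """Keep only rows whose `field` value is among the k most frequent."""
    out = df.copy()
    # Join-aggregate analogue: attach the group size to every original row.
    out['n'] = out.groupby(field)[field].transform('count')
    # Dense rank of group sizes, largest first (tied groups share a rank,
    # so more than k categories can survive when sizes are tied).
    out['rank'] = out['n'].rank(method='dense', ascending=False)
    return out[out['rank'] <= k]

# Same category sizes and dates as the data in the question.
df = pd.DataFrame({
    'Sport': ['Tennis'] * 4 + ['Golf'] * 3 + ['Swimming'] * 2
             + ['Running', 'Shooting'],
    'competition': pd.to_datetime([
        '2018-06-01', '2018-06-05', '2018-05-28', '2018-05-20',
        '2018-03-01', '2018-03-03', '2018-02-28',
        '2018-01-01', '2018-01-03',
        '2018-02-01', '2018-04-01',
    ]),
})

top3 = top_k_categories(df, 'Sport', 3)
print(sorted(top3['Sport'].unique()))  # ['Golf', 'Swimming', 'Tennis']
```

Because the filtered frame keeps every original row, competition is still available for the x channel, and the n column can serve as the sort field for the y channel.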