
In Altair/Vega-lite how to show percentage of grouped category instead of total?

I'm new to Altair/Vega-Lite and struggling a bit to "get" the transformation, calculation, and encoding way of thinking, especially for more complex/nested data.

Specifically, I am trying to create a very simple layered histogram that shows the salary distribution of different countries.

So far I have been able to show on the y-axis the percentage of occurrences relative to the overall total:

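import altair as alt
import numpy as np
import pandas as pd

salaries = {
    'NL': np.random.normal(loc=80000, scale=30000, size=(500,)),
    'ES': np.random.normal(loc=80000, scale=30000, size=(50,)),
}
# Columns of unequal length: 'ES' is padded with NaN up to 500 rows
source = pd.DataFrame({k: pd.Series(v) for k, v in salaries.items()})

c = alt.Chart(source).transform_fold(
    ['NL', 'ES'],
    as_=['Benchmark', 'Salaries']
).transform_joinaggregate(
    total='count()'  # row count over the whole folded table
).transform_calculate(
    pct='1 / datum.total'
).mark_bar(opacity=0.3, binSpacing=0).encode(
    x=alt.X('Salaries:Q', bin=alt.Bin(maxbins=20)),
    y=alt.Y('sum(pct):Q', axis=alt.Axis(format='%'), stack=None),
    color='Benchmark:N'
)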

which results in:

[Image: total percentage]

However, I'd like the percentage to be computed within each category instead of over the total. So, in this example, the second distribution should show percentages on the y-axis at the same level as the first one, since both are drawn from identical normal distributions.

I hope it's clear enough, apologies for probably lacking the statistical theory and glossary to explain things better.

It is grouped per category, but the problem here is that your 'ES' column has 450 NaN values, which are still counted by count(), I guess, so the percentage for the actual values is very low. One way to solve this is to use alt.Chart(source.dropna()), which would yield the plot below.

[Image: plot]

