
Drawing a line plot for a histogram

I'm trying to reproduce this chart as closely as I can using Altair: https://fivethirtyeight.com/wp-content/uploads/2014/04/hickey-bechdel-11.png?w=575

I'm stuck on the black line dividing pass from fail. It is similar to this Altair example: https://altair-viz.github.io/gallery/step_chart.html . However, in the 538 viz the value for the final date is extended across the full width of that last element. In the step chart example and in my solution, the line stops as soon as it reaches the last date element.

I have looked at Altair's GitHub and Google Groups and found nothing similar to this problem.

import altair as alt
import pandas as pd

movies=pd.read_csv('https://raw.githubusercontent.com/fivethirtyeight/data/master/bechdel/movies.csv')
domain = ['ok', 'dubious','men', 'notalk', 'nowomen']

base = alt.Chart(movies).encode(
    alt.X(
        "year:N",
        bin=alt.BinParams(step=5, extent=[1970, 2015]),
        axis=alt.Axis(labelAngle=0, labelLimit=50, labelFontSize=8),
        title=None,
    ),
    alt.Y(
        "count()",
        stack='normalize',
        title=None,
        axis=alt.Axis(format='%', values=[0, 0.25, 0.50, 0.75, 1]),
    ),
).properties(width=400)

main = base.transform_calculate(
    cleanrank='datum.clean_test == "ok" ? 1 : datum.clean_test == "dubious" ? 2 : datum.clean_test == "men" ? 3 : datum.clean_test == "notalk" ? 4 : 5'
).mark_bar(
    stroke='white'  # add horizontal lines
).encode(
    alt.Color(
        "clean_test:N",
        scale=alt.Scale(
            domain=domain,
            range=['dodgerblue', 'skyblue', 'pink', 'coral', 'red'],
        ),
    ),
    order=alt.Order('cleanrank:O', sort='ascending'),
)

extra = base.transform_calculate(
    cleanpass='datum.clean_test == "ok" ? "PASS" : datum.clean_test == "dubious" ? "PASS" : "FAIL"'
).mark_line(
    interpolate='step-after'
).encode(
    alt.Color("cleanpass:N", scale=alt.Scale(domain=['PASS', 'FAIL'], range=['black', 'white']))
)



alt.layer(main,extra).configure_scale(
    bandPaddingInner=0.01 #smaller vertical lines
).resolve_scale(color='independent')

One (rather hacky) way to make the step chart run from the beginning of the first bin to the end of the last bin is to control the bin positions manually, using the rank of the ordered bins.

This way we can add two lines: one with 'step-after', and another with 'step-before' whose data is shifted by one bin. After that, the tick labels would still need to be replaced with the appropriate bin labels and centered, e.g. using the levels from pd.cut.
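To see why the two lines together cover every bin, here is a minimal pure-Python sketch of the horizontal segments each interpolation mode draws. The helper names and the pass shares are made up for illustration; this is not Altair code, just the geometry of 'step-after' and 'step-before'.

```python
def step_after_segments(points):
    """Horizontal segments of 'step-after': each value is held until the next x."""
    return [(x0, x1, y0) for (x0, y0), (x1, _) in zip(points, points[1:])]


def step_before_segments(points):
    """Horizontal segments of 'step-before': each value is held back to the previous x."""
    return [(x0, x1, y1) for (x0, _), (x1, y1) in zip(points, points[1:])]


# Hypothetical pass shares for three bins with ranks 0, 1, 2 (made-up values).
points = [(0, 0.3), (1, 0.5), (2, 0.6)]

# Shift the data by one bin, as the second line does.
shifted = [(x + 1, y) for x, y in points]

after = step_after_segments(points)      # misses the last bin (2 to 3)
before = step_before_segments(shifted)   # misses the first bin (0 to 1)
covered = sorted(set(after) | set(before))

print(after)    # [(0, 1, 0.3), (1, 2, 0.5)]
print(before)   # [(1, 2, 0.5), (2, 3, 0.6)]
print(covered)  # [(0, 1, 0.3), (1, 2, 0.5), (2, 3, 0.6)]
```

Each line alone leaves one end bin uncovered, and the two overlap on the bins in between, so layering them gives a line spanning the full width of every bar.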


Dataframe preparation

import altair as alt
import pandas as pd

movies=pd.read_csv('https://raw.githubusercontent.com/fivethirtyeight/data/master/bechdel/movies.csv')
domain = ['ok', 'dubious','men', 'notalk', 'nowomen']

movies['year_bin'] = pd.cut(movies['year'], range(1970, 2016, 5))
movies['year_rank'] = movies['year_bin'].cat.codes
movies = movies[movies['year_rank']>=0]
df_plot = movies[['year_rank', 'clean_test']].copy()
df_plot['year_rank_end'] = df_plot['year_rank'] + 1
df_plot['clean_pass'] = df_plot['clean_test'].apply(lambda x: 'PASS' if x in ['ok', 'dubious'] else 'FAIL')
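
As a small check of what this preparation does, the sketch below applies the same pd.cut call to a handful of made-up years (not taken from the movies dataset) and prints the resulting ranks. It assumes pd.cut's default right-closed intervals, which the code above also relies on.

```python
import pandas as pd

# Made-up years for illustration only.
years = pd.Series([1970, 1971, 1975, 1976, 2013, 2016])

# Same bin edges as above: 1970, 1975, ..., 2015.
# By default the intervals are right-closed: (1970, 1975], (1975, 1980], ...
year_bin = pd.cut(years, list(range(1970, 2016, 5)))

# Integer code of each bin; values outside every bin get -1.
year_rank = year_bin.cat.codes

print(year_rank.tolist())  # [-1, 0, 0, 1, 8, -1]
```

Note that 1970 itself falls outside the first interval and gets rank -1, so the `year_rank>=0` filter would drop it. If that is not intended, passing `include_lowest=True` or `right=False` to pd.cut may be one way to adjust the bin edges.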

Chart declaration

base=alt.Chart(df_plot).encode(
    x=alt.X('year_rank',
        axis=alt.Axis(labelAngle=0, labelLimit=50, labelFontSize=8),
        title=None
        ),
    x2='year_rank_end',
    y=alt.Y('count()', title=None, stack='normalize',
        axis=alt.Axis(format='%', values=[0, 0.25, 0.50, 0.75, 1])
        )
).properties(width=400)

main=base.transform_calculate(
    cleanrank='datum.clean_test == "ok" ? 1 : datum.clean_test == "dubious" ? 2 : datum.clean_test == "men" ? 3 : datum.clean_test == "notalk" ? 4 : 5'
    ).mark_bar(
        stroke='white' #add horizontal lines
    ).encode(
        alt.Color("clean_test:N", scale=alt.Scale(
            domain=domain,
            range=['dodgerblue', 'skyblue', 'pink', 'coral', 'red'])),
        order=alt.Order('cleanrank:O', sort='ascending')
    )

# unshifted step-after line: covers every bin except the last one
extra=base.mark_line(
        interpolate='step-after'
    ).encode(
        alt.Color("clean_pass:N",scale=alt.Scale(domain=['PASS','FAIL'],range=['black','white']))
    )

extra2=base.transform_calculate(
    # shift data by one bin, so that step-before matches the unshifted step-after
    year_rank='datum.year_rank +1' 
    ).mark_line(
        interpolate='step-before'
    ).encode(
        alt.Color("clean_pass:N",scale=alt.Scale(domain=['PASS','FAIL'],range=['black','white']), legend=None)
    )

alt.layer(main, extra, extra2).configure_scale(
    bandPaddingInner=0.01 #smaller vertical lines
).resolve_scale(color='independent')
