Drawing line plot for a histogram
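I am trying to reproduce this chart using Altair as much as possible: https://fivethirtyeight.com/wp-content/uploads/2014/04/hickey-bechdel-11.png?w=575
I am stuck on the black line that divides pass from fail. It is similar to this Altair example: https://altair-viz.github.io/gallery/step_chart.html . However, in the 538 visualization the value for the final date must extend across the full width of the last element. In the step chart example and in my solution, the line stops as soon as it reaches the last date element.
I have looked through Altair's GitHub and Google group and found nothing similar to this problem.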
import altair as alt
import pandas as pd

movies = pd.read_csv('https://raw.githubusercontent.com/fivethirtyeight/data/master/bechdel/movies.csv')
domain = ['ok', 'dubious', 'men', 'notalk', 'nowomen']

base = alt.Chart(movies).encode(
    alt.X("year:N", bin=alt.BinParams(step=5, extent=[1970, 2015]),
          axis=alt.Axis(labelAngle=0, labelLimit=50, labelFontSize=8),
          title=None),
    alt.Y("count()", stack='normalize', title=None,
          axis=alt.Axis(format='%', values=[0, 0.25, 0.50, 0.75, 1]))
).properties(width=400)

main = base.transform_calculate(
    cleanrank='datum.clean_test == "ok" ? 1 : datum.clean_test == "dubious" ? 2 : datum.clean_test == "men" ? 3 : datum.clean_test == "notalk" ? 4 : 5'
).mark_bar(
    stroke='white'  # add horizontal lines
).encode(
    alt.Color("clean_test:N", scale=alt.Scale(
        domain=domain,
        range=['dodgerblue', 'skyblue', 'pink', 'coral', 'red'])),
    order=alt.Order('cleanrank:O', sort='ascending')
)

extra = base.transform_calculate(
    cleanpass='datum.clean_test == "ok" ? "PASS" : datum.clean_test == "dubious" ? "PASS" : "FAIL"'
).mark_line(
    interpolate='step-after'
).encode(
    alt.Color("cleanpass:N", scale=alt.Scale(domain=['PASS', 'FAIL'], range=['black', 'white']))
)

alt.layer(main, extra).configure_scale(
    bandPaddingInner=0.01  # smaller vertical lines
).resolve_scale(color='independent')
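One, rather hacky, way to make the step chart cover everything from the start of the first bin to the end of the last bin is to control the bin positions manually (using the rank of the ordered bins).
That way we can add two lines: one with 'step-after' and another with 'step-before', shifted by one bin. From there, the tick labels still need to be replaced and centered with suitable bin labels, for example the levels from pd.cut ...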
import altair as alt
import pandas as pd

movies = pd.read_csv('https://raw.githubusercontent.com/fivethirtyeight/data/master/bechdel/movies.csv')
domain = ['ok', 'dubious', 'men', 'notalk', 'nowomen']

# Bin the years by hand and use the rank of each ordered bin as its x position.
movies['year_bin'] = pd.cut(movies['year'], range(1970, 2016, 5))
movies['year_rank'] = movies['year_bin'].cat.codes
# Rows that fall outside every bin get code -1; drop them.
movies = movies[movies['year_rank'] >= 0]

df_plot = movies[['year_rank', 'clean_test']].copy()
df_plot['year_rank_end'] = df_plot['year_rank'] + 1
df_plot['clean_pass'] = df_plot['clean_test'].apply(lambda x: 'PASS' if x in ['ok', 'dubious'] else 'FAIL')

base = alt.Chart(df_plot).encode(
    x=alt.X('year_rank',
            axis=alt.Axis(labelAngle=0, labelLimit=50, labelFontSize=8),
            title=None),
    x2='year_rank_end',
    y=alt.Y('count()', title=None, stack='normalize',
            axis=alt.Axis(format='%', values=[0, 0.25, 0.50, 0.75, 1]))
).properties(width=400)

main = base.transform_calculate(
    cleanrank='datum.clean_test == "ok" ? 1 : datum.clean_test == "dubious" ? 2 : datum.clean_test == "men" ? 3 : datum.clean_test == "notalk" ? 4 : 5'
).mark_bar(
    stroke='white'  # add horizontal lines
).encode(
    alt.Color("clean_test:N", scale=alt.Scale(
        domain=domain,
        range=['dodgerblue', 'skyblue', 'pink', 'coral', 'red'])),
    order=alt.Order('cleanrank:O', sort='ascending')
)

# First line: starts at the left edge of each bin and steps after it.
extra = base.mark_line(
    interpolate='step-after'
).encode(
    alt.Color("clean_pass:N", scale=alt.Scale(domain=['PASS', 'FAIL'], range=['black', 'white']))
)

# Second line: the same data moved one bin to the right.
extra2 = base.transform_calculate(
    # shift data by one bin, so that step-before matches the unshifted step-after
    year_rank='datum.year_rank + 1'
).mark_line(
    interpolate='step-before'
).encode(
    alt.Color("clean_pass:N", scale=alt.Scale(domain=['PASS', 'FAIL'], range=['black', 'white']), legend=None)
)

alt.layer(main, extra, extra2).configure_scale(
    bandPaddingInner=0.01  # smaller vertical lines
).resolve_scale(color='independent')
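For the step left open above (replacing the tick labels and centering them on the bins), something along these lines might work. It is only a sketch: the helper names bin_labels, tick_positions and label_expr are made up for illustration, the "1971-75" label format is just one choice, and it assumes an Altair version whose alt.Axis accepts values and labelExpr. The bins are taken to be right-closed, as pd.cut produces by default, so the bin between 1970 and 1975 covers the years 1971 to 1975.

```python
def bin_labels(start, stop, step):
    """One label per right-closed bin (lo, hi], e.g. (1970, 1975] -> '1971-75'."""
    edges = list(range(start, stop + 1, step))
    return [f"{lo + 1}-{str(hi)[-2:]}" for lo, hi in zip(edges, edges[1:])]


def tick_positions(n_bins):
    """Center of each bin when bin i spans [i, i + 1] on the x axis."""
    return [rank + 0.5 for rank in range(n_bins)]


def label_expr(positions, labels):
    """Build a chained ternary expression that maps a tick value to its label."""
    expr = "''"  # fallback for any tick that is not a bin center
    for pos, label in reversed(list(zip(positions, labels))):
        expr = f"datum.value == {pos} ? '{label}' : {expr}"
    return expr


labels = bin_labels(1970, 2015, 5)
ticks = tick_positions(len(labels))
expr = label_expr(ticks, labels)

# Assumed usage in the chart above (not run here):
# x=alt.X('year_rank',
#         axis=alt.Axis(values=ticks, labelExpr=expr, labelAngle=0),
#         title=None)
```

The ternary chain mirrors the style already used for cleanrank, so it stays within expression syntax that the charts above rely on.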