I'm building a multi-categorical boxplot and I'm having trouble getting the boxes to line up in the right positions.
My x axis is genes, subdivided by databases, but everything is plotted in the middle of each gene instead of over its database.
I build my axis like this:
x = sorted(list(set([(gene, database) for gene in totaldf['gene'].to_list() for database in totaldf['db'].to_list()])))
p = figure(background_fill_color="#efefef", x_range=FactorRange(*x), width=1600, height=700)
and I try to plot my boxes like this:
p.vbar(x=(gene, db), width=0.7, bottom=q2[gene, db], top=q3[gene, db], fill_color="#ffffff", line_color="black")
p.vbar(x=(gene, db), width=0.7, bottom=q1[gene, db], top=q2[gene, db], fill_color="#ffffff", line_color="black")
The resulting plot looks like this: https://imgur.com/a/8Q8YC1N
How do I get the plot to be in the right locations? The dataframe looks like this:
         gene       db  mutations
0     IGHV1-3  G1K_CL2          6
1    IGHV1-58  G1K_CL2          2
2    IGHV1-58  G1K_CL2          3
3     IGHV1-8  G1K_CL2          2
4    IGHV3-16  G1K_CL2          3
..        ...      ...        ...
141  IGHV4-61  G1K_CL3         11
142  IGHV4-61  G1K_CL3         12
143  IGHV4-61  G1K_CL3         10
144  IGHV4-61  G1K_CL3         13
145  IGHV7-81  G1K_CL3          4
Are gene and db the columns from the DataFrame? The coordinates are not being supplied in the correct format. You are supplying a 2-tuple of lists, but what is required is a list of 2-tuples. The coordinate list should look like:
x=[(gene1, db1), (gene2, db2), ...]
Probably zip(gene, db) will provide what you intend (wrapped in list(...) on Python 3, where zip returns an iterator rather than a list).
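To make the difference between the two shapes concrete, here is a minimal sketch in plain Python. The values in genes and dbs are hypothetical stand-ins for the 'gene' and 'db' columns, not the asker's actual data.

```python
# Hypothetical sample values standing in for the 'gene' and 'db' columns.
genes = ["IGHV1-3", "IGHV1-58", "IGHV4-61"]
dbs = ["G1K_CL2", "G1K_CL2", "G1K_CL3"]

# A 2-tuple of lists: a single tuple holding two whole columns.
# This is the shape the answer identifies as the problem.
wrong = (genes, dbs)

# A list of 2-tuples: one (gene, db) factor per box.
# zip pairs the columns element by element; list() materializes the iterator.
x = list(zip(genes, dbs))

print(x)
```

Each tuple in x should match one of the factors passed to FactorRange, so that each box is positioned on its own (gene, db) sub-category.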
All that said, I also strongly advise using an explicitly created ColumnDataSource when dealing with nested categorical data. There are some inherent ambiguities that can arise, and constructing a CDS yourself eliminates those.
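One possible way to follow that advice is sketched below, under several assumptions. The rows are made up in the shape of the question's DataFrame. The quartiles are computed with the standard library's statistics.quantiles rather than with whatever the asker used to fill q1, q2 and q3. The helper name make_plot is hypothetical, and the width and height keywords follow the question's own figure call.

```python
from collections import defaultdict
from statistics import quantiles

# Hypothetical rows in the shape of the question's DataFrame: (gene, db, mutations).
rows = [
    ("IGHV1-58", "G1K_CL2", 2), ("IGHV1-58", "G1K_CL2", 3),
    ("IGHV4-61", "G1K_CL3", 11), ("IGHV4-61", "G1K_CL3", 12),
    ("IGHV4-61", "G1K_CL3", 10), ("IGHV4-61", "G1K_CL3", 13),
]

# Collect the mutation counts for every (gene, db) pair that actually occurs.
groups = defaultdict(list)
for gene, db, mutations in rows:
    groups[(gene, db)].append(mutations)

# One entry per box: the (gene, db) factor plus its three quartiles.
# Note: statistics.quantiles needs at least two values per group on older Pythons.
factors = sorted(groups)
data = {"x": factors, "q1": [], "q2": [], "q3": []}
for key in factors:
    q1, q2, q3 = quantiles(groups[key], n=4, method="inclusive")
    data["q1"].append(q1)
    data["q2"].append(q2)
    data["q3"].append(q3)


def make_plot(data):
    """Draw the two box halves from an explicitly created ColumnDataSource."""
    # Imported here so the data preparation above runs without Bokeh installed.
    from bokeh.models import ColumnDataSource, FactorRange
    from bokeh.plotting import figure

    source = ColumnDataSource(data=data)
    p = figure(x_range=FactorRange(*data["x"]), width=800, height=400)
    # Column names refer to the keys of the data dict held by the source.
    p.vbar(x="x", width=0.7, bottom="q2", top="q3", source=source,
           fill_color="#ffffff", line_color="black")
    p.vbar(x="x", width=0.7, bottom="q1", top="q2", source=source,
           fill_color="#ffffff", line_color="black")
    return p
```

With Bokeh installed, the figure could be displayed with show(make_plot(data)) after importing show from bokeh.plotting. Building the x column as a list of (gene, db) tuples and passing column names to vbar keeps the coordinates and the quartile values aligned row by row.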