
How to set scaling for axes using plotnine, Python

I am using the plotnine library to visualise data. I use a for loop to create graphs for different groups of my dataset. The group sizes differ widely (from a few tens to tens of thousands of rows), and this is where my problem arises. I would like more precise ticks on the y-axis, but they are set automatically: for example, for a group with 20000 rows there are only four ticks on the y-axis (0, 5000, 10000 and 15000). How can I set more ticks, and how can I add a tick for 20000 when a group has slightly fewer than 20000 rows (for example, with 19980 rows the last tick is 15000)?

I am attaching a screenshot of one of my graphs, in which one of the groups has almost 600 rows, but the last tick on the y-axis is 400.

[Image: screenshot of the graph]

import plotnine as p9

# d_spolu (a DataFrame with a 'drg_22' column) and MDCstart (a list of
# prefixes) are defined elsewhere
for i in range(0, len(MDCstart)):
    mdc_22 = d_spolu[d_spolu.drg_22.str.startswith(MDCstart[i])]
    print(f'MDCka je {i}, tzn. {MDCstart[i]}')
    plot = (p9.ggplot(data=mdc_22,
            mapping=p9.aes(x='factor(drg_22)'))
        + p9.geom_bar(position=p9.position_dodge2(preserve='single'), fill='orange')
        + p9.theme_bw()
        + p9.theme(axis_text_x=p9.element_text(angle=90))
        + p9.labs(x='DRG skupina', y='Pocet')
        + p9.theme(figure_size=(20, 5))
        + p9.labels.ggtitle(title=f'ADRG zacinajuce na {MDCstart[i]}')
        # + p9.scale_y_continuous(limits=(0, 15000, 500))
    )
    print(plot)

As you can see, I tried using scale_y_continuous with limits of 0 and 15000 and a step of 500, but that only sets the limits, and my graphs then show just the values between 0 and 500.

My goal is to find a way to set "good ticks" automatically: say, 8 ticks with suitable values on the y-axis of each graph. Is this possible in some not too complicated way?

All of this is just "cosmetics" for my graphs, but I would like to know how to make them look the way I want :)

Thank you for any answer!

There's not really a way of automatically setting "good ticks" because the authors of plotnine don't know what you might consider to be "good ticks". You kind of have to do the work yourself.

You would do this by first calculating the height of the maximum bar, then defining a function that produces tick marks based on a set of logic around what you may consider to be "good ticks".
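As a minimal sketch of that idea, the tallest bar can be rounded up to the next multiple of a chosen step, which then gives both the upper limit and the tick positions. The helper name axis_for and its fixed step argument are assumptions made for illustration; they are not part of plotnine. Note that limits takes only a (min, max) pair, while the step belongs in breaks.

```python
from math import ceil


def axis_for(max_count, step):
    """Return (limits, breaks) for a y-axis that starts at 0.

    Hypothetical helper: the upper limit is max_count rounded up to the
    next multiple of step, so the top tick is never below the tallest bar.
    """
    upper = max(step, ceil(max_count / step) * step)
    limits = (0, upper)
    breaks = list(range(0, upper + 1, step))
    return limits, breaks


limits, breaks = axis_for(19980, 5000)
print(limits)   # (0, 20000)
print(breaks)   # [0, 5000, 10000, 15000, 20000]

# With plotnine imported as p9, the two values would be passed separately:
#   + p9.scale_y_continuous(limits=limits, breaks=breaks)
```

With a fixed step the number of ticks still varies with the group size, which is why the step itself usually has to be chosen per plot.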

Below is an example using the diamonds data set. This may be "good enough" for your purpose; otherwise you may need to tinker with the logic. It works as follows:

  1. Calculate the height of the tallest bar, which is given by MaxCount. So, for diamonds of colour 'E', MaxCount = 2470, and for diamonds of colour 'I', MaxCount = 1424.
  2. Define a list of the numbers of bins (intervals between tick marks) to be considered, TickOptions. So, for example, [4, 5, 6] will produce plots with either 4, 5 or 6 bins on the y-axis. You can obviously change this.
  3. Define a list of candidate bin sizes, TickMultiples. So, for example, this will produce plots with bin sizes in multiples of 1000, 500, 250 etc. The list should be ordered from largest to smallest.
  4. Define the minimum and maximum gap, MinGap and MaxGap, allowed between the top of the tallest bar and the top of the y-axis, as fractions of the bin size. So, for example, the gap is at least 10% of the bin size but no more than 90% of the bin size.
  5. The TickMarks function returns a list with 2 elements. The first element gives the limits for the y-axis, and the second gives the break points (as a range). So, for diamonds of colour 'E', Limits = (0, 3000) and Breaks = range(0, 3001, 750); and for diamonds of colour 'I', Limits = (0, 1500) and Breaks = range(0, 1501, 250).

Example code:

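from math import ceil

from plotnine import (aes, geom_bar, ggplot, ggtitle, position_dodge2,
                      scale_y_continuous, theme_light)
from plotnine.data import diamonds

path = "D:\\brb\\"  # output folder; change to suit

TickOptions = [4, 5, 6]                           # allowed numbers of bins
TickMultiples = [1000, 500, 250, 100, 50, 10, 5]  # bin-size multiples, largest first
MinGap = 0.10  # smallest gap above the tallest bar, as a fraction of the bin size
MaxGap = 0.90  # largest gap above the tallest bar, as a fraction of the bin size


def TickMarks(MaxCount):
    """Return [limits, breaks] for the first combination that fits."""
    for t in TickMultiples:
        # Skip multiples that are too coarse for this bar height
        if MaxCount >= (TickOptions[0] * t):
            for n in TickOptions:
                Bin = ceil(MaxCount / (n * t)) * t
                Gap = (Bin * n) - MaxCount
                if (Gap >= MinGap*Bin) and (Gap <= MaxGap*Bin):
                    return [ (0, n*Bin), range(0, n*Bin+1, Bin) ]
    # Nothing fits: fail clearly instead of silently returning None
    raise ValueError(f"No tick layout found for MaxCount={MaxCount}")


for color in diamonds['color'].unique():
    df = diamonds[diamonds['color']==color]
    # Height of the tallest bar = size of the largest clarity group
    MaxCount = df['clarity'].value_counts().max()
    Limits, Breaks = TickMarks(MaxCount)  # compute once, use for both arguments
    p = (ggplot(df, aes(x='clarity'))
      + theme_light()
      + geom_bar(position=position_dodge2(preserve='single'), fill='orange')
      + ggtitle("Colour '" + color + "'")
      + scale_y_continuous(expand=(0,0), limits=Limits, breaks=Breaks)
    )
    p.save(filename=path+'Colour= '+color+'.png', height=10, width=15, units='cm', dpi=300)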
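The question asked for roughly eight ticks per plot. One possible way to aim for that, different from the TickMarks logic above, is a "nice numbers" heuristic that picks a round step from the bar height. The sketch below is an assumption-laden illustration, not something built into plotnine: the function name nice_breaks, the target_intervals parameter and the choice of 1, 2, 2.5 and 5 as round multipliers are all hypothetical.

```python
from math import ceil


def nice_breaks(max_count, target_intervals=8):
    """Return (limits, breaks) with at most target_intervals gaps.

    Assumed heuristic: the step is the smallest whole number of the form
    1, 2, 2.5 or 5 times a power of ten that covers max_count in at most
    target_intervals steps.
    """
    raw = max(1, ceil(max_count / target_intervals))  # smallest usable step
    magnitude = 1
    while magnitude * 10 <= raw:  # largest power of ten not above raw
        magnitude *= 10
    # Candidate steps in increasing order; 2.5 * magnitude is left out
    # when it would not be a whole number (magnitude == 1).
    candidates = [magnitude, 2 * magnitude, 5 * magnitude, 10 * magnitude]
    if magnitude >= 10:
        candidates.insert(2, 25 * magnitude // 10)
    step = next(c for c in candidates if c >= raw)
    upper = max(step, ceil(max_count / step) * step)
    return (0, upper), list(range(0, upper + 1, step))


print(nice_breaks(19980))
# ((0, 20000), [0, 2500, 5000, 7500, 10000, 12500, 15000, 17500, 20000])
print(nice_breaks(590))
# ((0, 600), [0, 100, 200, 300, 400, 500, 600])
```

The two returned values would go to the limits and breaks arguments of scale_y_continuous, as in the example above. Unlike TickMarks, this sketch does not enforce a minimum gap above the tallest bar, so a bar may reach the top of the axis.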
