How to set y-scale when making a boxplot with dataframe

Question

I have a column of data with a very wide distribution, so I log2-transform it before plotting. This works fine, but I cannot figure out how to label the y-axis with the powers of 2 themselves (at the moment I only get the exponents).

df['num_ratings_log2'] = df['num_ratings'] + 1
df['num_ratings_log2'] = np.log2(df['num_ratings_log2'])
df.boxplot(column = 'num_ratings_log2', figsize=(10,10))

As tick labels, I would like to have 1 (2^0), 32 (2^5), 1024 (2^10), ... instead of 0, 5, 10, ...

I want everything else about the plot to stay the same. How can I achieve this?

Answer 1

Instead of taking the log of the data, you can create a normal boxplot and then set a log scale on the y-axis (ax.set_yscale('log'), or 'symlog' to also represent zero). To get the ticks at powers of 2 (instead of powers of 10), use a LogLocator with base 2. A ScalarFormatter, or a custom formatting function, shows the values as regular numbers (instead of as powers such as 2¹⁰). A NullLocator for the minor ticks suppresses undesired extra ticks.

import matplotlib.pyplot as plt
from matplotlib.ticker import ScalarFormatter, LogLocator, NullLocator
import pandas as pd
import numpy as np

np.random.seed(123)
df = pd.DataFrame({'num_ratings': (np.random.pareto(10, 10000) * 800).astype(int)})
ax = df.boxplot(column='num_ratings', figsize=(10, 10))
ax.set_yscale('symlog')  # symlog also allows zero
ax.yaxis.set_major_locator(LogLocator(base=2))  # major ticks at powers of 2
# ax.yaxis.set_major_formatter(ScalarFormatter())  # show tick labels as regular numbers
ax.yaxis.set_major_formatter(lambda x, p: f'{int(x):,}')
ax.yaxis.set_minor_locator(NullLocator())  # remove minor ticks
plt.show()
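
As a variation on the code above, the base can also be handed to the scale itself: in matplotlib 3.3 and newer, set_yscale accepts a base keyword, so the scale's default locator already places the major ticks at powers of 2. This is a minimal sketch and not part of the original answer; the helper name boxplot_base2 and the linthresh=1 setting are choices made here for illustration.

```python
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter, NullLocator
import numpy as np
import pandas as pd


def boxplot_base2(df, column):
    """Boxplot of `column` on a base-2 symlog y-axis (hypothetical helper)."""
    ax = df.boxplot(column=column, figsize=(10, 10))
    # base=2 puts the major ticks at powers of 2; linthresh=1 keeps the
    # region between 0 and 1 linear so that zero can still be drawn
    ax.set_yscale('symlog', base=2, linthresh=1)
    # print tick labels as plain integers with thousands separators
    ax.yaxis.set_major_formatter(FuncFormatter(lambda x, pos: f'{int(x):,}'))
    ax.yaxis.set_minor_locator(NullLocator())  # remove minor ticks
    return ax


np.random.seed(123)
df = pd.DataFrame({'num_ratings': (np.random.pareto(10, 10000) * 800).astype(int)})
ax = boxplot_base2(df, 'num_ratings')
# plt.show()  # uncomment to display the figure
```

The formatter has to be set after set_yscale, because changing the scale resets the axis to that scale's default locators and formatters.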

Answer 2

I hope this is what you are looking for:

Code

ax = df.boxplot(column='num_ratings_log2', figsize=(20,10))
ymin = 0
ymax = 20
ax.set_ylim(2**ymin, 2**ymax)
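
If the log2-transformed column from the question should stay as it is, another option is to leave the axis linear and only relabel the ticks, so that a tick at exponent k is printed as 2^k. The following is a minimal sketch, assuming the question's num_ratings_log2 column; the sample data, the helper name pow2_label, and the tick spacing of 5 are made up for illustration.

```python
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter, MultipleLocator
import numpy as np
import pandas as pd

# made-up sample data standing in for the question's DataFrame
np.random.seed(123)
df = pd.DataFrame({'num_ratings': (np.random.pareto(10, 10000) * 800).astype(int)})
df['num_ratings_log2'] = np.log2(df['num_ratings'] + 1)


def pow2_label(exponent, pos=None):
    """Turn a tick at `exponent` on the log2 axis into the text for 2**exponent."""
    return f'{2 ** exponent:,.0f}'


ax = df.boxplot(column='num_ratings_log2', figsize=(10, 10))
ax.yaxis.set_major_locator(MultipleLocator(5))  # ticks at multiples of 5 (0, 5, 10, ...)
ax.yaxis.set_major_formatter(FuncFormatter(pow2_label))  # label them 1, 32, 1,024, ...
# plt.show()  # uncomment to display the figure
```

Only the labels change here; the box, whiskers, and outliers are drawn exactly as in the question's plot. Because the question adds 1 before taking the logarithm, a label of 2^k corresponds to num_ratings + 1 = 2^k.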

solution1
1 ACCEPTED 2021-05-26 07:00:13

solution2
0 2021-05-26 01:47:20
