pandas histogram with logarithmic axes

I have a pandas DataFrame with time length data in seconds. The lengths vary from seconds to months, so taking a histogram of the log-transformed values is convenient because it covers the range better. Here is some sample code:

%matplotlib inline
import numpy as np
import pandas as pd

x=np.random.lognormal(mean=10, sigma=1, size=10000)
df=pd.DataFrame(x, range(10000), columns=['timeLength'])

np.log10(df.timeLength).hist()

However, the labels on the x-axis show the log10 values themselves (the exponents). Is there a way to show them as 10^1 and so on? Or, even better, could I label them as 1 second, 10 seconds, 1 minute, 10 minutes, 1 hour, 1 day and so on?

Non-Uniform Bin Histogram

Instead of taking the log of the values,

 np.log10(df.timeLength) 

try creating non-uniform bins when computing the histogram. This can be accomplished with np.histogram's bins argument.

Based on

if I could put them as 1 second, 10 seconds, 1 minute, 10 minute, 1 hours, 1 day and so on.

the following bin array could be created:

# Bin locations (time in seconds)
bins = np.array([0, 1, 10, 60, 60*10, 60*60, 24*60*60])
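
To make the effect of these edges concrete, here is a minimal sketch of how np.histogram sorts values into the non-uniform bins. The durations array is hand-picked, hypothetical data used only for illustration; it is not part of the original answer.

```python
import numpy as np

# Bin edges in seconds: 0, 1 s, 10 s, 1 min, 10 min, 1 h, 1 day
bins = np.array([0, 1, 10, 60, 60 * 10, 60 * 60, 24 * 60 * 60])

# Hypothetical durations in seconds, chosen by hand for illustration
durations = np.array([0.5, 3, 7, 45, 120, 300, 1800, 7200, 50000])

# Each bin is half-open [left, right), except the last one, which also
# includes its right edge. Values outside the edges are not counted.
counts, edges = np.histogram(durations, bins=bins)

for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left:>6} s to {right:>6} s: {n}")
```

Note that np.histogram returns one more edge than it returns counts, which is why the answer's example drops the first edge before building a Series.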

Example

The distribution of the original dataset was widened to fill more of the bins (mean=5, sigma=2 instead of mean=10, sigma=1); this is for illustration only. The non-uniform bins are defined, the histogram is computed, and the plot is presented. The bins are examples and may be altered.

# Create random data in a DataFrame
x = np.random.lognormal(mean=5, sigma=2, size=10000)
df = pd.DataFrame(x, columns=['timeLength'])

print(df.describe())
print()

# Create non-uniform bins. Units are seconds.
bins = np.array([0, 1, 10, 60, 60*10, 60*60, 24*60*60])
print('histogram bins:', bins)

# Compute the histogram (density=True replaces the older normed=True).
# Values above the last edge (1 day) are not counted.
y, x = np.histogram(df['timeLength'], bins=bins, density=True)

# Label each bar by the right edge of its bin
x = x[1:]

# Turn into a pandas Series
hist = pd.Series(y, x)

# Plot
ax = hist.plot(kind='bar', width=1, alpha=0.5, align='center')
ax.set_title('Non-Uniform Bin Histogram')
ax.set_xlabel('Time Length')
# One label per bin, matching the right edges 1, 10, 60, 600, 3600, 86400 s
ax.set_xticklabels(['1 s', '10 s', '1 Min', '10 Min', '1 Hr', '1 Day'], rotation='horizontal')

    timeLength   
count   10000.000000
mean     1014.865417
std      4751.820312
min         0.062893
25%        36.941388
50%       144.081235
75%       556.223797
max    237838.467337

histogram bins: [    0     1    10    60   600  3600 86400]

[Figure: non-uniform bin histogram]

Please advise if this is not the intended result.
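
For the first variant the question mentions (keeping a logarithmic x-axis but labelling it in time units), one possible sketch, separate from the answer above, is to histogram the raw values with log-spaced bins and put the axis itself on a log scale. The seed, the number of bins (40), and the tick positions are assumptions made for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

# Reproducible sample data, same distribution as in the question
rng = np.random.default_rng(0)
x = rng.lognormal(mean=10, sigma=1, size=10000)

# Log-spaced bin edges covering the whole decades around the data
# (40 edges is an arbitrary choice)
lo = np.floor(np.log10(x.min()))
hi = np.ceil(np.log10(x.max()))
edges = np.logspace(lo, hi, 40)

fig, ax = plt.subplots()
ax.hist(x, bins=edges)
ax.set_xscale('log')  # bars look evenly wide on the log axis

# Replace the default 10^n tick labels with time units (positions in seconds)
ticks = [1, 10, 60, 600, 3600, 86400]
tick_labels = ['1 s', '10 s', '1 min', '10 min', '1 hr', '1 day']
ax.set_xticks(ticks)
ax.set_xticklabels(tick_labels)
ax.minorticks_off()
ax.set_xlabel('Time Length')

# plt.show()  # uncomment when running interactively
```

Skipping the set_xticks and set_xticklabels calls leaves matplotlib's default power-of-ten labels on the log axis, which corresponds to the 10^1 style asked about in the question.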

If you want to use custom bins, you may want to combine pd.cut with .groupby().count() and use a bar chart:

x = np.random.lognormal(mean=10, sigma=1, size=10000)
df = pd.DataFrame(x, range(10000), columns=['timeLength'])

# Bin edges in seconds. The last edge is the data maximum, so this
# assumes the maximum is larger than one day (edges must be increasing).
df['bin'] = pd.cut(df.timeLength, include_lowest=True,
                   bins=[0, 1, 10, 60, 60**2, 60**2*24, df.timeLength.max()],
                   labels=['1s', '10s', '1min', '1hr', '1d', '>1d'])
df.groupby('bin').count().plot.bar()
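
As a rough check of what pd.cut does with these edges, the sketch below bins a small hand-picked, hypothetical set of durations and counts them per label without plotting. It uses np.inf as the last edge instead of the data maximum, which is a substitution made here so the edges stay increasing for any data; it is not what the answer's code does.

```python
import numpy as np
import pandas as pd

# Hypothetical durations in seconds, chosen by hand for illustration
durations = pd.Series(
    [0.5, 3, 7, 45, 120, 300, 1800, 7200, 50000, 200000],
    name='timeLength',
)

# Edges in seconds; np.inf as the open upper edge is an assumption of this sketch
edges = [0, 1, 10, 60, 60**2, 60**2 * 24, np.inf]
labels = ['1s', '10s', '1min', '1hr', '1d', '>1d']

# pd.cut bins are closed on the right: (left, right].
# include_lowest=True also puts an exact 0 into the first bin.
binned = pd.cut(durations, bins=edges, labels=labels, include_lowest=True)

# Count how many durations fall under each label
counts = binned.value_counts(sort=False)
print(counts)
```

Unlike np.histogram, pd.cut closes each bin on the right by default, so a value of exactly 60 seconds lands in the '1min' bin here rather than in the next one.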



 