Want to plot Pandas Dataframe as Multiple Histograms with log10 scale x-axis

I have floating point data in a Pandas dataframe. Each column represents a variable (they have string names) and each row a set of values (the rows have integer names which are not important).

>>> print data
0      kppawr23    kppaspyd
1      3.312387   13.266040
2      2.775202    0.100000
3    100.000000  100.000000
4    100.000000   39.437420
5     17.017150   33.019040
...

I want to plot a histogram for each column. The best result I have achieved is with the hist method of dataframe:

data.hist(bins=20)

but I want the x-axis of each histogram to be on a log10 scale. The bins should be on a log10 scale too, but that is easy enough with bins=np.logspace(-2,2,20).

A workaround might be to log10 transform the data before plotting, but the approaches I have tried,

data.apply(math.log10)

and

data.apply(lambda x: math.log10(x))

give me a floating point error.

    "cannot convert the series to {0}".format(str(converter)))
TypeError: ("cannot convert the series to <type 'float'>", u'occurred at index kppawr23')

You could use

ax.set_xscale('log')

data.hist() returns an array of axes. You'll need to call ax.set_xscale('log') on each axes object, ax, to make each of them logarithmically scaled.


For example,

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
np.random.seed(2015)

N = 100
arr = np.random.random((N,2)) * np.logspace(-2,2,N)[:, np.newaxis]
data = pd.DataFrame(arr, columns=['kppawr23', 'kppaspyd'])

bins = np.logspace(-2,2,20)
axs = data.hist(bins=bins)
for ax in axs.ravel():
    ax.set_xscale('log')

plt.gcf().tight_layout()
plt.show()

yields

[image: one histogram per column, each with a log-scaled x-axis]
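As a side note on the bins used in the example above, here is a minimal sketch (the variable name log_steps is only illustrative) showing that np.logspace(-2,2,20) returns 20 bin edges, and therefore 19 bins, that are evenly spaced in log10:

```python
import numpy as np

# 20 edges between 10**-2 and 10**2; passing them as `bins` gives 19 bins,
# whereas bins=20 (an integer) would give 20 equal-width linear bins.
bins = np.logspace(-2, 2, 20)

# Consecutive edges differ by a constant amount in log10 space.
log_steps = np.diff(np.log10(bins))
print(bins[0], bins[-1])
print(log_steps)
```

Because the edges are equally spaced in log10, the bars appear with equal width once the x-axis is set to a log scale.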


By the way, to take the log of every value in the DataFrame, data, you could use

logdata = np.log10(data)

NumPy ufuncs (such as np.log10) can be applied to pandas DataFrames because they operate elementwise on all the values in the DataFrame.

data.apply(math.log10) did not work because apply tries to pass an entire column (a Series) of values to math.log10, and math.log10 expects a single scalar value.

data.apply(lambda x: math.log10(x)) fails for the same reason that data.apply(math.log10) does. Moreover, if data.apply(func) and data.apply(lambda x: func(x)) were both viable options, the first should be preferred since the lambda function would just make the call a tad slower.
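To see the failure concretely, here is a small sketch using a three-row stand-in for data (the values are copied from the question; the helper try_apply is hypothetical and only records which exception is raised):

```python
import math
import pandas as pd

# Small stand-in for the question's DataFrame.
data = pd.DataFrame({'kppawr23': [3.312387, 2.775202, 100.0],
                     'kppaspyd': [13.26604, 0.1, 100.0]})

def try_apply(func):
    """Return 'ok', or the name of the exception raised by data.apply(func)."""
    try:
        data.apply(func)
        return 'ok'
    except Exception as exc:
        return type(exc).__name__

# Both calls hand a whole column (a Series) to math.log10.
result_direct = try_apply(math.log10)
result_lambda = try_apply(lambda x: math.log10(x))
print(result_direct, result_lambda)
```

Both variants fail the same way, since wrapping math.log10 in a lambda does not change what apply passes to it.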

You could use data.apply(np.log10), again since the NumPy ufunc np.log10 can be applied to a Series, but there is no reason to bother doing this when np.log10(data) works.
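A short sketch, again on a three-row stand-in for data, suggests that both forms give the same result and that the output is still a DataFrame with the original column names:

```python
import numpy as np
import pandas as pd

# Small stand-in for the question's DataFrame.
data = pd.DataFrame({'kppawr23': [3.312387, 2.775202, 100.0],
                     'kppaspyd': [13.26604, 0.1, 100.0]})

# Apply the ufunc to the whole DataFrame at once.
logdata = np.log10(data)

# Column-by-column alternative; each column is passed to np.log10 as a Series.
via_apply = data.apply(np.log10)

print(logdata)
print(np.allclose(logdata.values, via_apply.values))
```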

You could also use data.applymap(math.log10) since applymap calls math.log10 on each value in data one at a time. But this would be far slower than calling the equivalent NumPy function, np.log10, on the entire DataFrame. Still, it is worth knowing about applymap in case you need to call some custom function which is not a ufunc. (In pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map.)
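As an illustration of the custom-function case, here is a minimal sketch. The function clipped_log10 and its floor parameter are hypothetical, invented only for this example, and the code picks DataFrame.map when it exists (pandas 2.1+) and falls back to applymap otherwise:

```python
import math
import numpy as np
import pandas as pd

data = pd.DataFrame({'kppawr23': [3.312387, 0.001, 100.0],
                     'kppaspyd': [13.26604, 0.1, 100.0]})

def clipped_log10(x, floor=0.01):
    """Hypothetical scalar-only function: log10 of x, with x clipped below at floor."""
    return math.log10(max(x, floor))

# DataFrame.map replaces applymap in pandas >= 2.1; use whichever is available.
elementwise = data.map if hasattr(data, 'map') else data.applymap
result = elementwise(clipped_log10)

# The same computation expressed with vectorized operations, for comparison.
vectorized = np.log10(data.clip(lower=0.01))
print(result)
```

When a vectorized equivalent exists, as it does here, it is usually the faster choice; the elementwise call is mainly useful for functions that only accept scalars.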
