How do I print the entire number in Python from the describe() function?
I am doing some statistical work using Python's pandas, and I have the following code to print out the data description (mean, count, median, etc.).
data=pandas.read_csv(input_file)
print(data.describe())
But my data is pretty big (around 4 million rows) and each row has very small values. So inevitably, the count is big and the mean is pretty small, and thus Python prints them in scientific notation.
I just want to print these numbers in full, for ease of use and understanding; for example, it should be 4393476 instead of 4.393476e+06. I have googled around, and the most I can find is "Display a float with two decimal places in Python" and some other similar posts. But that only works if I already have the numbers in a variable, which is not my case. The numbers are created by the describe() function, so I don't know what numbers I will get.
Sorry if this seems like a very basic question; I am still new to Python. Any response is appreciated. Thanks.
Suppose you have the following DataFrame.
I checked the docs, and you should probably use the pandas.set_option API to do this:
In [13]: df
Out[13]:
a b c
0 4.405544e+08 1.425305e+08 6.387200e+08
1 8.792502e+08 7.135909e+08 4.652605e+07
2 5.074937e+08 3.008761e+08 1.781351e+08
3 1.188494e+07 7.926714e+08 9.485948e+08
4 6.071372e+08 3.236949e+08 4.464244e+08
5 1.744240e+08 4.062852e+08 4.456160e+08
6 7.622656e+07 9.790510e+08 7.587101e+08
7 8.762620e+08 1.298574e+08 4.487193e+08
8 6.262644e+08 4.648143e+08 5.947500e+08
9 5.951188e+08 9.744804e+08 8.572475e+08
In [14]: pd.set_option('float_format', '{:f}'.format)
In [15]: df
Out[15]:
a b c
0 440554429.333866 142530512.999182 638719977.824965
1 879250168.522411 713590875.479215 46526045.819487
2 507493741.709532 300876106.387427 178135140.583541
3 11884941.851962 792671390.499431 948594814.816647
4 607137206.305609 323694879.619369 446424361.522071
5 174424035.448168 406285189.907148 445616045.754137
6 76226556.685384 979050957.963583 758710090.127867
7 876261954.607558 129857447.076183 448719292.453509
8 626264394.999419 464814260.796770 594750038.747595
9 595118819.308896 974480400.272515 857247528.610996
In [16]: df.describe()
Out[16]:
a b c
count 10.000000 10.000000 10.000000
mean 479461624.877280 522785202.100082 536344333.626082
std 306428177.277935 320806568.078629 284507176.411675
min 11884941.851962 129857447.076183 46526045.819487
25% 240956633.919592 306580799.695412 445818124.696121
50% 551306280.509214 435549725.351959 521734665.600552
75% 621482597.825966 772901261.744377 728712562.052142
max 879250168.522411 979050957.963583 948594814.816647
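Note that set_option changes the display for the whole session. If you only want the full numbers for a single printout, a minimal sketch using pandas.option_context could look like this. The tiny DataFrame is made up for illustration and is not the data from the question.

```python
import pandas as pd

# Hypothetical two-row frame, standing in for the real data.
df = pd.DataFrame({"a": [4.405544e8, 8.792502e8],
                   "b": [1.425305e8, 7.135909e8]})

# The float format applies only inside the with-block;
# the previous setting is restored when the block exits.
with pd.option_context("display.float_format", "{:f}".format):
    scoped = repr(df.describe())

print(scoped)
```

Outside the with-block, df.describe() is displayed with whatever format was active before.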
In [7]: df
Out[7]:
a b c
0 4.405544e+08 1.425305e+08 6.387200e+08
1 8.792502e+08 7.135909e+08 4.652605e+07
2 5.074937e+08 3.008761e+08 1.781351e+08
3 1.188494e+07 7.926714e+08 9.485948e+08
4 6.071372e+08 3.236949e+08 4.464244e+08
5 1.744240e+08 4.062852e+08 4.456160e+08
6 7.622656e+07 9.790510e+08 7.587101e+08
7 8.762620e+08 1.298574e+08 4.487193e+08
8 6.262644e+08 4.648143e+08 5.947500e+08
9 5.951188e+08 9.744804e+08 8.572475e+08
In [8]: df.describe()
Out[8]:
a b c
count 1.000000e+01 1.000000e+01 1.000000e+01
mean 4.794616e+08 5.227852e+08 5.363443e+08
std 3.064282e+08 3.208066e+08 2.845072e+08
min 1.188494e+07 1.298574e+08 4.652605e+07
25% 2.409566e+08 3.065808e+08 4.458181e+08
50% 5.513063e+08 4.355497e+08 5.217347e+08
75% 6.214826e+08 7.729013e+08 7.287126e+08
max 8.792502e+08 9.790510e+08 9.485948e+08
You need to fiddle with the pandas.options.display.float_format attribute. Note, in my code I've used import pandas as pd. A quick fix is something like:
In [29]: pd.options.display.float_format = "{:.2f}".format
In [10]: df
Out[10]:
a b c
0 440554429.33 142530513.00 638719977.82
1 879250168.52 713590875.48 46526045.82
2 507493741.71 300876106.39 178135140.58
3 11884941.85 792671390.50 948594814.82
4 607137206.31 323694879.62 446424361.52
5 174424035.45 406285189.91 445616045.75
6 76226556.69 979050957.96 758710090.13
7 876261954.61 129857447.08 448719292.45
8 626264395.00 464814260.80 594750038.75
9 595118819.31 974480400.27 857247528.61
In [11]: df.describe()
Out[11]:
a b c
count 10.00 10.00 10.00
mean 479461624.88 522785202.10 536344333.63
std 306428177.28 320806568.08 284507176.41
min 11884941.85 129857447.08 46526045.82
25% 240956633.92 306580799.70 445818124.70
50% 551306280.51 435549725.35 521734665.60
75% 621482597.83 772901261.74 728712562.05
max 879250168.52 979050957.96 948594814.82
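The float_format option accepts any callable that takes a float and returns a string, so it does not have to be a fixed number of decimals. As a rough sketch (the helper name full_number and the sample values are my own, not from the question), whole numbers such as the count can be shown without a decimal part while everything else keeps six decimals:

```python
import pandas as pd

def full_number(x):
    """Hypothetical helper: whole values print as integers, others with 6 decimals."""
    if float(x).is_integer():
        return f"{x:.0f}"
    return f"{x:.6f}"

# Made-up small values, similar in scale to the data described in the question.
df = pd.DataFrame({"A": [0.0001, 0.05, 0.1]})

# Apply the formatter only for this printout.
with pd.option_context("display.float_format", full_number):
    text = repr(df.describe())

print(text)
```

One caveat of this approach: any value that happens to be whole (for example a max of exactly 1.0) is also printed without decimals, not just the count.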
import numpy as np
import pandas as pd
np.random.seed(2016)
N = 4393476
df = pd.DataFrame(np.random.uniform(1e-4, 0.1, size=(N,3)), columns=list('ABC'))
desc = df.describe()
desc.loc['count'] = desc.loc['count'].astype(int).astype(str)
desc.iloc[1:] = desc.iloc[1:].applymap('{:.6f}'.format)
print(desc)
yields
A B C
count 4393476 4393476 4393476
mean 0.050039 0.050056 0.050057
std 0.028834 0.028836 0.028849
min 0.000100 0.000100 0.000100
25% 0.025076 0.025081 0.025065
50% 0.050047 0.050050 0.050037
75% 0.074987 0.075027 0.075055
max 0.100000 0.100000 0.100000
Under the hood, DataFrames are organized in columns. The values in a column can only have one data type (the column's dtype). The DataFrame returned by df.describe() has columns of floating-point dtype:
In [116]: df.describe().info()
<class 'pandas.core.frame.DataFrame'>
Index: 8 entries, count to max
Data columns (total 3 columns):
A 8 non-null float64
B 8 non-null float64
C 8 non-null float64
dtypes: float64(3)
memory usage: 256.0+ bytes
DataFrames do not allow you to treat one row as integers and the other rows as floats. However, if you change the contents of the DataFrame to strings, then you have full control over the way the values are displayed, since all the values are just strings.
Thus, to create a DataFrame in the desired format, you could use
desc.loc['count'] = desc.loc['count'].astype(int).astype(str)
to convert the count row to integers (by calling astype(int)), and then convert the integers to strings (by calling astype(str)). Then
desc.iloc[1:] = desc.iloc[1:].applymap('{:.6f}'.format)
converts the rest of the floats to strings, using the str.format method to format the floats to 6 digits after the decimal point.
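Newer pandas releases may warn when strings are written into float64 columns, and DataFrame.applymap is deprecated since pandas 2.1 in favor of DataFrame.map. A variant of the same idea that sidesteps both, sketched here under the assumption of a smaller sample size (1000 rows) and a different random generator than above, casts the description to object dtype first and formats column by column with Series.map:

```python
import numpy as np
import pandas as pd

# Assumed seed and sample size, for illustration only.
rng = np.random.default_rng(2016)
df = pd.DataFrame(rng.uniform(1e-4, 0.1, size=(1000, 3)), columns=list("ABC"))

desc = df.describe()
print(desc.dtypes)  # every column is float64

# Object columns can hold arbitrary Python objects, including strings.
text = desc.astype(object)

# The count row: float -> int -> str, so it prints without a decimal part.
text.loc["count"] = desc.loc["count"].astype(int).astype(str)

# All other rows: format each float with 6 digits after the decimal point.
text.iloc[1:] = desc.iloc[1:].apply(lambda col: col.map("{:.6f}".format))

print(text)
```

As with the original version, the result contains only strings, so it is suitable for display but not for further arithmetic.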
Alternatively, you could use
import numpy as np
import pandas as pd
np.random.seed(2016)
N = 4393476
df = pd.DataFrame(np.random.uniform(1e-4, 0.1, size=(N,3)), columns=list('ABC'))
desc = df.describe().T
desc['count'] = desc['count'].astype(int)
print(desc)
which yields
count mean std min 25% 50% 75% max
A 4393476 0.050039 0.028834 0.0001 0.025076 0.050047 0.074987 0.1
B 4393476 0.050056 0.028836 0.0001 0.025081 0.050050 0.075027 0.1
C 4393476 0.050057 0.028849 0.0001 0.025065 0.050037 0.075055 0.1
By transposing the desc DataFrame, the counts are now in their own column. So now the problem can be solved by converting that column's dtype to int.
One advantage of doing it this way is that the values in desc remain numerical, so further calculations based on the numeric values can still be done.
I think this solution is preferable, provided that the transposed format is acceptable.
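To illustrate what "remain numerical" buys you, here is a small sketch that derives extra columns from the transposed description. The column names iqr and sem, the seed, and the 1000-row sample are my own choices for the example, not part of the question.

```python
import numpy as np
import pandas as pd

# Assumed seed and sample size, for illustration only.
rng = np.random.default_rng(2016)
df = pd.DataFrame(rng.uniform(1e-4, 0.1, size=(1000, 3)), columns=list("ABC"))

desc = df.describe().T
desc["count"] = desc["count"].astype(int)  # the count column becomes an integer column

# Because nothing was turned into strings, arithmetic still works:
desc["iqr"] = desc["75%"] - desc["25%"]             # interquartile range
desc["sem"] = desc["std"] / np.sqrt(desc["count"])  # standard error of the mean

print(desc)
```

With the string-based approach, the same calculations would first require converting the values back to numbers.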