How to print integers from the describe() function in Python?

Question

I am doing some statistical work using Python's pandas, and I have the following code to print a description of the data (mean, count, median, etc.).

data=pandas.read_csv(input_file)
print(data.describe())

But my data is pretty big (around 4 million rows), and each row holds very small values. So inevitably the count is big and the mean is pretty small, and thus Python prints them in scientific notation.

I just want to print these numbers in full, for ease of use and understanding; for example, 4393476 rather than 4.393476e+06. I have googled around, and the most I can find is "Display a float with two decimal places in Python" and some similar posts. But that only works if I already have the numbers in a variable, which is not my case. The numbers are created by the describe() function, so I don't know what numbers I will get.

Sorry if this seems like a very basic question; I am still new to Python. Any response is appreciated. Thanks.

Answer 1

Suppose you have the following DataFrame:

Edit

I checked the docs and you should probably use the pandas.set_option API to do this:

In [13]: df
Out[13]: 
              a             b             c
0  4.405544e+08  1.425305e+08  6.387200e+08
1  8.792502e+08  7.135909e+08  4.652605e+07
2  5.074937e+08  3.008761e+08  1.781351e+08
3  1.188494e+07  7.926714e+08  9.485948e+08
4  6.071372e+08  3.236949e+08  4.464244e+08
5  1.744240e+08  4.062852e+08  4.456160e+08
6  7.622656e+07  9.790510e+08  7.587101e+08
7  8.762620e+08  1.298574e+08  4.487193e+08
8  6.262644e+08  4.648143e+08  5.947500e+08
9  5.951188e+08  9.744804e+08  8.572475e+08

In [14]: pd.set_option('float_format', '{:f}'.format)

In [15]: df
Out[15]: 
                 a                b                c
0 440554429.333866 142530512.999182 638719977.824965
1 879250168.522411 713590875.479215  46526045.819487
2 507493741.709532 300876106.387427 178135140.583541
3  11884941.851962 792671390.499431 948594814.816647
4 607137206.305609 323694879.619369 446424361.522071
5 174424035.448168 406285189.907148 445616045.754137
6  76226556.685384 979050957.963583 758710090.127867
7 876261954.607558 129857447.076183 448719292.453509
8 626264394.999419 464814260.796770 594750038.747595
9 595118819.308896 974480400.272515 857247528.610996

In [16]: df.describe()
Out[16]: 
                     a                b                c
count        10.000000        10.000000        10.000000
mean  479461624.877280 522785202.100082 536344333.626082
std   306428177.277935 320806568.078629 284507176.411675
min    11884941.851962 129857447.076183  46526045.819487
25%   240956633.919592 306580799.695412 445818124.696121
50%   551306280.509214 435549725.351959 521734665.600552
75%   621482597.825966 772901261.744377 728712562.052142
max   879250168.522411 979050957.963583 948594814.816647

End of edit

In [7]: df
Out[7]: 
              a             b             c
0  4.405544e+08  1.425305e+08  6.387200e+08
1  8.792502e+08  7.135909e+08  4.652605e+07
2  5.074937e+08  3.008761e+08  1.781351e+08
3  1.188494e+07  7.926714e+08  9.485948e+08
4  6.071372e+08  3.236949e+08  4.464244e+08
5  1.744240e+08  4.062852e+08  4.456160e+08
6  7.622656e+07  9.790510e+08  7.587101e+08
7  8.762620e+08  1.298574e+08  4.487193e+08
8  6.262644e+08  4.648143e+08  5.947500e+08
9  5.951188e+08  9.744804e+08  8.572475e+08

In [8]: df.describe()
Out[8]: 
                  a             b             c
count  1.000000e+01  1.000000e+01  1.000000e+01
mean   4.794616e+08  5.227852e+08  5.363443e+08
std    3.064282e+08  3.208066e+08  2.845072e+08
min    1.188494e+07  1.298574e+08  4.652605e+07
25%    2.409566e+08  3.065808e+08  4.458181e+08
50%    5.513063e+08  4.355497e+08  5.217347e+08
75%    6.214826e+08  7.729013e+08  7.287126e+08
max    8.792502e+08  9.790510e+08  9.485948e+08

You need to fiddle with the pandas.options.display.float_format attribute. Note that in my code I've used import pandas as pd. A quick fix is something like:

In [9]: pd.options.display.float_format = "{:.2f}".format

In [10]: df
Out[10]: 
             a            b            c
0 440554429.33 142530513.00 638719977.82
1 879250168.52 713590875.48  46526045.82
2 507493741.71 300876106.39 178135140.58
3  11884941.85 792671390.50 948594814.82
4 607137206.31 323694879.62 446424361.52
5 174424035.45 406285189.91 445616045.75
6  76226556.69 979050957.96 758710090.13
7 876261954.61 129857447.08 448719292.45
8 626264395.00 464814260.80 594750038.75
9 595118819.31 974480400.27 857247528.61

In [11]: df.describe()
Out[11]: 
                 a            b            c
count        10.00        10.00        10.00
mean  479461624.88 522785202.10 536344333.63
std   306428177.28 320806568.08 284507176.41
min    11884941.85 129857447.08  46526045.82
25%   240956633.92 306580799.70 445818124.70
50%   551306280.51 435549725.35 521734665.60
75%   621482597.83 772901261.74 728712562.05
max   879250168.52 979050957.96 948594814.82
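
If the display change should apply to a single printout only, and whole-number values such as the count should appear without decimals, something along these lines might work. It is a minimal sketch under stated assumptions: plain_number is a hypothetical helper (not part of pandas), the small DataFrame is made up for illustration, and pd.option_context is used so that the previous display setting is restored once the block exits.

```python
import pandas as pd


def plain_number(x):
    """Hypothetical helper: whole values without decimals, others with 6 places."""
    return f"{x:.0f}" if float(x).is_integer() else f"{x:.6f}"


# Made-up data, only for illustration
df = pd.DataFrame({"a": [0.5, 0.25, 0.125, 0.0625]})

# The float format applies only inside the with-block
with pd.option_context("display.float_format", plain_number):
    text = str(df.describe())

print(text)
```

Since the formatter only affects how values are displayed, the DataFrame returned by describe() still holds floats and can be used in further calculations.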

Answer 2

import numpy as np
import pandas as pd
np.random.seed(2016)
N = 4393476
df = pd.DataFrame(np.random.uniform(1e-4, 0.1, size=(N,3)), columns=list('ABC'))

desc = df.describe()
desc.loc['count'] = desc.loc['count'].astype(int).astype(str)
desc.iloc[1:] = desc.iloc[1:].applymap('{:.6f}'.format)
print(desc)

yields

              A         B         C
count   4393476   4393476   4393476
mean   0.050039  0.050056  0.050057
std    0.028834  0.028836  0.028849
min    0.000100  0.000100  0.000100
25%    0.025076  0.025081  0.025065
50%    0.050047  0.050050  0.050037
75%    0.074987  0.075027  0.075055
max    0.100000  0.100000  0.100000

Under the hood, DataFrames are organized in columns. The values in a column can only have one data type (the column's dtype). The DataFrame returned by df.describe() has columns of floating-point dtype:

In [116]: df.describe().info()
<class 'pandas.core.frame.DataFrame'>
Index: 8 entries, count to max
Data columns (total 3 columns):
A    8 non-null float64
B    8 non-null float64
C    8 non-null float64
dtypes: float64(3)
memory usage: 256.0+ bytes

DataFrames do not allow you to treat one row as integers and the other rows as floats. However, if you change the contents of the DataFrame to strings, then you have full control over the way the values are displayed, since all the values are just strings.

Thus, to create a DataFrame in the desired format, you could use

desc.loc['count'] = desc.loc['count'].astype(int).astype(str)

to convert the count row to integers (by calling astype(int)), and then convert the integers to strings (by calling astype(str)). Then

desc.iloc[1:] = desc.iloc[1:].applymap('{:.6f}'.format)

converts the rest of the floats to strings, using the str.format method to format the floats to 6 digits after the decimal point.
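
As far as I know, recent pandas releases (2.1 and later) deprecate applymap in favor of DataFrame.map and warn when strings are written into float64 columns, so the two lines above may not run cleanly everywhere. The following is a sketch of the same idea that avoids both by building a separate all-string frame; the name formatted and the smaller row count are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2016)
# Fewer rows than in the answer, just to keep the sketch quick
df = pd.DataFrame(rng.uniform(1e-4, 0.1, size=(1000, 3)), columns=list("ABC"))

desc = df.describe()

# Format every value as a string with 6 decimals, column by column,
# producing a new frame instead of overwriting the float64 columns
formatted = desc.apply(lambda col: col.map("{:.6f}".format))

# Replace the count row with integer-looking strings
formatted.loc["count"] = desc.loc["count"].astype(int).astype(str)

print(formatted)
```

Here desc keeps its float64 columns, and formatted is used only for printing.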

Alternatively, you could use

import numpy as np
import pandas as pd
np.random.seed(2016)
N = 4393476
df = pd.DataFrame(np.random.uniform(1e-4, 0.1, size=(N,3)), columns=list('ABC'))

desc = df.describe().T
desc['count'] = desc['count'].astype(int)
print(desc)

which yields

     count      mean       std     min       25%       50%       75%  max
A  4393476  0.050039  0.028834  0.0001  0.025076  0.050047  0.074987  0.1
B  4393476  0.050056  0.028836  0.0001  0.025081  0.050050  0.075027  0.1
C  4393476  0.050057  0.028849  0.0001  0.025065  0.050037  0.075055  0.1

By transposing the desc DataFrame, the counts are now in their own column. So now the problem can be solved by converting that column's dtype to int.

One advantage of doing it this way is that the values in desc remain numerical, so further calculations based on the numeric values can still be done.

I think this solution is preferable, provided that the transposed format is acceptable.
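
To illustrate the point about the values staying numeric, here is a small sketch on made-up data. The total column is a hypothetical addition, used only to show that arithmetic on the transposed summary still works after the count column is converted to an integer dtype.

```python
import pandas as pd

# Made-up data, only for illustration
df = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0], "B": [10.0, 20.0, 30.0, 40.0]})

desc = df.describe().T
desc["count"] = desc["count"].astype(int)

# count is now an integer column; the other columns are still floats,
# so the summary can be used in further calculations
desc["total"] = desc["mean"] * desc["count"]

print(desc[["count", "mean", "total"]])
```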

Answer 1: score 59, accepted, 2016-12-26 09:15:44
Answer 2: score 8, 2016-12-26 09:54:52
