
How do I print entire numbers in Python from the describe() function?

I am doing some statistical work with Python's pandas, and I use the following code to print the data description (mean, count, median, etc.).

data=pandas.read_csv(input_file)
print(data.describe())

But my data is pretty big (around 4 million rows) and each row holds very small values. So inevitably the count is large and the mean is small, and Python prints them in scientific notation.

[Image: screenshot of the describe() output; not reproduced here]

I just want these numbers printed in full for ease of use and understanding; for example, 4393476 instead of 4.393476e+06. I have googled around, and the most I could find is "Display a float with two decimal places in Python" and some similar posts. But that only works if I already have the numbers in a variable, which is not my case. The numbers are created by the describe() function, so I don't know in advance what numbers I will get.

Sorry if this seems like a very basic question; I am still new to Python. Any response is appreciated. Thanks.

Suppose you have the following DataFrame:

Edit

I checked the docs and you should probably use the pandas.set_option API to do this:

In [13]: df
Out[13]: 
              a             b             c
0  4.405544e+08  1.425305e+08  6.387200e+08
1  8.792502e+08  7.135909e+08  4.652605e+07
2  5.074937e+08  3.008761e+08  1.781351e+08
3  1.188494e+07  7.926714e+08  9.485948e+08
4  6.071372e+08  3.236949e+08  4.464244e+08
5  1.744240e+08  4.062852e+08  4.456160e+08
6  7.622656e+07  9.790510e+08  7.587101e+08
7  8.762620e+08  1.298574e+08  4.487193e+08
8  6.262644e+08  4.648143e+08  5.947500e+08
9  5.951188e+08  9.744804e+08  8.572475e+08

In [14]: pd.set_option('float_format', '{:f}'.format)

In [15]: df
Out[15]: 
                 a                b                c
0 440554429.333866 142530512.999182 638719977.824965
1 879250168.522411 713590875.479215  46526045.819487
2 507493741.709532 300876106.387427 178135140.583541
3  11884941.851962 792671390.499431 948594814.816647
4 607137206.305609 323694879.619369 446424361.522071
5 174424035.448168 406285189.907148 445616045.754137
6  76226556.685384 979050957.963583 758710090.127867
7 876261954.607558 129857447.076183 448719292.453509
8 626264394.999419 464814260.796770 594750038.747595
9 595118819.308896 974480400.272515 857247528.610996

In [16]: df.describe()
Out[16]: 
                     a                b                c
count        10.000000        10.000000        10.000000
mean  479461624.877280 522785202.100082 536344333.626082
std   306428177.277935 320806568.078629 284507176.411675
min    11884941.851962 129857447.076183  46526045.819487
25%   240956633.919592 306580799.695412 445818124.696121
50%   551306280.509214 435549725.351959 521734665.600552
75%   621482597.825966 772901261.744377 728712562.052142
max   879250168.522411 979050957.963583 948594814.816647

End of edit
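
If changing the option globally is not wanted, one possible variation is to scope it with pandas.option_context. The following is only a sketch under assumptions: the small DataFrame and the two-decimal format are made up for illustration and are not part of the original answer.

```python
import pandas as pd

# Illustrative data only (three values in the same range as the example above).
df = pd.DataFrame({"a": [440554429.333866, 11884941.851962, 879250168.522411]})

# The formatter applies only inside the with-block;
# the global display.float_format option is restored afterwards.
with pd.option_context("display.float_format", "{:.2f}".format):
    inside = df.describe().to_string()

print(inside)
print(pd.get_option("display.float_format"))  # back to the previous setting
```

Any callable that takes a float and returns a string can be used as the formatter, so the number of decimals can be adjusted to the data.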

In [7]: df
Out[7]: 
              a             b             c
0  4.405544e+08  1.425305e+08  6.387200e+08
1  8.792502e+08  7.135909e+08  4.652605e+07
2  5.074937e+08  3.008761e+08  1.781351e+08
3  1.188494e+07  7.926714e+08  9.485948e+08
4  6.071372e+08  3.236949e+08  4.464244e+08
5  1.744240e+08  4.062852e+08  4.456160e+08
6  7.622656e+07  9.790510e+08  7.587101e+08
7  8.762620e+08  1.298574e+08  4.487193e+08
8  6.262644e+08  4.648143e+08  5.947500e+08
9  5.951188e+08  9.744804e+08  8.572475e+08

In [8]: df.describe()
Out[8]: 
                  a             b             c
count  1.000000e+01  1.000000e+01  1.000000e+01
mean   4.794616e+08  5.227852e+08  5.363443e+08
std    3.064282e+08  3.208066e+08  2.845072e+08
min    1.188494e+07  1.298574e+08  4.652605e+07
25%    2.409566e+08  3.065808e+08  4.458181e+08
50%    5.513063e+08  4.355497e+08  5.217347e+08
75%    6.214826e+08  7.729013e+08  7.287126e+08
max    8.792502e+08  9.790510e+08  9.485948e+08

You need to fiddle with the pandas.options.display.float_format attribute. Note that in my code I've used import pandas as pd. A quick fix is something like:

In [29]: pd.options.display.float_format = "{:.2f}".format

In [10]: df
Out[10]: 
             a            b            c
0 440554429.33 142530513.00 638719977.82
1 879250168.52 713590875.48  46526045.82
2 507493741.71 300876106.39 178135140.58
3  11884941.85 792671390.50 948594814.82
4 607137206.31 323694879.62 446424361.52
5 174424035.45 406285189.91 445616045.75
6  76226556.69 979050957.96 758710090.13
7 876261954.61 129857447.08 448719292.45
8 626264395.00 464814260.80 594750038.75
9 595118819.31 974480400.27 857247528.61

In [11]: df.describe()
Out[11]: 
                 a            b            c
count        10.00        10.00        10.00
mean  479461624.88 522785202.10 536344333.63
std   306428177.28 320806568.08 284507176.41
min    11884941.85 129857447.08  46526045.82
25%   240956633.92 306580799.70 445818124.70
50%   551306280.51 435549725.35 521734665.60
75%   621482597.83 772901261.74 728712562.05
max   879250168.52 979050957.96 948594814.82

import numpy as np
import pandas as pd
np.random.seed(2016)
N = 4393476
df = pd.DataFrame(np.random.uniform(1e-4, 0.1, size=(N,3)), columns=list('ABC'))

desc = df.describe()
desc.loc['count'] = desc.loc['count'].astype(int).astype(str)
desc.iloc[1:] = desc.iloc[1:].applymap('{:.6f}'.format)
print(desc)

yields

              A         B         C
count   4393476   4393476   4393476
mean   0.050039  0.050056  0.050057
std    0.028834  0.028836  0.028849
min    0.000100  0.000100  0.000100
25%    0.025076  0.025081  0.025065
50%    0.050047  0.050050  0.050037
75%    0.074987  0.075027  0.075055
max    0.100000  0.100000  0.100000

Under the hood, DataFrames are organized in columns. The values in a column can only have one data type (the column's dtype). The DataFrame returned by df.describe() has columns of floating-point dtype:

In [116]: df.describe().info()
<class 'pandas.core.frame.DataFrame'>
Index: 8 entries, count to max
Data columns (total 3 columns):
A    8 non-null float64
B    8 non-null float64
C    8 non-null float64
dtypes: float64(3)
memory usage: 256.0+ bytes

DataFrames do not allow you to treat one row as integers and the other rows as floats. However, if you change the contents of the DataFrame to strings, then you have full control over the way the values are displayed, since all the values are just strings.

Thus, to create a DataFrame in the desired format, you could use

desc.loc['count'] = desc.loc['count'].astype(int).astype(str)

to convert the count row to integers (by calling astype(int)), and then convert the integers to strings (by calling astype(str)). Then

desc.iloc[1:] = desc.iloc[1:].applymap('{:.6f}'.format)

converts the rest of the floats to strings, using the str.format method to format the floats with 6 digits after the decimal point.
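
On recent pandas releases, writing strings into float64 columns may emit a dtype warning, and applymap is, as far as I know, deprecated in favor of DataFrame.map. A variation that avoids both is sketched below; it assumes it is acceptable to cast the summary to object dtype first. The 1000-row sample and the default_rng call are illustrative choices, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Small illustrative sample instead of the 4,393,476 rows used above.
rng = np.random.default_rng(2016)
df = pd.DataFrame(rng.uniform(1e-4, 0.1, size=(1000, 3)), columns=list("ABC"))

# Cast to object first so each cell may hold a string.
desc = df.describe().astype(object)

# Format the count row as whole numbers.
desc.loc["count"] = desc.loc["count"].map(lambda v: str(int(v)))

# Format every remaining row with 6 digits after the decimal point.
for label in desc.index[1:]:
    desc.loc[label] = desc.loc[label].map("{:.6f}".format)

print(desc)
```

As with the original approach, the result holds only strings, so it is suitable for display but not for further arithmetic.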


Alternatively, you could use

import numpy as np
import pandas as pd
np.random.seed(2016)
N = 4393476
df = pd.DataFrame(np.random.uniform(1e-4, 0.1, size=(N,3)), columns=list('ABC'))

desc = df.describe().T
desc['count'] = desc['count'].astype(int)
print(desc)

which yields

     count      mean       std     min       25%       50%       75%  max
A  4393476  0.050039  0.028834  0.0001  0.025076  0.050047  0.074987  0.1
B  4393476  0.050056  0.028836  0.0001  0.025081  0.050050  0.075027  0.1
C  4393476  0.050057  0.028849  0.0001  0.025065  0.050037  0.075055  0.1

By transposing the desc DataFrame, the counts are now in their own column. So now the problem can be solved by converting that column's dtype to int.

One advantage of doing it this way is that the values in desc remain numerical, so further calculations based on the numeric values can still be done.

I think this solution is preferable, provided that the transposed format is acceptable.
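
To illustrate that the transposed summary stays numeric, here is a small sketch that derives an extra column from it. The column name cv (coefficient of variation) and the 1000-row sample are hypothetical choices made for this example only.

```python
import numpy as np
import pandas as pd

# Small illustrative sample instead of the full-size data above.
rng = np.random.default_rng(2016)
df = pd.DataFrame(rng.uniform(1e-4, 0.1, size=(1000, 3)), columns=list("ABC"))

desc = df.describe().T
desc["count"] = desc["count"].astype(int)  # counts become an integer column

# The other columns are still floats, so arithmetic works as usual.
desc["cv"] = desc["std"] / desc["mean"]  # hypothetical derived column

print(desc.dtypes)
print(desc[["count", "mean", "std", "cv"]])
```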
