Pandas Data Frame Summary Table

Question

How can I build a summary of a data frame in Pandas by stacking the results of individual operations?

For example, I used the following code:

df = pd.DataFrame(wb)

# Get list with headers
header1 = list(df)
count = df.count()

NaNs = df.isnull().sum()
sum = df.sum(0)
mean = df.mean()
median = df.median()
min = df.min()
max = df.max()
standardeviation = df.std()
nints = df.dtypes

But I can only print them as individual results. I get something like this for each calculation:

Unnamed: 0                  60
region                      50
IV_bins                     60
N                           60
meanbase                    60
cash                        60
dtype: int64

Finally, I tried creating an empty list, summarytable = [], at the beginning and then calling summarytable.append(count) and so on for each calculation, but I couldn't get it right. What I am looking for is a table like this, which I believe also involves transposing the results:

          A    B 
Count     100  98
NANs      5    7
Mean      10   12.5
Median    14   8
...
Nints     95   96
NStr      5    2

One last thing to take into account: for some calculations, like sum(), it doesn't make sense to include strings, so when I print the results, string columns such as region are left out. This is the result of print(sum) (notice that region doesn't appear):

Unnamed: 0                                                               1830
IV_bins                     [0,2.31e+06](2.31e+06,5.7e+06](5.7e+06,1.07e+0...
N                                                                     3680163
meanbase                                                              3.46248
cash                                                              9.00091e+09

Answer 1

It seems you could make use of DataFrame.agg(), which lets you build a customized .describe() output. Here's an example to get you started:

import pandas as pd
import numpy as np

df = pd.DataFrame({ 'object': ['a', 'b', 'c'],
                    'numeric': [1, 2, 3],
                    'numeric2': [1.1, 2.5, 50.],
                    'categorical': pd.Categorical(['d','e','f'])
                  })


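def nullcounts(ser):
    # Number of missing values in a single column
    return ser.isnull().sum()


def custom_describe(frame, func=None, numeric_only=True, **kwargs):
    # Build the default list here to avoid a mutable default argument
    if func is None:
        func = [nullcounts, 'sum', 'mean', 'median', 'max']
    if numeric_only:
        # Drop string/categorical columns before aggregating
        frame = frame.select_dtypes(include=np.number)
    return frame.agg(func, **kwargs)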


custom_describe(df)

            numeric   numeric2
nullcounts      0.0   0.000000
sum             6.0  53.600000
mean            2.0  17.866667
median          2.0   2.500000
max             3.0  50.000000
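
The table in the question also mixes statistics that apply to every column (count, NaNs, dtype) with ones that only make sense for numeric columns. Below is a minimal sketch of one way to stack those results into a single transposed table. It is not part of the answer above: the sample data and the row labels are made up for illustration, and the approach assumes that leaving NaN in the cells where a statistic does not apply is acceptable.

```python
import pandas as pd

# Hypothetical sample data with one string column and some missing values
df = pd.DataFrame({
    "region": ["north", None, "south", "east"],
    "N": [10, 20, 30, 40],
    "cash": [1.5, None, 3.5, 4.0],
})

# Statistics such as mean/median are computed on numeric columns only
numeric = df.select_dtypes(include="number")

# Each value is a Series indexed by column name; pandas aligns them,
# filling NaN where a statistic was not computed for a column
summary = pd.DataFrame({
    "Count": df.count(),
    "NaNs": df.isnull().sum(),
    "Mean": numeric.mean(),
    "Median": numeric.median(),
    "Min": numeric.min(),
    "Max": numeric.max(),
    "Std": numeric.std(),
    "Dtype": df.dtypes.astype(str),
}).reindex(df.columns).T  # restore column order, then put statistics in rows

print(summary)
```

Because the rows hold different kinds of values, the transposed frame ends up with object dtype, which is fine for display but not for further arithmetic.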

Answer 2

There seems to be a library that does exactly that. Check out pandas-summary. For each column, it gives you the count, min, max, std, mean, variance, count of all, count of uniques, missing values, type of column, and much more.
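
If adding a dependency is not an option, the built-in DataFrame.describe(include="all") covers part of the same ground. The sketch below is an alternative rather than a demonstration of pandas-summary itself: the sample data is hypothetical, and the extra "missing" and "dtype" rows are appended by hand.

```python
import pandas as pd

# Hypothetical sample data with one string column and some missing values
df = pd.DataFrame({
    "region": ["north", None, "south", "east"],
    "N": [10, 20, 30, 40],
    "cash": [1.5, None, 3.5, 4.0],
})

# include="all" reports on string columns as well as numeric ones
described = df.describe(include="all")

# Extra per-column rows that describe() does not provide
extra = pd.DataFrame({
    "missing": df.isnull().sum(),
    "dtype": df.dtypes.astype(str),
}).T

# Stack the extra rows underneath the describe() output
summary = pd.concat([described, extra])

print(summary)
```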
