How can I build a summary of a DataFrame in pandas by stacking the results of individual operations?
For example, I used the following code:
df = pd.DataFrame(wb)
# Get list with headers
header1 = list(df)
count = df.count()
NaNs = df.isnull().sum()
sum = df.sum(0)
mean = df.mean()
median = df.median()
min = df.min()
max = df.max()
standardeviation = df.std()
nints = df.dtypes
But I can only print them as individual results. I get something like this for each calculation:
Unnamed: 0 60
region 50
IV_bins 60
N 60
meanbase 60
cash 60
dtype: int64
Finally, I tried creating a list, summarytable=[], at the beginning and then calling something like summarytable.append(count), and so on for all the calculations, but I couldn't get it right. What I am looking for is a table like this, which I believe also involves transposing the calculations:
          A     B
Count   100    98
NANs      5     7
Mean     10  12.5
Median   14     8
...
Nints    95    96
NStr      5     2
One last thing to take into account: I noticed that for some calculations, like sum(), it doesn't make sense to include strings, so when I print the results, the string columns don't print anything. This is the result for print(sum) (notice how region doesn't appear):
Unnamed: 0 1830
IV_bins [0,2.31e+06](2.31e+06,5.7e+06](5.7e+06,1.07e+0...
N 3680163
meanbase 3.46248
cash 9.00091e+09
It seems like you could get some use out of DataFrame.agg(), with which you can essentially build a customized .describe() output. Here's an example to get you started:
import pandas as pd
import numpy as np

df = pd.DataFrame({'object': ['a', 'b', 'c'],
                   'numeric': [1, 2, 3],
                   'numeric2': [1.1, 2.5, 50.],
                   'categorical': pd.Categorical(['d', 'e', 'f'])
                   })

def nullcounts(ser):
    # Number of missing values in a single column
    return ser.isnull().sum()

def custom_describe(frame, func=[nullcounts, 'sum', 'mean', 'median', 'max'],
                    numeric_only=True, **kwargs):
    # Optionally keep only the numeric columns before aggregating
    if numeric_only:
        frame = frame.select_dtypes(include=np.number)
    # One row per aggregation, one column per DataFrame column
    return frame.agg(func, **kwargs)

custom_describe(df)
            numeric   numeric2
nullcounts      0.0   0.000000
sum             6.0  53.600000
mean            2.0  17.866667
median          2.0   2.500000
max             3.0  50.000000
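To get closer to the table layout in the question, the individual results can also be stacked directly. The following is only a minimal sketch, not part of the original answer: the sample data and column names are made up for illustration, and the numeric statistics are computed on the numeric columns only, so text columns show NaN in those rows.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({
    "region": ["north", None, "south", "east"],
    "N": [10, 20, 30, 40],
    "cash": [1.5, np.nan, 3.5, 4.0],
})

# Statistics that only make sense for numbers use the numeric columns.
numeric = df.select_dtypes(include=np.number)

# Each value is a Series indexed by column name; the dict keys become
# the row labels once the frame is transposed.
stats = {
    "Count": df.count(),
    "NaNs": df.isnull().sum(),
    "Sum": numeric.sum(),
    "Mean": numeric.mean(),
    "Median": numeric.median(),
    "Min": numeric.min(),
    "Max": numeric.max(),
    "Std": numeric.std(),
    "Dtype": df.dtypes.astype(str),
}

# reindex restores the original column order before transposing.
summary = pd.DataFrame(stats).reindex(df.columns).T
print(summary)
```

Because the rows mix integers, floats, and strings, the resulting frame has object dtype; it is meant for display rather than further arithmetic.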
It seems there is a library that does exactly that. Check out pandas-summary. For each column, it gives you the count, min, max, std, mean, variance, count of all values, count of unique values, missing values, column type, and much more.
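For comparison, without installing anything extra, pandas' built-in describe() accepts include="all" to cover non-numeric columns as well. This is a small sketch with made-up data, not a replacement for the library mentioned above; statistics that do not apply to a column appear as NaN.

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({
    "region": ["north", "south", "south"],
    "N": [1, 2, 3],
})

# include="all" reports count/unique/top/freq for text columns and
# mean/std/min/quartiles/max for numeric ones in a single table.
overview = df.describe(include="all")
print(overview)
```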