
Pandas dataframe: how to apply describe() to each group and add to new columns?

df:

name score
A      1
A      2
A      3
A      4
A      5
B      2
B      4
B      6 
B      8

I want to get a new dataframe in the following form:

   name count mean std min 25% 50% 75% max
    A     5    3    .. ..  ..  ..  ..  ..
    B     4    5    .. ..  ..  ..  ..  ..

How can I extract the information from df.describe() and reformat it? Thanks

There is an even shorter one :)

print(df.groupby('name').describe().unstack(1))
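Since pandas 0.20, groupby().describe() already returns one row per group, so the unstack step only matters when the statistics are in long form. A minimal sketch of that reshaping, assuming the long form is produced by applying describe() to each group's scores (the names long_stats and wide_stats are illustrative, not from the answer):

```python
import pandas as pd

df = pd.DataFrame({
    "name": list("AAAAABBBB"),
    "score": [1, 2, 3, 4, 5, 2, 4, 6, 8],
})

# Applying describe() per group yields a Series indexed by (name, statistic).
long_stats = df.groupby("name")["score"].apply(lambda s: s.describe())

# unstack(1) moves the statistic level of the index into the columns,
# leaving one row per name.
wide_stats = long_stats.unstack(1)
print(wide_stats)
```
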

Define some data

In[1]:
import pandas as pd
import io

data = """
name score
A      1
A      2
A      3
A      4
A      5
B      2
B      4
B      6
B      8
    """

df = pd.read_csv(io.StringIO(data), delimiter=r'\s+')
print(df)

Out[1]:
  name  score
0    A      1
1    A      2
2    A      3
3    A      4
4    A      5
5    B      2
6    B      4
7    B      6
8    B      8

Solution

A nice approach to this problem uses a generator expression (see footnote) to let pd.DataFrame() iterate over the results of groupby and construct the summary stats dataframe on the fly:

In[2]:
df2 = pd.DataFrame(group.describe().rename(columns={'score':name}).squeeze()
                         for name, group in df.groupby('name'))

print(df2)

Out[2]:
   count  mean       std  min  25%  50%  75%  max
A      5     3  1.581139    1  2.0    3  4.0    5
B      4     5  2.581989    2  3.5    5  6.5    8

Here the squeeze function squeezes out a dimension, converting the one-column DataFrame of group summary stats into a Series.
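To see the effect of squeeze in isolation, here is a small sketch on a single group's statistics (the variable names stats and squeezed are illustrative):

```python
import pandas as pd

# A one-column frame of summary statistics, like group.describe() for one group.
stats = pd.DataFrame({"score": [1, 2, 3, 4, 5]}).describe()

# squeeze() drops the length-1 column axis, leaving a Series
# that is named after the column it came from.
squeezed = stats.squeeze()

print(type(stats).__name__, stats.shape)        # DataFrame (8, 1)
print(type(squeezed).__name__, squeezed.shape)  # Series (8,)
print(squeezed.name)                            # score
```

Because each Series carries a name, pd.DataFrame() can use those names as row labels, which is why the answer renames the column to the group name before squeezing.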

Footnote: A generator expression has the form my_function(a) for a in iterator, or, if iterator gives back two-element tuples, as in the case of groupby: my_function(a,b) for a,b in iterator
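Both forms from the footnote, sketched in plain Python (the data and variable names are made up for illustration):

```python
# Single-variable form: nothing is computed until the generator is consumed.
squares = (n * n for n in range(4))
squares_list = list(squares)
print(squares_list)  # [0, 1, 4, 9]

# Two-element tuples: unpack both names in the for clause,
# just as with (name, group) pairs from groupby.
pairs = [("A", 3), ("B", 5)]
labels = (f"{name}={value}" for name, value in pairs)
labels_list = list(labels)
print(labels_list)  # ['A=3', 'B=5']
```
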

Nothing beats a one-liner:

In [145]:

# Assumes pandas < 0.20, where groupby().describe() returns long-format output with a level_1 column.
print(df.groupby('name').describe().reset_index().pivot(index='name', values='score', columns='level_1'))

level_1  25%  50%  75%  count  max  mean  min       std
name                                                   
A        2.0    3  4.0      5    5     3    1  1.581139
B        3.5    5  6.5      4    8     5    2  2.581989
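The pivot call relies on the long-format frame that groupby().describe() returned before pandas 0.20, with one row per group and statistic. As a sketch of the same reshaping on a recent pandas, the long frame can be rebuilt with melt first; the column name stat and the variable names are assumptions introduced here for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "name": list("AAAAABBBB"),
    "score": [1, 2, 3, 4, 5, 2, 4, 6, 8],
})

# Recent pandas returns the wide table directly: one row per name.
wide = df.groupby("name")["score"].describe()

# Rebuild a long frame with one row per (name, statistic) pair.
long = wide.reset_index().melt(id_vars="name", var_name="stat", value_name="score")

# pivot() spreads the statistic labels back out into columns.
pivoted = long.pivot(index="name", columns="stat", values="score")
print(pivoted)
```
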

Use this code:

df.groupby('name').describe()
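On pandas 0.20 or later, this call returns one row per group with two-level columns. A rough sketch of flattening that result so that name becomes an ordinary column, as in the layout the question asks for (the variable names stats and flat are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "name": list("AAAAABBBB"),
    "score": [1, 2, 3, 4, 5, 2, 4, 6, 8],
})

# With one numeric column, the result has two-level columns such as
# ('score', 'count') and ('score', 'mean'), and 'name' as the index.
stats = df.groupby("name").describe()

# Drop the outer 'score' level and move 'name' from the index into a column.
flat = stats.droplevel(0, axis=1).reset_index()
print(flat)
```
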

The table is stored in a dataframe named df:

df = pd.read_csv(io.StringIO(data), delimiter=r'\s+')

Just specify the column name and describe gives you the required output. In this way you can calculate the statistics with respect to any column:

df.groupby('name')['score'].describe()

import pandas as pd
import io
import numpy as np

data = """
name score
A      1
A      2
A      3
A      4
A      5
B      2
B      4
B      6
B      8
    """

df = pd.read_csv(io.StringIO(data), delimiter=r'\s+')

# Assumes pandas < 0.20, where groupby().describe() returns long-format output (16 rows here).
df2 = df.groupby('name').describe().reset_index().T.drop('name')
arr = np.array(df2).reshape((4,8))

df2 = pd.DataFrame(arr[1:], index=['name','A','B'])

print(df2)

That will give you df2 as:

              0     1        2    3    4    5    6    7
    name  count  mean      std  min  25%  50%  75%  max
    A         5     3  1.58114    1    2    3    4    5
    B         4     5  2.58199    2  3.5    5  6.5    8

Well, I managed to get what you wanted, but it doesn't scale very well.

import pandas as pd

name = ['a','a','a','a','a','b','b','b','b']
score = [1,2,3,4,5,2,4,6,8]

# Assumes pandas < 0.20, where describe() on a groupby returns long-format output with a level_1 column.
d = pd.DataFrame(list(zip(name,score)), columns=['Name','Score'])
d = d.groupby('Name').describe()
d = d.reset_index()
df2 = pd.DataFrame(list(zip(d.level_1[8:], list(d.Score)[:8], list(d.Score)[8:])), columns = ['Name','A','B']).T

print(df2)

          0     1         2    3    4    5    6    7
Name  count  mean       std  min  25%  50%  75%  max
A         5     3  1.581139    1    2    3    4    5
B         4     5  2.581989    2  3.5    5  6.5    8


