
Adding Column Headers to new pandas dataframe

I am creating a new pandas dataframe from a previous dataframe using the .groupby and .size methods.

[in] results = df.groupby(["X", "Y", "Z", "F"]).size()

[out]
    9   27/02/2016  1   N   326
    9   27/02/2016  1   S   332
    9   27/02/2016  2   N   280
    9   27/02/2016  2   S   353
    9   27/02/2016  3   N   177

This behaves as expected; however, the result is a dataframe with no column headers.

This SO question states that the following adds column names to the generated dataframe:

[in] results.columns = ["X","Y","Z","F","Count"]

However, this does not seem to have any impact at all.

[out]
        9   27/02/2016  1   N   326
        9   27/02/2016  1   S   332
        9   27/02/2016  2   N   280
        9   27/02/2016  2   S   353
        9   27/02/2016  3   N   177

What you're seeing is your grouped columns used as the index of the result. If you call reset_index, they become ordinary columns again and their names show up as headers.

So:

results = df.groupby(["X", "Y", "Z", "F"]).size()
# reset_index returns a new object, so assign the result
results = results.reset_index()

should work:

In [11]:
df.groupby(["X","Y","Z","F"]).size()

Out[11]:
X  Y           Z  F
9  27/02/2016  1  N    1
                  S    1
               2  N    1
                  S    1
               3  N    1
dtype: int64

In [12]:    
df.groupby(["X","Y","Z","F"]).size().reset_index()

Out[12]:
   X           Y  Z  F  0
0  9  27/02/2016  1  N  1
1  9  27/02/2016  1  S  1
2  9  27/02/2016  2  N  1
3  9  27/02/2016  2  S  1
4  9  27/02/2016  3  N  1
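The count column above is labelled 0 because the Series returned by size() has no name. As a minimal sketch (the sample data here is made up, and only the column names come from the question), Series.reset_index accepts a name argument that labels the values column in the same step:

```python
import pandas as pd

# Made-up sample data; the column names follow the question.
df = pd.DataFrame({
    "X": [9, 9, 9, 9],
    "Y": ["27/02/2016"] * 4,
    "Z": [1, 1, 1, 2],
    "F": ["N", "N", "S", "N"],
})

sizes = df.groupby(["X", "Y", "Z", "F"]).size()
# size() returns a Series; the group keys live in its MultiIndex.
print(type(sizes).__name__)
print(list(sizes.index.names))

# name= labels the values column when the index levels become columns.
counts = sizes.reset_index(name="Count")
print(list(counts.columns))
print(counts["Count"].tolist())
```

This also suggests why assigning results.columns had no visible effect in the question: the result is a Series, which has no columns to rename.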

Additionally, you can achieve what you want by using count:

In [13]:
df.groupby(["X","Y","Z","F"]).count().reset_index()

Out[13]:
   X           Y  Z  F  Count
0  9  27/02/2016  1  N      1
1  9  27/02/2016  1  S      1
2  9  27/02/2016  2  N      1
3  9  27/02/2016  2  S      1
4  9  27/02/2016  3  N      1
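One caveat when swapping size for count: size counts the rows in each group, while count counts the non-null values in each remaining column, so the two can disagree when data is missing. A small sketch with hypothetical data (the key and value column names are invented for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical data: "value" has one missing entry in group "a".
df = pd.DataFrame({
    "key": ["a", "a", "b"],
    "value": [1.0, np.nan, 3.0],
})

row_counts = df.groupby("key").size()          # rows per group
non_null = df.groupby("key")["value"].count()  # non-null values per group

print(row_counts.to_dict())
print(non_null.to_dict())
```

If the goal is the number of rows per group regardless of missing values, size is the safer choice.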

You could also pass as_index=False to groupby:

In [15]:
df.groupby(["X","Y","Z","F"], as_index=False).count()

Out[15]:
   X           Y  Z  F  Count
0  9  27/02/2016  1  N      1
1  9  27/02/2016  1  S      1
2  9  27/02/2016  2  N      1
3  9  27/02/2016  2  S      1
4  9  27/02/2016  3  N      1

This is normally fine, but some aggregation functions will fail if you apply them to columns whose dtypes cannot be aggregated, for instance if you have str columns and you decide to call mean.
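One way to sidestep that problem is to select the columns to aggregate before calling the function. A minimal sketch, assuming a hypothetical frame with one string column (label) and one numeric column (value):

```python
import pandas as pd

# Hypothetical frame with one string column and one numeric column.
df = pd.DataFrame({
    "key": ["a", "a", "b"],
    "label": ["x", "y", "z"],
    "value": [1.0, 3.0, 5.0],
})

# Restrict the aggregation to columns where mean is defined.
means = df.groupby("key", as_index=False)[["value"]].mean()
print(means)
```

Because only value is selected, the string column never reaches mean, and as_index=False keeps key as a regular column in the result.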

You can use the as_index=False parameter of the .groupby() function:

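# With as_index=False, pandas 1.1 and later return a DataFrame whose count column is named "size"
results = df.groupby(["X", "Y", "Z", "F"], as_index=False).size().rename(columns={"size": "Count"})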
