Why am I getting an empty row in my dataframe after using pandas apply?

I'm fairly new to Python and pandas and am trying to figure out how to do a simple split-apply-combine. The problem is that I get a blank row at the top of every DataFrame returned by pandas' apply function, and I'm not sure why. Can anyone explain?

The following is a minimal example that demonstrates the problem, not my actual code:

import pandas as pd

sorbet = pd.DataFrame({
    'flavour': ['orange', 'orange', 'lemon', 'lemon'],
    'niceosity': [4, 5, 7, 8]})

# Summarise one column of a group: row count and mean
def calc_vals(df, target):
    return pd.Series({'total': df[target].count(), 'mean': df[target].mean()})

sorbet_grouped = sorbet.groupby('flavour')
sorbet_vals = sorbet_grouped.apply(calc_vals, target='niceosity')

If I then do print(sorbet_vals), I get this output:

         mean  total
flavour                 <--- Why are there spaces here?
lemon     7.5      2
orange    4.5      2

[2 rows x 2 columns]

Compare this with print(sorbet):

  flavour  niceosity     <--- Note how column names line up
0  orange          4
1  orange          5
2   lemon          7
3   lemon          8

[4 rows x 2 columns]

What is causing this discrepancy and how can I fix it?

The groupby/apply operation returns a new DataFrame with a named index. The name is that of the column by which the original DataFrame was grouped.

The name shows up above the index, on its own line below the column headers. If you reset it to None, that row disappears:

In [155]: sorbet_vals.index.name = None

In [156]: sorbet_vals
Out[156]: 
        mean  total
lemon    7.5      2
orange   4.5      2

[2 rows x 2 columns]
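
A non-mutating variant of the same idea, as a rough sketch: rename_axis(None) returns a copy with an unnamed index and leaves sorbet_vals itself alone. The [['niceosity']] column selection is my addition, not part of the question's code; it is there only so that apply sees just the value column on newer pandas versions.

```python
import pandas as pd

sorbet = pd.DataFrame({
    'flavour': ['orange', 'orange', 'lemon', 'lemon'],
    'niceosity': [4, 5, 7, 8]})

def calc_vals(df, target):
    return pd.Series({'total': df[target].count(), 'mean': df[target].mean()})

# Column selection added for this sketch (assumption, not in the question).
sorbet_vals = sorbet.groupby('flavour')[['niceosity']].apply(
    calc_vals, target='niceosity')

# The index carries the name of the grouping column.
print(sorbet_vals.index.name)

# rename_axis(None) returns a new frame whose index has no name;
# sorbet_vals keeps its named index.
unnamed = sorbet_vals.rename_axis(None)
print(unnamed.index.name)
print(unnamed)
```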

Note that the name is useful, so I don't really recommend removing it. The name allows you to refer to that index by name rather than merely by number.
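
To illustrate what referring to the index by name can look like, here is a small sketch. It rebuilds sorbet_vals from the question (with a [['niceosity']] column selection that I added for compatibility with newer pandas) and then passes the name 'flavour' where a level number would otherwise go.

```python
import pandas as pd

sorbet = pd.DataFrame({
    'flavour': ['orange', 'orange', 'lemon', 'lemon'],
    'niceosity': [4, 5, 7, 8]})

def calc_vals(df, target):
    return pd.Series({'total': df[target].count(), 'mean': df[target].mean()})

# Column selection added for this sketch (assumption, not in the question).
sorbet_vals = sorbet.groupby('flavour')[['niceosity']].apply(
    calc_vals, target='niceosity')

# Fetch the index values by name instead of by level number 0.
flavours = sorbet_vals.index.get_level_values('flavour')
print(list(flavours))

# Move that index into a regular column, again selecting it by name.
as_column = sorbet_vals.reset_index(level='flavour')
print(as_column)
```

With a single index level the benefit is modest; it matters more once you group by several columns and the result has a MultiIndex.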


If you wish the index to be a column, use reset_index:

In [209]: sorbet_vals.reset_index(inplace=True); sorbet_vals
Out[209]: 
  flavour  mean  total
0   lemon   7.5      2
1  orange   4.5      2

[2 rows x 3 columns]
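
If you would rather avoid inplace=True, reset_index can also be chained, since it returns a new DataFrame by default. The sketch below swaps the question's calc_vals helper for pandas' named aggregation (my substitution, not the asker's code), which should give the same two summary columns.

```python
import pandas as pd

sorbet = pd.DataFrame({
    'flavour': ['orange', 'orange', 'lemon', 'lemon'],
    'niceosity': [4, 5, 7, 8]})

# Named aggregation stands in for calc_vals here (a substitution for
# illustration); reset_index() then turns 'flavour' into a column.
flat = (sorbet.groupby('flavour')['niceosity']
              .agg(total='count', mean='mean')
              .reset_index())

print(flat)
```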
