
pandas groupby-apply behavior, returning a Series (inconsistent output type)

I'm curious about the behavior of pandas groupby-apply when the apply function returns a series.

When the series are of different lengths, it returns a multi-indexed series.

In [1]: import pandas as pd

In [2]: df1=pd.DataFrame({'state':list("AABBB"),
   ...:                 'city':list("vwxyz")})

In [3]: df1
Out[3]:
  city state
0    v     A
1    w     A
2    x     B
3    y     B
4    z     B

In [4]: def f(x):
   ...:         return pd.Series(x['city'].values,index=range(len(x)))
   ...:

In [5]: df1.groupby('state').apply(f)
Out[5]:
state
A      0    v
       1    w
B      0    x
       1    y
       2    z
dtype: object

This returns a Series object.

However, if every series has the same length, then it pivots this into a DataFrame.

In [6]: df2=pd.DataFrame({'state':list("AAABBB"),
   ...:                 'city':list("uvwxyz")})

In [7]: df2
Out[7]:
  city state
0    u     A
1    v     A
2    w     A
3    x     B
4    y     B
5    z     B

In [8]: df2.groupby('state').apply(f)
Out[8]:
       0  1  2
state
A      u  v  w
B      x  y  z

Is this really the intended behavior? Are we meant to check the return type if we use apply this way? Or is there an option in apply that I'm not appreciating?

In case you're curious, in my actual use case the returned Series will be the same length as the group. It seems like an ideal case for transform, except that I've found that apply returning a Series is actually an order of magnitude faster on a large dataset. That can be another topic.

Edit: Loosely based on Parfait's answer, we can certainly do this:

X=df.groupby('state').apply(f)
if not isinstance(X,pd.Series):
    X=X.stack()
X

That will give the same output type for either df=df1 or df=df2. I guess I'm just asking if this is really the normal or preferred way to handle this.
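A minimal sketch of that workaround wrapped in a reusable function. The name apply_as_series is hypothetical (it is not a pandas API), and the data frames are the ones from the question:

```python
import pandas as pd

def f(x):
    # Same function as in the question: positional index within each group.
    return pd.Series(x['city'].values, index=range(len(x)))

def apply_as_series(df, key, func):
    # Hypothetical helper: always hand back a Series from groupby-apply.
    out = df.groupby(key).apply(func)
    if isinstance(out, pd.DataFrame):
        # Equal-length groups were pivoted into columns; undo the pivot.
        out = out.stack()
    return out

df1 = pd.DataFrame({'state': list("AABBB"), 'city': list("vwxyz")})
df2 = pd.DataFrame({'state': list("AAABBB"), 'city': list("uvwxyz")})

s1 = apply_as_series(df1, 'state', f)
s2 = apply_as_series(df2, 'state', f)
print(type(s1).__name__, type(s2).__name__)
```

Checking for pd.DataFrame rather than "not a Series" keeps the intent explicit, but either test works for these two inputs.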

In essence, a data frame consists of equal-length series (technically a dictionary-like container of Series objects). As stated in the pandas split-apply-combine docs, a groupby() operation involves one or more of the following steps:

  • Splitting the data into groups based on some criteria
  • Applying a function to each group independently
  • Combining the results into a data structure

Notice this does not state that a data frame is always produced, only a generalized data structure. So a groupby() operation on a data frame can return a Series, and one on a Series can return a data frame.

In your first data frame, the groups have unequal sizes (state A has two rows, B has three), so the function returns series of different lengths. Series of different lengths cannot be aligned as rows of a data frame, so the "combine" step yields a multi-index series instead. You can see the group sizes by adding a print statement to the function. (In the output below the A group is printed twice; older pandas versions call the function an extra time on the first group to choose a code path.)

def f(x):
    print(x)
    return pd.Series(x['city'].values, index=range(len(x)))

s1 = df1.groupby('state').apply(f)

print(s1)
#   city state
# 0    v     A
# 1    w     A
#   city state
# 0    v     A
# 1    w     A
#   city state
# 2    x     B
# 3    y     B
# 4    z     B
# state   
# A      0    v
#        1    w
# B      0    x
#        1    y
#        2    z
# dtype: object

However, you can manipulate the multi-index series outcome by resetting index and thereby adjusting its hierarchical levels:

df = df1.groupby('state').apply(f).reset_index()
print(df)

#   state  level_1  0
# 0     A        0  v
# 1     A        1  w
# 2     B        0  x
# 3     B        1  y
# 4     B        2  z
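The default column names level_1 and 0 are not very descriptive. As a sketch, the index levels and the value column can be named before flattening; the label 'pos' is an assumed name chosen for illustration:

```python
import pandas as pd

def f(x):
    return pd.Series(x['city'].values, index=range(len(x)))

df1 = pd.DataFrame({'state': list("AABBB"), 'city': list("vwxyz")})

tidy = (
    df1.groupby('state')
       .apply(f)
       .rename_axis(['state', 'pos'])  # name both index levels ('pos' is assumed)
       .reset_index(name='city')       # name the value column instead of 0
)
print(tidy)
```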

But more relevant to your needs is unstack() which pivots a level of the index labels, yielding a data frame. Consider fillna() to fill the None outcome.

df = df1.groupby('state').apply(f).unstack()
print(df)

#        0  1     2
# state            
# A      v  w  None
# B      x  y     z
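A short sketch of the fillna() step, assuming an empty string is an acceptable placeholder for the missing cell (any other fill value would work the same way):

```python
import pandas as pd

def f(x):
    return pd.Series(x['city'].values, index=range(len(x)))

df1 = pd.DataFrame({'state': list("AABBB"), 'city': list("vwxyz")})

# Pivot the inner index level into columns, then fill the gap left by
# the shorter group A with an empty string.
wide = df1.groupby('state').apply(f).unstack().fillna('')
print(wide)
```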

Instead of using index=range(len(x)) in the function f, you can use index=x.index to prevent this undesired behavior.
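A minimal sketch of the index=x.index variant, using the two data frames from the question. The function name g is only for illustration. Because each group keeps its own original row labels, the per-group series no longer share an index, and in my understanding that is why both calls come back as a Series; the exact index layout of the result may differ across pandas versions.

```python
import pandas as pd

def g(x):
    # Reuse the group's original row labels instead of 0..len(x)-1.
    return pd.Series(x['city'].values, index=x.index)

df1 = pd.DataFrame({'state': list("AABBB"), 'city': list("vwxyz")})
df2 = pd.DataFrame({'state': list("AAABBB"), 'city': list("uvwxyz")})

r1 = df1.groupby('state').apply(g)  # unequal group sizes
r2 = df2.groupby('state').apply(g)  # equal group sizes
print(type(r1).__name__, type(r2).__name__)
```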


 