I have some code that first selects rows from a Pandas DataFrame based on certain criteria and then does a groupby-apply on the result. Occasionally, only one group matches the criteria. In that case, Pandas returns a row vector rather than a column vector. Example below:
In [50]: x = pd.DataFrame([(round(i/2, 0), i, i) for i in range(0, 10)], columns=['a', 'b', 'c'])
In [51]: x
Out[51]:
     a  b  c
0  0.0  0  0
1  0.0  1  1
2  1.0  2  2
3  2.0  3  3
4  2.0  4  4
5  2.0  5  5
6  3.0  6  6
7  4.0  7  7
8  4.0  8  8
9  4.0  9  9
In [52]: y = x.loc[x.a == 0.0].groupby('a').apply(lambda x: x.b / x.c)
In [53]: y
Out[53]:
       0    1
a
0.0  NaN  1.0
In the above example, y is a row vector of type pandas.DataFrame. If the .loc selection contains two or more groups, the result is a column vector (a pandas.Series) instead:
In [54]: y = x.loc[x.a <= 1.0].groupby('a').apply(lambda x: x.b / x.c)
In [55]: y
Out[55]:
a
0.0  0    NaN
     1    1.0
1.0  2    1.0
dtype: float64
Any idea how I can make the two behaviours consistent? Ultimately, the column vector is what I want.
Thanks
There's no way to do this in one step, unfortunately. You can, however, do it in two steps by checking the groupby object's ngroups attribute and reshaping the result accordingly.
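g = x.loc[...].groupby('a')
y = g.apply(lambda x: x.b / x.c)
# With a single group, apply returns one wide row; transpose it into a column.
if g.ngroups == 1:
    y = y.T  # note: y is now a one-column DataFrame, not a Series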
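If you need the single-group result to have exactly the same shape as the multi-group one (a Series indexed by group key and original row label), the ngroups check can be wrapped in a small helper. The following is only a sketch: the function name apply_as_column and the variable names are made up for illustration, and it assumes the single-group case comes back as a wide DataFrame, as in the question. It transposes and then calls DataFrame.unstack(), which returns a Series when the index is not a MultiIndex and does not drop NaN values.

```python
import pandas as pd


def apply_as_column(df, key, func):
    """Hypothetical helper: groupby-apply that returns a Series in both cases."""
    g = df.groupby(key)
    y = g.apply(func)
    # Assumption: with exactly one group, apply hands back a wide DataFrame
    # (one row for the group, one column per original row label).
    if g.ngroups == 1 and isinstance(y, pd.DataFrame):
        # Transpose so the group key is on the columns, then unstack to get
        # a Series indexed by (group key, original row label).
        y = y.T.unstack()
    return y


# Same data as in the question.
x = pd.DataFrame([(round(i / 2, 0), i, i) for i in range(10)],
                 columns=['a', 'b', 'c'])


def ratio(grp):
    return grp.b / grp.c


one_group = apply_as_column(x.loc[x.a == 0.0], 'a', ratio)
two_groups = apply_as_column(x.loc[x.a <= 1.0], 'a', ratio)

print(one_group)
print(two_groups)
```

The isinstance check is there so the helper leaves the result alone if a given Pandas version already returns a Series for the single-group case. An empty selection (zero groups) is not handled here.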