I have this dataframe:
>>> import sys
>>> import pandas as pd
>>> d = pd.DataFrame(
...     {
...         "a": [1, 1],
...         "b": [2, 2],
...         "c": [4, 5],
...         "d": [pd.Timedelta(hours=6), pd.Timedelta(hours=7)],
...         "e": [12.1, 13.3],
...     }
... )
>>> d = d.set_index(["a", "b", "c"])
>>> d
                     d     e
a b c
1 2 4  0 days 06:00:00  12.1
    5  0 days 07:00:00  13.3
>>> d.dtypes
d    timedelta64[ns]
e            float64
dtype: object
I want the sum of each column, and I will need one version with skipna=True and one version with skipna=False. I expect this,
>>> d.sum(level=["a","b"])
                   d     e
a b
1 2  0 days 13:00:00  25.4
but I get this.
>>> d.sum(level=["a","b"])
        e
a b
1 2  25.4
One column has been dropped.
More info:
>>> pd.__version__
'1.2.3'
>>> sys.version_info
sys.version_info(major=3, minor=8, micro=8, releaselevel='final', serial=0)
groupby / agg
d.groupby(level=['a', 'b']).agg({'d': 'sum', 'e': 'sum'})
                   d     e
a b
1 2  0 days 13:00:00  25.4
apply
d.apply(pd.Series.sum, level=['a', 'b'])
                   d     e
a b
1 2  0 days 13:00:00  25.4
Note that you can pass other parameters as well
d.apply(pd.Series.sum, level=['a', 'b'], skipna=True)
                   d     e
a b
1 2  0 days 13:00:00  25.4
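If the level argument is not available on your pandas version (an assumption about your setup, not something stated in the question), the same skipna idea can be sketched with groupby and a small lambda. The extra row with missing values (c=6) is hypothetical; it is there only so that the two variants give different results.

```python
import pandas as pd

# Same frame as in the question, plus one hypothetical row (c=6)
# holding missing values so that skipna actually matters.
d = pd.DataFrame(
    {
        "a": [1, 1, 1],
        "b": [2, 2, 2],
        "c": [4, 5, 6],
        "d": [pd.Timedelta(hours=6), pd.Timedelta(hours=7), pd.NaT],
        "e": [12.1, 13.3, float("nan")],
    }
).set_index(["a", "b", "c"])

grouped = d.groupby(level=["a", "b"])

# Each column of each group is passed to the lambda as a Series,
# so Series.sum receives the skipna flag directly.
total_skip = grouped.agg(lambda s: s.sum(skipna=True))
total_noskip = grouped.agg(lambda s: s.sum(skipna=False))

print(total_skip)
print(total_noskip)
```

With skipna=True the missing row is ignored; with skipna=False a single missing value makes the whole group's sum missing.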
groupby / numeric_only=False
Per @QuanhHoang
d.groupby(['a', 'b']).sum(numeric_only=False)
                   d     e
a b
1 2  0 days 13:00:00  25.4
Unfortunately, d.sum(level=['a', 'b'], numeric_only=False) still doesn't work.
Well, I think that is strange! What I think is happening is that pandas assumes the column isn't a numeric type and is therefore not worthy of 'sum'.
However, I checked:
np.issubdtype(d.dtypes.d, np.number)
True
Sooo /shrug IDK what is going on. I don't feel like looking too deep.
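For what it's worth, here is a minimal sketch of one possible explanation, not a confirmed diagnosis: NumPy and pandas disagree about whether timedelta64 counts as numeric. Whether d.sum(level=...) actually goes through this particular pandas check is an assumption on my part.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype

# dtype of a timedelta column, same kind as column 'd' in the question
td_dtype = pd.Series([pd.Timedelta(hours=6)]).dtype

# NumPy: timedelta64 sits under np.signedinteger in the scalar type
# hierarchy, so it passes the np.number check.
numpy_says_numeric = bool(np.issubdtype(td_dtype, np.number))

# pandas: its own helper does not treat timedelta64 as numeric.
pandas_says_numeric = bool(is_numeric_dtype(td_dtype))

print(numpy_says_numeric, pandas_says_numeric)
```

So the np.issubdtype result does not tell you how pandas itself classifies the column.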