I have a question regarding pandas dataframes:
I have a dataframe like the following,
df = pd.DataFrame([[1,1,10],[1,1,30],[1,2,40],[2,3,50],[2,3,150],[2,4,100]],columns=["a","b","c"])
a b c
0 1 1 10
1 1 1 30
2 1 2 40
3 2 3 50
4 2 3 150
5 2 4 100
And i want to produce the following output,
a "new col"
0 1 30
1 2 100
where the first line is calculated as the following:
I can imagine that this is somehow confusing, but I hope this is understandable, nevertheless.
I achieved the desired result, but as i need it for a huge dataframe, my solution is probably much to slow,
pd.DataFrame([ [a, adata.groupby("b").agg({"c": lambda x:x.mean()}).mean()[0]] for a,adata in df.groupby("a") ],columns=["a","new col"])
a new col
0 1 30.0
1 2 100.0
Therefore, what I would need is something like (?) df.groupby("a").groupby("b")["c"].mean()
Thank you very much in advance!
Here's one way
In [101]: (df.groupby(['a', 'b'], as_index=False)['c'].mean()
.groupby('a', as_index=False)['c'].mean()
.rename(columns={'c': 'new col'}))
Out[101]:
a new col
0 1 30
1 2 100
In [57]: df.groupby(['a','b'])['c'].mean().mean(level=0).reset_index()
Out[57]:
a c
0 1 30
1 2 100
df.groupby(['a','b']).mean().reset_index().groupby('a').mean()
Out[117]:
b c
a
1 1.5 30.0
2 3.5 100.0
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.