Getting a ratio in Pandas groupby object
I have a dataframe that looks like this:
I want to create another column called "engaged_percent" for each state, which is basically the number of unique engaged_count values divided by the user_count of each particular state.
I tried doing the following:
def f(x):
    engaged_percent = x['engaged_count'].nunique()/x['user_count']
    return pd.Series({'engaged_percent': engaged_percent})

by = df3.groupby(['user_state']).apply(f)
by
But it gave me the following result:
What I want is something like this:
user_state engaged_percent
---------------------------------
California 2/21 = 0.09
Florida 2/7 = 0.28
I think my approach is correct; however, I am not sure why my result shows up like the one seen in the second picture.
Any help would be much appreciated! Thanks in advance!
How about:
user_count=df3.groupby('user_state')['user_count'].mean()
#(or however you think a value for each state should be calculated)
engaged_unique=df3.groupby('user_state')['engaged_count'].nunique()
engaged_pct=engaged_unique/user_count
(You could also do this in one line in a bunch of different ways.)
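As a rough, self-contained sketch of the approach above: the sample data here is hypothetical (the original screenshot is not available), and it assumes user_count is constant within each state, so `first()` is used in place of `mean()`.

```python
import pandas as pd

# Hypothetical sample data, made up for illustration only.
df3 = pd.DataFrame({
    "user_state": ["California"] * 3 + ["Florida"] * 3,
    "engaged_count": [3, 3, 5, 4, 4, 6],
    "user_count": [21, 21, 21, 7, 7, 7],
})

grouped = df3.groupby("user_state")

# Unique engaged_count values per state divided by that state's user_count.
# Assumption: user_count is the same on every row of a state.
engaged_pct = (
    grouped["engaged_count"].nunique() / grouped["user_count"].first()
).rename("engaged_percent")

print(engaged_pct)
```

Both sides of the division are Series indexed by user_state, so pandas aligns them by state and returns one value per state.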
Your original solution was almost fine, except that you were dividing a value by the entire user_count series, so you were getting a Series instead of a value. You could try this slight variation:
def f(x):
    engaged_percent = x['engaged_count'].nunique()/x['user_count'].mean()
    return engaged_percent

by = df3.groupby(['user_state']).apply(f)
by
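To see the difference in isolation, here is a small sketch on a single hypothetical group (the values are made up): dividing a scalar by a Series broadcasts over every row, while dividing by the group's mean gives a single number.

```python
import pandas as pd

# One hypothetical group, standing in for what f(x) receives per state.
group = pd.DataFrame({"engaged_count": [3, 3, 5], "user_count": [21, 21, 21]})

n_unique = group["engaged_count"].nunique()         # a plain integer
per_row = n_unique / group["user_count"]            # Series: one value per row
per_group = n_unique / group["user_count"].mean()   # single float for the group

print(type(per_row).__name__, len(per_row))
print(per_group)
```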
I would just use groupby and apply directly
# .values assigns by position, so this assumes df3 is already sorted by user_state
df3['engaged_percent'] = (df3.groupby('user_state')
    .apply(lambda s: s.engaged_count.nunique()/s.user_count).values)
Demo
>>> df3
engaged_count user_count user_state
0 3 21 California
1 3 21 California
2 3 21 California
...
19 4 7 Florida
20 4 7 Florida
21 4 7 Florida
>>> df3['engaged_percent'] = df3.groupby('user_state').apply(lambda s: s.engaged_count.nunique()/s.user_count).values
>>> df3
engaged_count user_count user_state engaged_percent
0 3 21 California 0.095238
1 3 21 California 0.095238
2 3 21 California 0.095238
...
19 4 7 Florida 0.285714
20 4 7 Florida 0.285714
21 4 7 Florida 0.285714
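If the rows are not sorted by state, a positional assignment through `.values` can misalign. One possible alternative, sketched here with hypothetical interleaved data, is `transform`, which returns a result aligned to the original index:

```python
import pandas as pd

# Hypothetical data with the states deliberately interleaved.
df3 = pd.DataFrame({
    "engaged_count": [4, 3, 6, 5, 4, 3],
    "user_count": [7, 21, 7, 21, 7, 21],
    "user_state": ["Florida", "California"] * 3,
})

# transform broadcasts each group's nunique back onto that group's rows,
# keeping the original row order and index.
unique_per_row = df3.groupby("user_state")["engaged_count"].transform("nunique")
df3["engaged_percent"] = unique_per_row / df3["user_count"]

print(df3)
```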
titanic.groupby('Sex')['Fare'].mean()
You can try this example; just substitute your own DataFrame and column names.