Getting a ratio in a Pandas groupby object

I have a dataframe that looks like this:

[image not preserved]

I want to create another column called "engaged_percent" for each state, which is the number of unique engaged_count values divided by the user_count of that particular state.

I tried doing the following:

def f(x):
    engaged_percent = x['engaged_count'].nunique()/x['user_count']
    return pd.Series({'engaged_percent': engaged_percent})

by = df3.groupby(['user_state']).apply(f)
by

But it gave me the following result:

[image not preserved]

What I want is something like this:

user_state        engaged_percent
---------------------------------
California           2/21 = 0.09
Florida              2/7 =  0.28

I think my approach is correct; however, I am not sure why my result looks like the one in the second picture.

Any help would be much appreciated! Thanks in advance!

How about:

user_count=df3.groupby('user_state')['user_count'].mean()
#(or however you think a value for each state should be calculated)

engaged_unique=df3.groupby('user_state')['engaged_count'].nunique()

engaged_pct=engaged_unique/user_count

(You could also do this in one line in a bunch of different ways.)
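As one illustration of the more compact route, here is a minimal sketch using named aggregation. The sample data is hypothetical (the original dataframe was only shown as an image), and taking the first user_count per state assumes that value is constant within each state.

```python
import pandas as pd

# Hypothetical sample data standing in for df3.
df3 = pd.DataFrame({
    "user_state": ["California"] * 4 + ["Florida"] * 3,
    "user_count": [21] * 4 + [7] * 3,
    "engaged_count": [3, 3, 5, 5, 4, 4, 6],
})

# One groupby pass: count unique engaged_count values and pick a
# representative user_count for each state (assumed constant per state).
per_state = df3.groupby("user_state").agg(
    engaged_unique=("engaged_count", "nunique"),
    user_count=("user_count", "first"),
)
per_state["engaged_percent"] = per_state["engaged_unique"] / per_state["user_count"]
print(per_state)
```

The result has one row per state, which matches the shape of the desired output in the question.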

Your original solution was almost fine, except that you were dividing a single value by the entire user_count Series of each group, so you were getting a Series back instead of a single value. You could try this slight variation:

def f(x):
    engaged_percent = x['engaged_count'].nunique()/x['user_count'].mean()
    return engaged_percent

by = df3.groupby(['user_state']).apply(f)
by
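To see the difference in isolation, the following sketch applies both divisions to a single hypothetical group (the data is made up for illustration). Dividing an integer by a Series broadcasts over every row, while dividing by the mean of that Series gives one number.

```python
import pandas as pd

# Hypothetical rows for one state, standing in for the x passed to f(x).
group = pd.DataFrame({
    "user_count": [7, 7, 7],
    "engaged_count": [4, 4, 6],
})

n_unique = group["engaged_count"].nunique()         # plain integer: 2

per_row = n_unique / group["user_count"]            # Series, one value per row
per_group = n_unique / group["user_count"].mean()   # single float

print(type(per_row).__name__, len(per_row))
print(per_group)
```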

I would just use groupby and apply directly:

df3['engaged_percent'] = (df3.groupby('user_state')
                             .apply(lambda s: s.engaged_count.nunique()/s.user_count).values)

Demo

>>> df3
    engaged_count  user_count  user_state
0               3          21  California
1               3          21  California
2               3          21  California
...
19              4           7     Florida
20              4           7     Florida
21              4           7     Florida

>>> df3['engaged_percent'] = df3.groupby('user_state').apply(lambda s: s.engaged_count.nunique()/s.user_count).values

>>> df3
    engaged_count  user_count  user_state  engaged_percent
0               3          21  California         0.095238
1               3          21  California         0.095238
2               3          21  California         0.095238
...
19              4           7     Florida         0.285714
20              4           7     Florida         0.285714
21              4           7     Florida         0.285714
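Assigning `.values` back to the dataframe is positional, so it can depend on the rows already being ordered by state, as they are in the demo. As a hedged alternative, the sketch below uses `transform`, which returns a result aligned to the original index. The sample data is hypothetical and deliberately interleaves the states.

```python
import pandas as pd

# Hypothetical data with states interleaved rather than grouped together.
df3 = pd.DataFrame({
    "user_state": ["Florida", "California", "Florida", "California", "California"],
    "user_count": [7, 21, 7, 21, 21],
    "engaged_count": [4, 3, 6, 5, 3],
})

# transform broadcasts each group's unique count back onto its own rows,
# keeping the original index, so no positional assignment is needed.
unique_per_row = df3.groupby("user_state")["engaged_count"].transform("nunique")
df3["engaged_percent"] = unique_per_row / df3["user_count"]
print(df3)
```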

titanic.groupby('Sex')['Fare'].mean()

You can try this example; just substitute your own dataframe and column names.
