Getting a ratio in a Pandas groupby object

Question

I have a dataframe that looks like this:

I want to create another column called "engaged_percent" for each state, which is basically the number of unique engaged_count values divided by the user_count of that particular state.

I tried doing the following:

def f(x):
    engaged_percent = x['engaged_count'].nunique()/x['user_count']
    return pd.Series({'engaged_percent': engaged_percent})

by = df3.groupby(['user_state']).apply(f)
by

But it gave me the following result:

What I want is something like this:

user_state        engaged_percent
---------------------------------
California           2/21 = 0.09
Florida              2/7 =  0.28

I think my approach is correct, however I am not sure why my result shows up like the one seen in the second picture.

Any help would be much appreciated! Thanks in advance!

Answer 1

How about:

user_count=df3.groupby('user_state')['user_count'].mean()
#(or however you think a value for each state should be calculated)

engaged_unique=df3.groupby('user_state')['engaged_count'].nunique()

engaged_pct=engaged_unique/user_count

(you could also do this in one line in a bunch of different ways)
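As one possible single-pass version of the steps above, here is a minimal sketch using named aggregation. The sample values in df3 are made up for illustration (the original frame is not shown in the post), and taking the mean of user_count is the same assumption as in the snippet above.

```python
import pandas as pd

# Hypothetical sample data shaped like the question's frame (values are made up).
df3 = pd.DataFrame({
    "user_state": ["California"] * 3 + ["Florida"] * 3,
    "user_count": [21, 21, 21, 7, 7, 7],
    "engaged_count": [3, 5, 3, 4, 4, 6],
})

# One aggregation pass: count unique engaged_count values and take the
# mean user_count for each state.
per_state = df3.groupby("user_state").agg(
    engaged_unique=("engaged_count", "nunique"),
    user_count=("user_count", "mean"),
)

# Divide the two aggregated columns to get one ratio per state.
engaged_pct = (per_state["engaged_unique"] / per_state["user_count"]).rename("engaged_percent")
print(engaged_pct)
```

The result is a Series indexed by user_state, so it can be turned into a two-column frame with engaged_pct.reset_index() if needed.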

Your original solution was almost fine, except that you were dividing a single value by the entire user_count Series of each group, so you were getting a Series back instead of a single value. You could try this slight variation:

def f(x):
    engaged_percent = x['engaged_count'].nunique()/x['user_count'].mean()
    return engaged_percent

by = df3.groupby(['user_state']).apply(f)
by
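To see the difference on a single group, here is a small sketch. The data is again made up for illustration; per_row and per_group are just names chosen for this example.

```python
import pandas as pd

# Hypothetical sample data (values are made up).
df3 = pd.DataFrame({
    "user_state": ["California"] * 3 + ["Florida"] * 3,
    "user_count": [21, 21, 21, 7, 7, 7],
    "engaged_count": [3, 5, 3, 4, 4, 6],
})

# Take one group by hand, as groupby would pass it to f.
group = df3[df3["user_state"] == "California"]

# Scalar divided by a Series: the result is a Series with one value per row.
per_row = group["engaged_count"].nunique() / group["user_count"]

# Scalar divided by a scalar: the result is a single number for the group.
per_group = group["engaged_count"].nunique() / group["user_count"].mean()

print(per_row)
print(per_group)
```

Reducing user_count with .mean() (or .iloc[0], if the value is known to be constant within a state) is what collapses the result to one number per group.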

Answer 2

I would just use groupby and apply directly

df3['engaged_percent'] = (
    df3.groupby('user_state')
       .apply(lambda s: s.engaged_count.nunique() / s.user_count)
       .values  # .values drops the index, so this assumes rows are already sorted by user_state
)

Demo

>>> df3
    engaged_count  user_count  user_state
0               3          21  California
1               3          21  California
2               3          21  California
...
19              4           7     Florida
20              4           7     Florida
21              4           7     Florida

>>> df3['engaged_percent'] = df3.groupby('user_state').apply(lambda s: s.engaged_count.nunique()/s.user_count).values

>>> df3
    engaged_count  user_count  user_state  engaged_percent
0               3          21  California         0.095238
1               3          21  California         0.095238
2               3          21  California         0.095238
...
19              4           7     Florida         0.285714
20              4           7     Florida         0.285714
21              4           7     Florida         0.285714
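If the rows are not already ordered by state, assigning through .values could misalign the results. A possible alternative, sketched here with made-up data, is transform, which returns a result aligned to the original index:

```python
import pandas as pd

# Hypothetical sample data with the states interleaved (values are made up).
df3 = pd.DataFrame({
    "user_state": ["Florida", "California", "Florida", "California", "California"],
    "user_count": [7, 21, 7, 21, 21],
    "engaged_count": [4, 3, 6, 5, 3],
})

# transform("nunique") broadcasts each group's unique count back to every
# row of that group, keeping the original index, so the division lines up
# row by row regardless of row order.
unique_per_row = df3.groupby("user_state")["engaged_count"].transform("nunique")
df3["engaged_percent"] = unique_per_row / df3["user_count"]
print(df3)
```

This divides by each row's own user_count, which matches the apply version above when user_count is constant within a state.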

Answer 3

titanic.groupby('Sex')['Fare'].mean()

You can try this example; just substitute your own DataFrame and column names.

3 answers

Answer 1: score 3, accepted, 2017-02-19 02:31:20
Answer 2: score 1, 2017-02-19 02:01:53
Answer 3: score 0, 2022-12-29 04:25:00
