Append a count of rows meeting a condition within each group to a Pandas dataframe

Question

I know how to append a column counting the number of elements in a group, but I need to count only the rows within each group that meet a certain condition.

For example, if I have the following data:

import numpy as np
import pandas as pd

columns=['group1', 'value1']

data = np.array([np.arange(5)]*2).T
mydf = pd.DataFrame(data, columns=columns)

mydf.group1 = [0,0,1,1,2]
mydf.value1 = ['P','F',100,10,0]

valueslist={'50','51','52','53','54','55','56','57','58','59','60','61','62','63','64','65','66','67','68','69','70','71','72','73','74','75','76','77','78','79','80','81','82','83','84','85','86','87','88','89','90','91','92','93','94','95','96','97','98','99','100','A','B','C','D','P','S'}

and my dataframe therefore looks like this:

mydf

   group1 value1
0       0      P
1       0      F
2       1    100
3       1     10
4       2      0

I would then want to count the number of rows within each group1 value where value1 is in valueslist.

My desired output is:

   group1 value1  count
0       0      P      1
1       0      F      1
2       1    100      1
3       1     10      1
4       2      0      0

Answer 1

After changing the type of the value1 column to match your valueslist (or the other way around), you can use isin to get a True/False column, and convert that to 1s and 0s with astype(int). Then we can apply an ordinary groupby transform:

In [13]: mydf["value1"] = mydf["value1"].astype(str)

In [14]: mydf["count"] = (mydf["value1"].isin(valueslist).astype(int)
                          .groupby(mydf["group1"]).transform("sum"))

In [15]: mydf
Out[15]: 
   group1 value1  count
0       0      P      1
1       0      F      1
2       1    100      1
3       1     10      1
4       2      0      0
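
The same idea can be wrapped in a small helper that leaves the original frame untouched. This is only a sketch: the function name add_conditional_count and its parameters are made up for illustration, and it assumes that comparing values as strings is acceptable for your data.

```python
import pandas as pd


def add_conditional_count(df, group_col, value_col, allowed, out_col="count"):
    """Return a copy of df with a per-group count of rows whose value is in `allowed`."""
    # Compare as strings so that the integer 100 matches '100' in the allowed set.
    matches = df[value_col].astype(str).isin(allowed)
    result = df.copy()
    # Summing a boolean mask per group counts its True rows; transform
    # broadcasts that group total back onto every row of the group.
    result[out_col] = matches.groupby(df[group_col]).transform("sum").astype(int)
    return result


mydf = pd.DataFrame({"group1": [0, 0, 1, 1, 2], "value1": ["P", "F", 100, 10, 0]})
valueslist = {str(n) for n in range(50, 101)} | {"A", "B", "C", "D", "P", "S"}

counted = add_conditional_count(mydf, "group1", "value1", valueslist)
print(counted["count"].tolist())  # [1, 1, 1, 1, 0]
```

Because the cast happens on a temporary Series, the value1 column in the returned frame keeps its original mixed types.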

Answer 2

You can group by group1 and then use transform to sum, within each group, whether each value is in the list.

mydf['count'] = mydf.groupby('group1')['value1'].transform(lambda x: x.astype(str).isin(valueslist).sum())

   group1 value1  count
0       0      P      1
1       0      F      1
2       1    100      1
3       1     10      1
4       2      0      0
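
The astype(str) cast matters because value1 mixes strings and integers while valueslist holds only strings. A small illustration, using a shortened stand-in for valueslist (the names values and allowed are just for this example):

```python
import pandas as pd

values = pd.Series(["P", "F", 100, 10, 0])
allowed = {"100", "P"}  # shortened stand-in for valueslist

# Without the cast, the integer 100 is not equal to the string '100'.
raw_matches = values.isin(allowed)
# After the cast, every element is compared as a string.
str_matches = values.astype(str).isin(allowed)

print(raw_matches.tolist())  # [True, False, False, False, False]
print(str_matches.tolist())  # [True, False, True, False, False]
```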

Answer 3

mydf['value1'] = mydf['value1'].astype(str)
mydf['count'] = mydf['group1'].map(mydf.groupby('group1')['value1'].apply(lambda x: x.isin(valueslist).sum()))
mydf
Out[412]: 
   group1 value1  count
0       0      P      1
1       0      F      1
2       1    100      1
3       1     10      1
4       2      0      0

Data input:

valueslist=['50','51','52','53','54','55','56','57','58','59','60','61','62','63','64','65','66','67','68','69','70','71','72','73','74','75','76','77','78','79','80','81','82','83','84','85','86','87','88','89','90','91','92','93','94','95','96','97','98','99','100','A','B','C','D','P','S']
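
To see what map is doing in this answer, it can help to build the per-group totals as a separate Series first. The sketch below assumes value1 already holds strings and uses a shortened stand-in for valueslist; per_group is a name introduced only for this example.

```python
import pandas as pd

mydf = pd.DataFrame({"group1": [0, 0, 1, 1, 2],
                     "value1": ["P", "F", "100", "10", "0"]})
valueslist = ["100", "P"]  # shortened stand-in for the full list

# One total per group, indexed by the group1 key.
per_group = mydf["value1"].isin(valueslist).groupby(mydf["group1"]).sum()

# map looks up each row's group1 key in that Series.
mydf["count"] = mydf["group1"].map(per_group)

print(per_group.tolist())      # [1, 1, 0]
print(mydf["count"].tolist())  # [1, 1, 1, 1, 0]
```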

Answer 4

Here is one way to do it, albeit a one-liner:

mydf.merge(mydf.groupby('group1').apply(lambda x: len(set(x['value1'].values).intersection(valueslist))).reset_index().rename(columns={0: 'count'}), how='inner', on='group1')


   group1 value1  count
0       0      P      1
1       0      F      1
2       1    100      1
3       1     10      1
4       2      0      0
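
One caveat with the set-intersection approach: it counts distinct matching values per group rather than matching rows, and it only matches if value1 already holds strings. The two counts agree on the sample data, but they diverge once a value repeats within a group. A small sketch on made-up data (df and allowed are illustrative, not from the question):

```python
import pandas as pd

df = pd.DataFrame({"group1": [0, 0, 0, 1], "value1": ["P", "P", "F", "A"]})
allowed = {"P", "A"}

# Row count: every matching row contributes, so both 'P' rows are counted.
row_count = df["value1"].isin(allowed).groupby(df["group1"]).sum()

# Distinct count: the set intersection collapses the duplicate 'P'.
distinct_count = df.groupby("group1")["value1"].apply(lambda s: len(set(s) & allowed))

print(row_count.tolist())       # [2, 1]
print(distinct_count.tolist())  # [1, 1]
```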

4 answers

Answer 1: score 2, accepted, 2017-10-09 15:43:11
Answer 2: score 1, 2017-10-09 15:43:51
Answer 3: score 1, 2017-10-09 15:44:13
Answer 4: score 0, 2017-10-09 15:59:53
