计算列中的唯一值-Pandas Python

Question

Here is my dataset: 这是我的数据集：

Unique_ID   No_of_Filings   Req_1   Req_2   Req_3   Req_4
 RCONF045   3               Blue    Red     White   Violet
 RCONF046   3               Blue    Red     White   Brown
 RCONF047   3               Blue    Red     White   Brown
 RCONF048   3               Black   Yellow  Green   N/A
 RCONF051   4               Black   Yellow  Green   N/A
 RCONF052   4               Black   Brown   Green   Orange

I've extracted the unique values from the last 4 columns (Req_1 through Req_4) by the following: 我通过以下方式从最后4列（从Req_1到Req_4）中提取了唯一值：

pd.unique(df1[["Req_1","Req_2","Req_3","Req_4"]].values.ravel("K"))

Out[20]:  array(['Blue', 'Black', 'Red', 'Yellow', 'Brown', 'White', 'Green',
       'Violet', nan, 'Orange'], dtype=object)

Here's what I need for the output. 这是我需要的输出。 Frequency = how many times it shows up in the last four columns (eg Yellow only shows up twice) and Number of Filings = sum(No_of_Filings if the requirement is in that row ). 频率=它在最后四列中显示的次数（例如，黄色仅显示两次），并且申请数=总和（ 如果要求在该行中，则为No_of_Filings）。 For example, Blue is in the first three rows, so that's 3 + 3 + 3 = 9 and Brown is in the second, third, and sixth row, so it's 3 + 3 + 4 = 10 例如，蓝色在前三行中，所以是3 + 3 + 3 = 9，而布朗在第二，第三和第六行中，所以它是3 + 3 + 4 = 10

Requirements    Frequency   Number of Filings
   Blue            3              9
   Black           3              11
   Red             3              9
   Brown           3              10
   White           3              9
   Green           3              11
   Yellow          2              7
   N/A             2              7
   Violet          1              3
   Orange          1              4

How can I create those two columns in my newly-created dataframe above using pandas? 如何使用熊猫在上面新创建的数据框中创建这两列？

Thanks 谢谢

Answer 1

You could do something along these lines using agg , but it takes some re-shaping beforehand. 您可以使用agg在这些方面做一些事情，但是需要事先进行一些重新设置。 Here is a way to get it: 这是一种获取方法：

agg_df = (df.fillna('N/A').set_index(['Unique_ID', 'No_of_Filings'])
          .stack()
          .reset_index('No_of_Filings')
          .groupby(0)
          .agg(['sum', 'size'])
          .reset_index())

agg_df.columns = ['Requirements', 'Number of Filings', 'Frequency']

>>> agg_df
  Requirements  Number of Filings  Frequency
0        Black                 11          3
1         Blue                  9          3
2        Brown                 10          3
3        Green                 11          3
4          N/A                  7          2
5       Orange                  4          1
6          Red                  9          3
7       Violet                  3          1
8        White                  9          3
9       Yellow                  7          2

Answer 2

from collections import defaultdict

d = defaultdict(int)
for i, f, *r in df.values:
    for v in r:
        d[(v, 'filings')] += f
        d[(v, 'frequency')] += 1

pd.Series(d).unstack().rename_axis('reqs').reset_index()

     reqs  filings  frequency
0     NaN        7          2
1   Black       11          3
2    Blue        9          3
3   Brown       10          3
4   Green       11          3
5  Orange        4          1
6     Red        9          3
7  Violet        3          1
8   White        9          3
9  Yellow        7          2

Answer 3

Notice that you can use np.unique to get the count of the uniques right away (which already completes one of your objectives). 请注意，您可以使用np.unique立即获取唯一性计数（已经完成了您的目标之一）。

df= df.fillna('NA')
cols = ["Req_1","Req_2","Req_3","Req_4"]

u = pd.unique(df[cols].values.ravel("K"))
s = np.unique(df[cols].values.ravel("K"), return_counts=True)

df2 = pd.DataFrame({'colors': u}).fillna('N/A')
df2['freq'] = df2.colors.map(dict(zip(*s)))
df2['n'] = [df[(df[cols] == v).sum(1) >=1].No_of_Filings.sum() for v in df2.colors]

    colors  freq    n
0   Blue    3   9
1   Black   3   11
2   Red     3   9
3   Yellow  2   7
4   Brown   3   10
5   White   3   9
6   Green   3   11
7   Violet  1   3
8   N/A     2   0
9   Orange  1   4

Answer 4

IIUC 联合会

df.drop('Unique_ID',1).melt('No_of_Filings').\
   fillna('NaN').groupby('value')['No_of_Filings'].agg(['count','sum'])
Out[56]: 
        count  sum
value             
Black       3   11
Blue        3    9
Brown       3   10
Green       3   11
NaN         2    7
Orange      1    4
Red         3    9
Violet      1    3
White       3    9
Yellow      2    7

计算列中的唯一值-Pandas Python

问题描述

4 个解决方案

解决方案1
2 已采纳 2018-08-01 23:29:24

解决方案2
2 2018-08-01 23:30:44

解决方案3
2 2018-08-01 23:31:17

解决方案4
2 2018-08-02 00:02:25

计算列中的唯一值-Pandas Python

问题描述

4 个解决方案

解决方案1 2 已采纳 2018-08-01 23:29:24

解决方案2 2 2018-08-01 23:30:44

解决方案3 2 2018-08-01 23:31:17

解决方案4 2 2018-08-02 00:02:25

解决方案1
2 已采纳 2018-08-01 23:29:24

解决方案2
2 2018-08-01 23:30:44

解决方案3
2 2018-08-01 23:31:17

解决方案4
2 2018-08-02 00:02:25