Count occurrences of unique values across multiple columns in a pandas DataFrame

Question

I have the following DataFrame in pandas:

import pandas as pd
df = pd.DataFrame({'a' : ['hello', 'world', 'great', 'hello'], 'b' : ['world', None, 'hello', 'world'], 'c' : [None, 'hello', 'great', None]})

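I want to count the occurrences of each unique value of column 'a' across column 'a' and all the other columns, and save the counts to new columns of the DataFrame, named after the values in column 'a', e.g. 'hello_count', 'world_count', and so on. The final result would look like this: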

df = pd.DataFrame({'a' : ['hello', 'world', 'great', 'hello'], 'b' : ['world', None, 'hello', 'world'], 'c' : [None, 'hello', 'great', None], 'hello_count' : [1,1,1,1], 'world_count' : [1,1,0,1], 'great_count' : [0,0,2,0]})

I tried

df['a', 'b', 'a'].groupby('a').agg(['count'])

but that didn't work. Any help is much appreciated.

Answer 1

Let's use pd.get_dummies and groupby (df1 below is the DataFrame from the question):

(df1.assign(**pd.get_dummies(df1)
                .pipe(lambda x: x.groupby(x.columns.str[2:], axis=1)
                .sum())))

Output:

       a      b      c  great  hello  world
0  hello  world   None      0      1      1
1  world   None  hello      0      1      1
2  great  hello  great      2      1      0
3  hello  world   None      0      1      1

Here is the solution above broken down into steps.

Step 1: pd.get_dummies

df_gd = pd.get_dummies(df1)
print(df_gd)

   a_great  a_hello  a_world  b_hello  b_world  c_great  c_hello
0        0        1        0        0        1        0        0
1        0        0        1        0        0        0        1
2        1        0        0        1        0        1        0
3        0        1        0        0        1        0        0

Step 2: group by column name, ignoring the first two characters (the 'a_', 'b_', 'c_' prefixes)

df_gb = df_gd.groupby(df_gd.columns.str[2:], axis=1).sum()
print(df_gb)

   great  hello  world
0      0      1      1
1      0      1      1
2      2      1      0
3      0      1      1

Step 3: join back to the original DataFrame

df_out = df1.join(df_gb)
print(df_out)

Output:

       a      b      c  great  hello  world
0  hello  world   None      0      1      1
1  world   None  hello      0      1      1
2  great  hello  great      2      1      0
3  hello  world   None      0      1      1
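
On recent pandas releases the code above may need adjusting: groupby(..., axis=1) is deprecated since pandas 2.1, and pd.get_dummies returns boolean columns by default since pandas 2.0. Below is a minimal sketch of the same idea that avoids both. It assumes the question's DataFrame and that the source column names contain no underscore. The names dummies, counts and result are illustrative and not part of the answer.

```python
import pandas as pd

df = pd.DataFrame({
    'a': ['hello', 'world', 'great', 'hello'],
    'b': ['world', None, 'hello', 'world'],
    'c': [None, 'hello', 'great', None],
})

# One indicator column per (source column, value) pair, e.g. 'a_hello'.
# dtype=int keeps 0/1 integers instead of booleans.
dummies = pd.get_dummies(df, dtype=int)

# Transpose so the indicator names become the index, group them by the
# value part of the name (the text after the first underscore), sum,
# and transpose back to one row per original row.
counts = dummies.T.groupby(lambda name: name.split('_', 1)[1]).sum().T

# Attach the counts with the '<value>_count' naming asked for in the question.
result = df.join(counts.add_suffix('_count'))
print(result)
```

Splitting on the first underscore rather than slicing off two characters is a design choice that also tolerates source column names longer than one letter.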

Answer 2

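Using df.apply inside a loop keeps this simple: for each unique value in column 'a', test how many elements in each row are equal to that string: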

for ss in df.a.unique():
    df[ss+"_count"] = df.apply(lambda row: sum(map(lambda x: x==ss, row)), axis=1)

print(df)

Output:

       a      b      c  hello_count  world_count  great_count
0  hello  world   None            1            1            0
1  world   None  hello            1            1            0
2  great  hello  great            1            0            2
3  hello  world   None            1            1            0
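
The row-wise apply above can be slow on large frames. A possible vectorized variant of the same loop is sketched here, assuming the question's DataFrame. The list source_cols is a hypothetical helper that fixes the columns to scan before any count columns are added.

```python
import pandas as pd

df = pd.DataFrame({
    'a': ['hello', 'world', 'great', 'hello'],
    'b': ['world', None, 'hello', 'world'],
    'c': [None, 'hello', 'great', None],
})

# Remember the original columns so the new *_count columns are never scanned.
source_cols = list(df.columns)

for value in df['a'].unique():
    # eq() compares every cell with the value (None compares as False),
    # and sum(axis=1) counts the matches in each row.
    df[f'{value}_count'] = df[source_cols].eq(value).sum(axis=1)

print(df)
```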

Answer 3

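You can create a dictionary d_unique = {} and store the number of unique values of each column in it as key-value pairs. Consider a DataFrame named data_rnr: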

d_unique={}
for col in data_rnr.columns:
    print(data_rnr[col].name)
    print(len(data_rnr[col].unique()))
    d_unique[data_rnr[col].name]=len(data_rnr[col].unique())
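
The same dictionary can probably be built without an explicit loop. This is a sketch using the question's DataFrame in place of data_rnr, which is not shown in the answer. Passing dropna=False makes None count as a value, as len(col.unique()) does.

```python
import pandas as pd

# Stand-in for data_rnr, which the answer does not define.
data_rnr = pd.DataFrame({
    'a': ['hello', 'world', 'great', 'hello'],
    'b': ['world', None, 'hello', 'world'],
    'c': [None, 'hello', 'great', None],
})

# Number of distinct values per column, missing values included.
d_unique = data_rnr.nunique(dropna=False).to_dict()
print(d_unique)
```

Note that this yields the number of distinct values per column, which is a different quantity from the per-row occurrence counts asked for in the question.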

3 answers

Answer 1 (3, accepted, 2018-02-02 22:32:00)
Answer 2 (0, 2018-02-03 01:33:44)
Answer 3 (0, 2019-08-05 11:40:22)
