Filter rows with more than 1 value in a set and count their occurrences (pandas, Python)

Question

Suppose I have the following DataFrame.

Id   Combinations
1      (A,B)
2      (C,)
3      (A,D)
4      (D,E,F)
5      (F)

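I want to keep only the rows whose Combinations value contains more than one element, as shown below. I also want to count how many times each value occurs in the Combinations column overall. For example, Ids 2 and 5 should be removed because their sets contain only 1 value.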

The result I am looking for is:

ID     Combination     Frequency
1        A                2               
1        B                1
3        A                2
3        D                2
4        D                2
4        E                1
4        F                2

Can anyone help me get the above result in Python pandas?

Answer 1

If necessary, first convert the values to lists:

df['Combinations'] = df['Combinations'].str.strip('(,)').str.split(',')
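
To make the conversion step reproducible, here is a minimal sketch. It assumes the Combinations column holds plain strings such as "(A,B)" or "(C,)"; the question does not say how the values are stored, so the DataFrame construction is an illustration only.

```python
import pandas as pd

# Assumption: the column holds strings, not Python tuples.
df = pd.DataFrame({
    "Id": [1, 2, 3, 4, 5],
    "Combinations": ["(A,B)", "(C,)", "(A,D)", "(D,E,F)", "(F)"],
})

# Strip parentheses and commas from both ends, then split on the inner commas.
df["Combinations"] = df["Combinations"].str.strip("(,)").str.split(",")

print(df["Combinations"].tolist())
# [['A', 'B'], ['C'], ['A', 'D'], ['D', 'E', 'F'], ['F']]
```

Because str.strip removes any of the listed characters from both ends, the trailing comma in "(C,)" disappears together with the parentheses.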

If the counts are needed after the single-value rows have been filtered out, filter with Series.str.len in boolean indexing, then use DataFrame.explode and count the values with Series.map and Series.value_counts:

df1 = df[df['Combinations'].str.len().gt(1)].explode('Combinations')
df1['Frequency'] = df1['Combinations'].map(df1['Combinations'].value_counts())
print (df1)
   Id Combinations  Frequency
0   1            A          2
0   1            B          1
2   3            A          2
2   3            D          2
3   4            D          2
3   4            E          1
3   4            F          1
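
If the column already holds real Python tuples, the string conversion can probably be skipped, since Series.str.len and DataFrame.explode both accept list-like elements. The sketch below assumes tuple-valued data (an assumption, not something the question confirms) and applies the same filter-then-count steps.

```python
import pandas as pd

# Assumption: Combinations already holds tuples, so no string parsing is needed.
df = pd.DataFrame({
    "Id": [1, 2, 3, 4, 5],
    "Combinations": [("A", "B"), ("C",), ("A", "D"), ("D", "E", "F"), ("F",)],
})

# Keep rows whose tuple has more than one element, then give each element its own row.
df1 = df[df["Combinations"].str.len().gt(1)].explode("Combinations")

# Count each value among the remaining rows and map the counts back onto them.
df1["Frequency"] = df1["Combinations"].map(df1["Combinations"].value_counts())

print(df1)
```

Here F is counted once, because the single-value row with Id 5 is dropped before counting.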

Or, if the counts are needed before the rows are removed, filter in the last step with Series.duplicated:

df2 = df.explode('Combinations')
df2['Frequency'] = df2['Combinations'].map(df2['Combinations'].value_counts())

df2 = df2[df2['Id'].duplicated(keep=False)]

Alternatives:

df2 = df2[df2.groupby('Id').Id.transform('size') > 1]

Or:

df2 = df2[df2['Id'].map(df2['Id'].value_counts()) > 1]

print (df2)
   Id Combinations  Frequency
0   1            A          2
0   1            B          1
2   3            A          2
2   3            D          2
3   4            D          2
3   4            E          1
3   4            F          2
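
As a quick check that the three filters are interchangeable, the following sketch runs the count-then-filter approach on list-valued sample data (assumed to match the question's table) and compares the results. The variable names are illustrative.

```python
import pandas as pd

# Assumption: Combinations has already been converted to lists.
df = pd.DataFrame({
    "Id": [1, 2, 3, 4, 5],
    "Combinations": [["A", "B"], ["C"], ["A", "D"], ["D", "E", "F"], ["F"]],
})

# Count every value first, while the single-value rows are still present.
exploded = df.explode("Combinations")
exploded["Frequency"] = exploded["Combinations"].map(exploded["Combinations"].value_counts())

# Three ways to keep only Ids that produced more than one exploded row.
by_duplicated = exploded[exploded["Id"].duplicated(keep=False)]
by_transform = exploded[exploded.groupby("Id")["Id"].transform("size") > 1]
by_value_counts = exploded[exploded["Id"].map(exploded["Id"].value_counts()) > 1]

print(by_duplicated.equals(by_transform), by_duplicated.equals(by_value_counts))
print(by_duplicated["Frequency"].tolist())
# [2, 1, 2, 2, 2, 1, 2]
```

F now has a frequency of 2, because the row with Id 5 still contributes to the count before it is filtered out, which matches the result requested in the question.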

Answer 1 (2, accepted, 2021-03-03 08:52:10)
