Counting total occurrences in a pandas DataFrame with a condition

Question

I have this DataFrame:

cat_df.head()

   category depth
0   food    0.0
1   food    1.0
2   sport   1.0
3   food    3.0
4   school  0.0
5   school  0.0
6   school  1.0
...

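depth = 0 marks a root post; depth > 0 marks a comment.

For each category, I want to count the number of root posts (depth = 0) and the number of comments (depth > 0).

I use value_counts() to count the unique values: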

cat_df['category'].value_counts().head(15)

category     total number 
food         44062
sport        38004
school       11080
life         8810
...

I thought I could put ['depth'] == 0 into the DataFrame as a condition, but that did not work:

cat_df[cat_df['depth'] == 0].value_counts().head(5)

How can I get the total number of occurrences for depth = 0 and for depth > 0?

I would like to put the result in a table like this:

category | total number | depth=0 | depth>0 
...

Answer 1

For better performance, you can use a single groupby:

df = (cat_df['depth'].ne(0)
                     .groupby(cat_df['category'])
                     .value_counts()
                     .unstack(fill_value=0)
                     .rename(columns={0:'depth=0', 1:'depth>0'})
                     .assign(total=lambda x: x.sum(axis=1))
                     .reindex(columns=['total','depth=0','depth>0']))

print (df)
depth     total  depth=0  depth>0
category                         
food          3        1        2
school        3        2        1
sport         1        0        1

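Explanation:

First, compare the depth column for inequality with 0 using Series.ne (!=)
Group by the category column and count with SeriesGroupBy.value_counts
Reshape with unstack
Rename the columns with a dictionary via rename (the keys 0 and 1 match the boolean labels False and True)
Create the new total column with assign
For a custom column order, add reindex

Edit: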
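The steps above can be sketched one at a time, which makes the intermediate results easier to inspect. This is a minimal sketch, assuming the seven sample rows from the question; the intermediate names is_comment, counts, wide and result are illustrative only and do not appear in the original answer.

```python
import pandas as pd

# Sample data copied from the seven rows shown in the question.
cat_df = pd.DataFrame({
    'category': ['food', 'food', 'sport', 'food', 'school', 'school', 'school'],
    'depth': [0.0, 1.0, 1.0, 3.0, 0.0, 0.0, 1.0],
})

# Step 1: boolean Series, True where the row is a comment (depth != 0).
is_comment = cat_df['depth'].ne(0)

# Step 2: count False/True per category; the result has a two-level index.
counts = is_comment.groupby(cat_df['category']).value_counts()

# Step 3: move the False/True level into columns, filling absent pairs with 0.
wide = counts.unstack(fill_value=0)

# Step 4: label the boolean columns explicitly.
wide = wide.rename(columns={False: 'depth=0', True: 'depth>0'})

# Steps 5 and 6: add the row total, then fix the column order.
wide = wide.assign(total=wide.sum(axis=1))
result = wide.reindex(columns=['total', 'depth=0', 'depth>0'])
print(result)
```

Printing counts and wide after each step shows how the two-level index turns into the final table.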

import pandas as pd

cat_df = pd.DataFrame({'category': ['food', 'food', 'sport', 'food', 'school', 'school', 'school'], 'depth': [0.0, 1.0, 1.0, 3.0, 0.0, 0.0, 1.0], 'num_of_likes': [10, 10, 10, 20, 20, 20, 20]})

print (cat_df)
  category  depth  num_of_likes
0     food    0.0            10
1     food    1.0            10
2    sport    1.0            10
3     food    3.0            20
4   school    0.0            20
5   school    0.0            20
6   school    1.0            20

df = (cat_df['depth'].ne(0)
                     .groupby([cat_df['num_of_likes'], cat_df['category']])
                     .value_counts()
                     .unstack(fill_value=0)
                     .rename(columns={0:'depth=0', 1:'depth>0'})
                     .assign(total=lambda x: x.sum(axis=1))
                     .reindex(columns=['total','depth=0','depth>0'])
                     .reset_index()
                     .rename_axis(None, axis=1)
)

print (df)
   num_of_likes category  total  depth=0  depth>0
0            10     food      2        1        1
1            10    sport      1        0        1
2            20     food      1        0        1
3            20   school      3        2        1

Edit 1:

s = cat_df.groupby('category')['num_of_likes'].sum()
print (s)
category
food      40
school    60
sport     10
Name: num_of_likes, dtype: int64

df = (cat_df['depth'].ne(0)
                     .groupby(cat_df['category'])
                     .value_counts()
                     .unstack(fill_value=0)
                     .rename(columns={0:'depth=0', 1:'depth>0'})
                     .assign(total=lambda x: x.sum(axis=1))
                     .reindex(columns=['total','depth=0','depth>0'])
                     .reset_index()
                     .rename_axis(None, axis=1)
                     .assign(num_of_likes=lambda x: x['category'].map(s))
)
print (df)
  category  total  depth=0  depth>0  num_of_likes
0     food      3        1        2            40
1   school      3        2        1            60
2    sport      1        0        1            10

Answer 2

Here is one way, using pandas.concat:

total = df.groupby('category').size()
zero = df[df.depth == 0].groupby('category').size()
nonzero = df[df.depth > 0].groupby('category').size()

res = pd.concat([total, zero, nonzero], axis=1)\
        .rename(columns={0: 'total', 1: 'zero', 2: 'nonzero'})\
        .fillna(0).astype(int)

print(res)

#         total  zero   nonzero
# food        3     1         2
# school      3     2         1
# sport       1     0         1
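For a version that runs on its own, here is a minimal sketch of the same concat idea. It assumes that df in the snippet above refers to the question's cat_df and uses the seven sample rows from the question. As a small variation, it names the columns through the keys argument of pd.concat rather than a later rename.

```python
import pandas as pd

# Sample data from the question; 'df' in the answer is assumed to be this frame.
df = pd.DataFrame({
    'category': ['food', 'food', 'sport', 'food', 'school', 'school', 'school'],
    'depth': [0.0, 1.0, 1.0, 3.0, 0.0, 0.0, 1.0],
})

# One size() per condition; categories with no matching rows are simply absent.
total = df.groupby('category').size()
zero = df[df['depth'] == 0].groupby('category').size()
nonzero = df[df['depth'] > 0].groupby('category').size()

# keys= supplies the column names; the outer join leaves NaN where a category
# is absent (sport has no depth == 0 rows), so fill with 0 and restore ints.
res = (pd.concat([total, zero, nonzero], axis=1,
                 keys=['total', 'zero', 'nonzero'])
         .fillna(0)
         .astype(int))
print(res)
```

The fillna(0) step matters here: without it, the sport row would hold NaN in the zero column and the column would become floating point.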

Answer 3

Here is how I would do it, with crosstab:

pd.crosstab(df.category,df.depth.ne(0),margins=True).iloc[:-1,:]
Out[618]: 
depth     False  True  All
category                  
food          1     2    3
school        2     1    3
sport         0     1    1

If you need named columns, add rename:

pd.crosstab(df.category,df.depth.ne(0),margins=True).iloc[:-1,:].rename(columns={True:'depth>0',False:'depth=0'})
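To get the column layout the question asks for (total, depth=0, depth>0), the crosstab result can be relabelled and reordered. This is a minimal sketch, assuming the sample rows from the question and the default margin label 'All'; the variable name table is illustrative only.

```python
import pandas as pd

# Sample data copied from the seven rows shown in the question.
cat_df = pd.DataFrame({
    'category': ['food', 'food', 'sport', 'food', 'school', 'school', 'school'],
    'depth': [0.0, 1.0, 1.0, 3.0, 0.0, 0.0, 1.0],
})

# Cross-tabulate category against "is a comment", with row/column totals.
table = pd.crosstab(cat_df['category'], cat_df['depth'].ne(0), margins=True)

# Drop the bottom 'All' row, relabel the columns, and put total first.
table = (table.drop(index='All')
              .rename(columns={False: 'depth=0', True: 'depth>0', 'All': 'total'})
              [['total', 'depth=0', 'depth>0']])
print(table)
```

Dropping the margin row by label, instead of by position with iloc, keeps the intent explicit if the row order ever changes.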

Answer 1
3 accepted 2018-04-17 13:02:53

Answer 2
2 2018-04-17 13:04:58

Answer 3
2 2018-04-17 13:48:56