Count the number of occurrences of a specific text in a column after a condition in pandas
Count total number of occurrences in pandas dataframe with a condition
I have this dataframe:
cat_df.head()
category depth
0 food 0.0
1 food 1.0
2 sport 1.0
3 food 3.0
4 school 0.0
5 school 0.0
6 school 1.0
...
depth = 0 marks a root post; depth > 0 marks a comment.
For each category, I want to count the number of root posts (depth = 0) and the number of comments (depth > 0).
I used value_counts() to count the unique values:
cat_df['category'].value_counts().head(15)
category total number
food 44062
sport 38004
school 11080
life 8810
...
I thought I could apply ['depth'] == 0 as a condition on the dataframe, but that didn't work:
cat_df[cat_df['depth'] == 0].value_counts().head(5)
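The filter itself is fine. The problem is that value_counts is then called on the whole filtered DataFrame. In pandas 1.1 and later, that counts unique (category, depth) rows. Earlier versions have no DataFrame.value_counts at all. Below is a minimal sketch, assuming the seven sample rows from the question, that selects the category column after filtering:

```python
import pandas as pd

# Sample data: the seven rows shown in the question
cat_df = pd.DataFrame({
    'category': ['food', 'food', 'sport', 'food', 'school', 'school', 'school'],
    'depth': [0.0, 1.0, 1.0, 3.0, 0.0, 0.0, 1.0],
})

# Filter the rows, then count only the 'category' column of what is left
roots = cat_df.loc[cat_df['depth'] == 0, 'category'].value_counts()
comments = cat_df.loc[cat_df['depth'] > 0, 'category'].value_counts()

print(roots)     # sport has no root posts, so it does not appear here
print(comments)
```

Categories with no matching rows are missing from each result. They still need to be filled with 0 when the counts are combined into one table.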
How can I get the total number of occurrences for depth = 0 and depth > 0?
I'd like to put it in a table like this:
category | total number | depth=0 | depth>0
...
You can use a single groupby for better performance:
df = (cat_df['depth'].ne(0)
.groupby(cat_df['category'])
.value_counts()
.unstack(fill_value=0)
.rename(columns={False:'depth=0', True:'depth>0'})
.assign(total=lambda x: x.sum(axis=1))
.reindex(columns=['total','depth=0','depth>0']))
print (df)
depth total depth=0 depth>0
category
food 3 1 2
school 3 2 1
sport 1 0 1
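Explanation:
- compare the depth column with Series.ne (!=)
- group by the category column and count with SeriesGroupBy.value_counts
- reshape with unstack
- rename the columns with rename
- create the new total column with assign
- reorder the columns with reindex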
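Printing the intermediate objects makes the first three steps easier to follow. Below is a minimal sketch, assuming the seven sample rows from the question. It stops after unstack, before the columns are renamed.

```python
import pandas as pd

# Sample data: the seven rows shown in the question
cat_df = pd.DataFrame({
    'category': ['food', 'food', 'sport', 'food', 'school', 'school', 'school'],
    'depth': [0.0, 1.0, 1.0, 3.0, 0.0, 0.0, 1.0],
})

# Step 1: boolean mask, True for comments (depth != 0), False for root posts
is_comment = cat_df['depth'].ne(0)

# Step 2: count True/False within each category;
# the result is a Series with a (category, depth) MultiIndex
counts = is_comment.groupby(cat_df['category']).value_counts()
print(counts)

# Step 3: move the boolean level into columns, filling absent pairs with 0
wide = counts.unstack(fill_value=0)
print(wide)
```

The columns of wide are the boolean labels False and True. That is why the rename step maps them to 'depth=0' and 'depth>0'.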
Edit:
import pandas as pd
cat_df = pd.DataFrame({'category': ['food', 'food', 'sport', 'food', 'school', 'school', 'school'], 'depth': [0.0, 1.0, 1.0, 3.0, 0.0, 0.0, 1.0], 'num_of_likes': [10, 10, 10, 20, 20, 20, 20]})
print (cat_df)
category depth num_of_likes
0 food 0.0 10
1 food 1.0 10
2 sport 1.0 10
3 food 3.0 20
4 school 0.0 20
5 school 0.0 20
6 school 1.0 20
df = (cat_df['depth'].ne(0)
.groupby([cat_df['num_of_likes'], cat_df['category']])
.value_counts()
.unstack(fill_value=0)
.rename(columns={False:'depth=0', True:'depth>0'})
.assign(total=lambda x: x.sum(axis=1))
.reindex(columns=['total','depth=0','depth>0'])
.reset_index()
.rename_axis(None, axis=1)
)
print (df)
num_of_likes category total depth=0 depth>0
0 10 food 2 1 1
1 10 sport 1 0 1
2 20 food 1 0 1
3 20 school 3 2 1
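The same two-key breakdown can also be sketched with pd.crosstab, which accepts a list of keys for its rows. This is an alternative to the groupby chain, not the answer's method, and it assumes the sample frame from this edit:

```python
import pandas as pd

# Sample data from the edit above
cat_df = pd.DataFrame({
    'category': ['food', 'food', 'sport', 'food', 'school', 'school', 'school'],
    'depth': [0.0, 1.0, 1.0, 3.0, 0.0, 0.0, 1.0],
    'num_of_likes': [10, 10, 10, 20, 20, 20, 20],
})

# Rows: each (num_of_likes, category) pair present in the data
# Columns: False (depth == 0, root post) / True (depth != 0, comment)
table = pd.crosstab([cat_df['num_of_likes'], cat_df['category']],
                    cat_df['depth'].ne(0))
table = table.rename(columns={False: 'depth=0', True: 'depth>0'})
table['total'] = table.sum(axis=1)   # row-wise sum of the two count columns
print(table)
```

Only pairs that occur in the data get a row. Here that is four rows, matching the groupby output.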
Edit 1:
s = cat_df.groupby('category')['num_of_likes'].sum()
print (s)
category
food 40
school 60
sport 10
Name: num_of_likes, dtype: int64
df = (cat_df['depth'].ne(0)
.groupby(cat_df['category'])
.value_counts()
.unstack(fill_value=0)
.rename(columns={False:'depth=0', True:'depth>0'})
.assign(total=lambda x: x.sum(axis=1))
.reindex(columns=['total','depth=0','depth>0'])
.reset_index()
.rename_axis(None, axis=1)
.assign(num_of_likes=lambda x: x['category'].map(s))
)
print (df)
category total depth=0 depth>0 num_of_likes
0 food 3 1 2 40
1 school 3 2 1 60
2 sport 1 0 1 10
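If a single groupby pass is preferred over the separate map step, named aggregation can compute the counts and the likes sum together. This is a swapped-in technique, not the answer's. It adds two hypothetical helper columns, is_root and is_comment, and sums those boolean masks. The ** dict form is used because names like 'depth=0' are not valid keyword arguments.

```python
import pandas as pd

# Sample data from the edit above
cat_df = pd.DataFrame({
    'category': ['food', 'food', 'sport', 'food', 'school', 'school', 'school'],
    'depth': [0.0, 1.0, 1.0, 3.0, 0.0, 0.0, 1.0],
    'num_of_likes': [10, 10, 10, 20, 20, 20, 20],
})

summary = (
    cat_df.assign(is_root=cat_df['depth'].eq(0),       # hypothetical helper column
                  is_comment=cat_df['depth'].gt(0))     # hypothetical helper column
    .groupby('category')
    .agg(**{
        'total': ('depth', 'size'),            # number of rows per category
        'depth=0': ('is_root', 'sum'),         # summing booleans counts True values
        'depth>0': ('is_comment', 'sum'),
        'num_of_likes': ('num_of_likes', 'sum'),
    })
    .reset_index()
)
print(summary)
```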
Here is one way using pandas.concat (df is the question's dataframe):
total = df.groupby('category').size()
zero = df[df.depth == 0].groupby('category').size()
nonzero = df[df.depth > 0].groupby('category').size()
res = pd.concat([total, zero, nonzero], axis=1)\
.rename(columns={0: 'total', 1: 'zero', 2: 'nonzero'})\
.fillna(0).astype(int)
print(res)
# total zero nonzero
# food 3 1 2
# school 3 2 1
# sport 1 0 1
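A small variation, sketched under the assumption that df holds the question's sample rows: pd.concat's keys argument names the columns directly, so the positional rename is not needed. The fillna(0) is still required. A category absent from a filtered subset (sport has no depth == 0 rows) comes back as NaN after alignment.

```python
import pandas as pd

# Sample data: the seven rows shown in the question
df = pd.DataFrame({
    'category': ['food', 'food', 'sport', 'food', 'school', 'school', 'school'],
    'depth': [0.0, 1.0, 1.0, 3.0, 0.0, 0.0, 1.0],
})

total = df.groupby('category').size()
zero = df[df.depth == 0].groupby('category').size()
nonzero = df[df.depth > 0].groupby('category').size()

# keys= labels each concatenated Series as a column
res = (pd.concat([total, zero, nonzero], axis=1,
                 keys=['total', 'zero', 'nonzero'])
       .fillna(0)      # categories missing from a subset become NaN, then 0
       .astype(int))
print(res)
```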
How I would do it, with crosstab:
pd.crosstab(df.category,df.depth.ne(0),margins=True).iloc[:-1,:]
Out[618]:
depth False True All
category
food 1 2 3
school 2 1 3
sport 0 1 1
If you need the column names, add rename:
pd.crosstab(df.category,df.depth.ne(0),margins=True).iloc[:-1,:].rename(columns={True:'depth>0',False:'depth=0'})
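To match the table layout requested in the question (total first), the margin column can be named with crosstab's margins_name parameter and the columns reordered. This is a minimal sketch, assuming df holds the question's sample rows:

```python
import pandas as pd

# Sample data: the seven rows shown in the question
df = pd.DataFrame({
    'category': ['food', 'food', 'sport', 'food', 'school', 'school', 'school'],
    'depth': [0.0, 1.0, 1.0, 3.0, 0.0, 0.0, 1.0],
})

# margins=True adds a 'total' row and a 'total' column
table = pd.crosstab(df['category'], df['depth'].ne(0),
                    margins=True, margins_name='total')
table = (table.iloc[:-1]      # drop the 'total' margin row, keep the margin column
              .rename(columns={False: 'depth=0', True: 'depth>0'})
              .reindex(columns=['total', 'depth=0', 'depth>0']))
print(table)
```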