Counting unique values in a column in pandas dataframe like in Qlik?
If I have a table like this:
import pandas as pd

df = pd.DataFrame({
    'hID': [101, 102, 103, 101, 102, 104, 105, 101],
    'dID': [10, 11, 12, 10, 11, 10, 12, 10],
    'uID': ['James', 'Henry', 'Abe', 'James', 'Henry', 'Brian', 'Claude', 'James'],
    'mID': ['A', 'B', 'A', 'B', 'A', 'A', 'A', 'C']
})
I can do count(distinct hID) in Qlik to come up with a count of 5 for unique hID. How do I do that in Python using a pandas dataframe? Or maybe a numpy array? Similarly, if I were to do count(hID), I will get 8 in Qlik. What is the equivalent way to do it in pandas?
To count distinct values, use nunique:
df['hID'].nunique()
5
To count only non-null values, use count:
df['hID'].count()
8
To count all values, including nulls, use the size attribute:
df['hID'].size
8
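On the example data, count and size both return 8 because hID has no missing values. Here is a minimal sketch of how the three differ once a null is present, using a made-up Series s that is not part of the original data:

```python
import numpy as np
import pandas as pd

# Hypothetical Series with one missing value, for illustration only
s = pd.Series([101, 102, np.nan, 101])

n_distinct = s.nunique()                       # distinct non-null values
n_distinct_with_nan = s.nunique(dropna=False)  # NaN counted as its own value
n_non_null = s.count()                         # non-null values only
n_total = s.size                               # every value, nulls included

print(n_distinct, n_distinct_with_nan, n_non_null, n_total)
```

So count is the closer match to Qlik's count(hID) if nulls should be skipped, and size if every row should be counted.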
To count under a condition (here, only rows where mID is 'A'), use boolean indexing:
df.loc[df['mID']=='A','hID'].agg(['nunique','count','size'])
Or use query:
df.query('mID == "A"')['hID'].agg(['nunique','count','size'])
Output:
nunique 5
count 5
size 5
Name: hID, dtype: int64
Assuming data is the name of your dataframe, you can do:
data['race'].value_counts()
This will show you the distinct elements and their number of occurrences.
Or get the number of unique values for each column:
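The 'race' column in that snippet is not in the question's table. As a sketch, the same call on the question's hID column (only that column is rebuilt here) might look like this:

```python
import pandas as pd

# hID column copied from the question's dataframe
df = pd.DataFrame({
    'hID': [101, 102, 103, 101, 102, 104, 105, 101],
})

# Occurrences of each distinct hID, most frequent first
counts = df['hID'].value_counts()
print(counts)

# The number of rows in the result equals the number of distinct values
n_distinct = len(counts)
print(n_distinct)
```

The length of the result matches df['hID'].nunique(), since value_counts drops nulls by default as well.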
df.nunique()
dID 3
hID 5
mID 3
uID 5
dtype: int64
New in pandas 0.20.0: pd.DataFrame.agg
df.agg(['count', 'size', 'nunique'])
         dID  hID  mID  uID
count      8    8    8    8
size       8    8    8    8
nunique    3    5    3    5
You've always been able to do an agg within a groupby. I used stack at the end because I like the presentation better.
df.groupby('mID').agg(['count', 'size', 'nunique']).stack()
             dID  hID  uID
mID
A   count      5    5    5
    size       5    5    5
    nunique    3    5    5
B   count      2    2    2
    size       2    2    2
    nunique    2    2    2
C   count      1    1    1
    size       1    1    1
    nunique    1    1    1
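If only the distinct count of one column per group is needed, which is roughly what count(distinct hID) gives per mID in Qlik, a narrower sketch is to select the column before aggregating. The variable name per_group is just for illustration:

```python
import pandas as pd

# Relevant columns copied from the question's dataframe
df = pd.DataFrame({
    'hID': [101, 102, 103, 101, 102, 104, 105, 101],
    'mID': ['A', 'B', 'A', 'B', 'A', 'A', 'A', 'C'],
})

# Distinct hID values within each mID group
per_group = df.groupby('mID')['hID'].nunique()
print(per_group)
```

The values agree with the hID column of the nunique rows in the stacked output above.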
To count the unique values in a column, for example hID in df, use:
len(df.hID.unique())
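The question also asks about a numpy array. A minimal sketch, assuming the hID values are held in a plain array named hid (a name chosen here for illustration), could use np.unique:

```python
import numpy as np

# hID values from the question, as a plain numpy array
hid = np.array([101, 102, 103, 101, 102, 104, 105, 101])

# Sorted distinct values and how often each one occurs
values, counts = np.unique(hid, return_counts=True)

n_distinct = len(values)  # comparable to count(distinct hID)
n_total = hid.size        # comparable to count(hID) when there are no nulls

print(n_distinct, n_total)
```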
I was looking for something similar and found another way that may help you:
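def count_nulls(s):
    # number of missing values: total size minus non-null count
    return s.size - s.count()

def unique_nan(s):
    # distinct values, with NaN counted as one of them
    return s.nunique(dropna=False)

# Note: df here is not the question's table; it is assumed to be a
# dataframe that has 'deck' and 'embark_town' columns.
agg_func_custom_count = {
    'embark_town': ['count', 'nunique', 'size', unique_nan, count_nulls, set]
}
df.groupby(['deck']).agg(agg_func_custom_count)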
You can also call the unique method and pass the result to len:
len(df['hID'].unique())
5