Pandas dataframe find distinct value count for each group in other columns
I have a Pandas dataframe, a sample input of which looks like this:
vendor    filename  language  score        text
Vendor 1  File 1    chinese   0.67717278   text1
Vendor 2  File 1    chinese   0.644506991  text2
Vendor 1  File 2    chinese   0.67717278   text1
Vendor 2  File 1    chinese   0.644506991  text3
Vendor 1  File 2    Arabic    0.999999523  text3
Vendor 1  File 1    Arabic    0.756420255  text2
Vendor 2  File 3    Arabic    0.999999523  text4
Vendor 1  File 1    Arabic    0.756420255  text4
What I am trying to do is, for each language and, within that language, for each file, count the number of distinct values in the text column where score is greater than 0.5. So my ideal output for the above sample input should be:
Chinese  File 1  3
         File 2  1
Arabic   File 1  2
         File 2  1
         File 3  1
Note that File 1 and File 2 are both used by Chinese and Arabic, but I want to count their unique text values separately for each language.
I tried to use the pandas groupby and unique functions in the code below, but it does not work; it throws the error 'DataFrameGroupBy' object has no attribute 'unique':
df_1 = df[df["score"] > 0.5].groupby(['language', 'filename']).unique().size()
print("Number of unique text greater than 0.5 score:{}".format(df_1))
What is the best way to resolve this issue and achieve the intended outcome?
Use DataFrameGroupBy.nunique with the text column selected to count the number of unique values in each group (selecting a single column yields a SeriesGroupBy, which provides the same nunique method):
df_1 = df[df["score"] > 0.5].groupby(['language', 'filename'], sort=False)['text'].nunique()
print("Number of unique text greater than 0.5 score:\n{}".format(df_1))
Number of unique text greater than 0.5 score:
language  filename
chinese   File 1      3
          File 2      1
Arabic    File 2      1
          File 1      2
          File 3      1
Name: text, dtype: int64
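For anyone who wants to reproduce this end to end, here is a minimal, self-contained sketch. It rebuilds the sample dataframe from the question and applies the same filter, groupby and nunique chain. The final reset_index step and the column name n_unique_text are my own additions for illustration and are not part of the original answer.

```python
import pandas as pd

# Sample rows copied from the question.
rows = [
    ("Vendor 1", "File 1", "chinese", 0.67717278, "text1"),
    ("Vendor 2", "File 1", "chinese", 0.644506991, "text2"),
    ("Vendor 1", "File 2", "chinese", 0.67717278, "text1"),
    ("Vendor 2", "File 1", "chinese", 0.644506991, "text3"),
    ("Vendor 1", "File 2", "Arabic", 0.999999523, "text3"),
    ("Vendor 1", "File 1", "Arabic", 0.756420255, "text2"),
    ("Vendor 2", "File 3", "Arabic", 0.999999523, "text4"),
    ("Vendor 1", "File 1", "Arabic", 0.756420255, "text4"),
]
df = pd.DataFrame(rows, columns=["vendor", "filename", "language", "score", "text"])

# Keep rows above the score threshold, then count distinct text values
# per (language, filename) group. sort=False keeps first-appearance order.
counts = (
    df[df["score"] > 0.5]
    .groupby(["language", "filename"], sort=False)["text"]
    .nunique()
)

# Optional: flatten the MultiIndex into ordinary columns.
# "n_unique_text" is a hypothetical column name chosen for this example.
flat = counts.reset_index(name="n_unique_text")
print(flat)
```

If the unique values themselves are needed rather than their count, unique() can be called on the selected column in the same way, since SeriesGroupBy does provide unique even though DataFrameGroupBy does not.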