
Counting unique elements in lists

I have a DataFrame containing one column of lists.

names                                       unique_values
[B-PER,I-PER,I-PER,B-PER]                        2
[I-PER,N-PER,B-PER,I-PER,A-PER]                  4
[B-PER,A-PER,I-PER]                              3
[B-PER, A-PER,A-PER,A-PER]                       2

I need to count the distinct values in each list in the column; a value that appears more than once should be counted only once. How can I achieve this?

Thanks

Combine explode with nunique:

df["unique_values"] = df.names.explode().groupby(level = 0).nunique()
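For readers who want to run this end to end, here is a minimal sketch that rebuilds the question's data and applies the same explode/nunique chain. The DataFrame construction is an assumption based on the table in the question; the intermediate variable name `exploded` is only for illustration.

```python
import pandas as pd

# Sample data reconstructed from the table in the question (an assumption).
df = pd.DataFrame({
    "names": [
        ["B-PER", "I-PER", "I-PER", "B-PER"],
        ["I-PER", "N-PER", "B-PER", "I-PER", "A-PER"],
        ["B-PER", "A-PER", "I-PER"],
        ["B-PER", "A-PER", "A-PER", "A-PER"],
    ]
})

# explode() puts each list element on its own row and repeats the original index.
exploded = df["names"].explode()

# Grouping on that repeated index (level 0) and calling nunique() counts
# the distinct values per original row.
df["unique_values"] = exploded.groupby(level=0).nunique()

print(df["unique_values"].tolist())  # [2, 4, 3, 2]
```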

You can use the built-in set data type to do this:

df['unique_values'] = df['names'].apply(lambda a : len(set(a)))

This works because sets cannot contain duplicate elements: converting a list to a set drops all duplicates, so all you need to do is take the length of the resulting set.
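The deduplication step can be seen on a single list, without pandas. This is only a small illustration using the first row from the question; the variable names are hypothetical.

```python
# First row of the question's data.
names = ["B-PER", "I-PER", "I-PER", "B-PER"]

# Building a set keeps one copy of each distinct value.
distinct = set(names)

# The number of distinct values is the size of the set.
count = len(distinct)

print(count)  # 2
```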

To ignore NaN values in a list, you can do the following:

df['unique_values'] = df['names'].apply(lambda a : len([x for x in set(a) if str(x) != 'nan'])) 
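Note that the `str(x) != 'nan'` check would also discard a literal string `'nan'`. As an alternative sketch (not the method from the answer above), the filter can use `pd.isna` instead. The helper name `count_unique_ignoring_nan` and the sample data are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def count_unique_ignoring_nan(values):
    """Count distinct elements of a list, skipping missing values (hypothetical helper)."""
    # pd.isna returns True for NaN and None, and False for ordinary strings.
    return len({x for x in values if not pd.isna(x)})

# Illustrative data with missing values mixed into the lists (an assumption).
df = pd.DataFrame({
    "names": [
        ["B-PER", np.nan, "B-PER"],
        ["I-PER", "A-PER", float("nan")],
        [np.nan],
    ]
})

df["unique_values"] = df["names"].apply(count_unique_ignoring_nan)

print(df["unique_values"].tolist())  # [1, 2, 0]
```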

Try:

df["unique_values"] = df.names.explode().groupby(level = 0).unique().str.len()

Output:

df
                                 names  unique_values
0         [B-PER, I-PER, I-PER, B-PER]              2
1  [I-PER, N-PER, B-PER, I-PER, A-PER]              4
2                [B-PER, A-PER, I-PER]              3
3         [B-PER, A-PER, A-PER, A-PER]              2
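One thing worth checking before choosing between the `nunique()` and `unique().str.len()` variants is how each treats missing values: `nunique()` skips NaN by default, while `unique()` keeps NaN as a value. A small sketch on made-up data (the single-row DataFrame is an assumption, not part of the question):

```python
import numpy as np
import pandas as pd

# One row whose list contains a missing value (illustrative data).
df = pd.DataFrame({"names": [["B-PER", np.nan, "B-PER"]]})

grouped = df["names"].explode().groupby(level=0)

# nunique() drops NaN by default, so only "B-PER" is counted.
via_nunique = grouped.nunique()

# unique() keeps NaN, so the resulting array has two entries.
via_unique_len = grouped.unique().str.len()

print(via_nunique.tolist(), via_unique_len.tolist())
```

If the lists never contain NaN, as in the question's data, both variants give the same counts.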
