Counting unique elements in lists
I have a dataframe containing one column of lists.
names                                  unique_values
[B-PER, I-PER, I-PER, B-PER]           2
[I-PER, N-PER, B-PER, I-PER, A-PER]    4
[B-PER, A-PER, I-PER]                  3
[B-PER, A-PER, A-PER, A-PER]           2
I have to count the distinct values in each list of the column; if a value appears more than once, it should be counted as one. How can I achieve this?
Thanks
Combine explode with nunique:
df["unique_values"] = df.names.explode().groupby(level = 0).nunique()
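For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample data from the question and applies the same explode/nunique chain. The intermediate variable name `exploded` is only for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question: one column holding lists of tags.
df = pd.DataFrame({
    "names": [
        ["B-PER", "I-PER", "I-PER", "B-PER"],
        ["I-PER", "N-PER", "B-PER", "I-PER", "A-PER"],
        ["B-PER", "A-PER", "I-PER"],
        ["B-PER", "A-PER", "A-PER", "A-PER"],
    ]
})

# explode() gives every list element its own row and repeats the original
# row index, so grouping on index level 0 puts each list back together.
exploded = df["names"].explode()

# nunique() counts the distinct values per original row.
df["unique_values"] = exploded.groupby(level=0).nunique()

print(df)
```

Note that nunique skips NaN by default, so missing values inside a list should not be counted with this approach.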
You can use the built-in set data type to do this:
df['unique_values'] = df['names'].apply(lambda a : len(set(a)))
This works because a set cannot contain duplicate elements: converting a list to a set discards the duplicates, so the length of the resulting set is the number of distinct values.
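To see the deduplication on a single list, here is a small plain-Python sketch using the first row of the question's data. The variable names are just for illustration.

```python
# First row of the sample data from the question.
tags = ["B-PER", "I-PER", "I-PER", "B-PER"]

# Building a set keeps one copy of each distinct element.
unique_tags = set(tags)

# The size of the set is the number of distinct values in the list.
count = len(unique_tags)
print(unique_tags, count)
```

This relies on the list elements being hashable, which holds for strings like the tags here.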
To ignore NaN values in a list, you can do the following:
df['unique_values'] = df['names'].apply(lambda a : len([x for x in set(a) if str(x) != 'nan']))
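One caveat with comparing against the string 'nan' is that it would also drop a genuine tag spelled "nan". As an alternative sketch (not part of the original answer), the filter can use pd.isna instead. The helper name `count_unique_ignoring_nan` and the sample rows below are made up for illustration.

```python
import pandas as pd

def count_unique_ignoring_nan(values):
    """Count distinct elements of a list, skipping NaN/None entries."""
    # pd.isna() is True for float NaN and None, False for ordinary strings.
    return len({v for v in values if not pd.isna(v)})

# Hypothetical data with missing entries mixed into the lists.
df = pd.DataFrame({
    "names": [
        ["B-PER", float("nan"), "I-PER", "B-PER"],
        ["A-PER", None],
        [],
    ]
})

df["unique_values"] = df["names"].apply(count_unique_ignoring_nan)
print(df)
```

Filtering before building the set also avoids counting several distinct NaN objects separately, since NaN does not compare equal to itself.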
Try:
df["unique_values"] = df.names.explode().groupby(level = 0).unique().str.len()
Output:
df
names unique_values
0 [B-PER, I-PER, I-PER, B-PER] 2
1 [I-PER, N-PER, B-PER, I-PER, A-PER] 4
2 [B-PER, A-PER, I-PER] 3
3 [B-PER, A-PER, A-PER, A-PER] 2