Count distinct values in Python list
I have a dataframe like the one below:
label unigrams
ham [ive, searching, right, word, thank, breather, i, promise, wont]
spam [free, entry, 2, wkly, comp, win, fa, cup, final, tkts, 21st, may]
I want to count the distinct (unique) ham unigrams and the distinct spam unigrams.
I can count the distinct values in a column using
df.unigrams.nunique()
I can count the number of occurrences of a given unigram in ham using
unigramCount = unigramCorpus.loc["ham", "unigrams"].count('ive')
But how can I count the number of distinct values in a given list? For example:
["ham", "spam"]
Expected output:
ham = 9
spam = 12
You need:
df.assign(count = df.unigrams.apply(lambda x: len(set(x))))
label unigrams count
0 ham [ive, searching, right, word, thank, breather,...] 9
1 spam [free, entry, 2, wkly, comp, win, fa, cup, fin...] 12
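For reference, here is a self-contained sketch of the same set-based idea, printing the result in the form the question asks for. The sample rows come from the question; using label as the index and the variable name distinct are assumptions made for illustration.

```python
import pandas as pd

# Sample rows copied from the question; the "label" column name is an assumption.
df = pd.DataFrame({
    "label": ["ham", "spam"],
    "unigrams": [
        ["ive", "searching", "right", "word", "thank", "breather", "i", "promise", "wont"],
        ["free", "entry", "2", "wkly", "comp", "win", "fa", "cup", "final", "tkts", "21st", "may"],
    ],
})

# A set keeps one copy of each word, so its length is the number of distinct unigrams.
distinct = df.set_index("label")["unigrams"].apply(lambda words: len(set(words)))

# Look the counts up by label, as in the question's ["ham", "spam"] example.
for label in ["ham", "spam"]:
    print(f"{label} = {distinct[label]}")
```

With the two rows above this prints ham = 9 and spam = 12, since neither list contains a repeated word.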
Using np.unique (this counts only the distinct words in each list of unigrams, so duplicates are ignored):
df['counts'] = df.apply(lambda x: len(np.unique(x['unigrams'])), axis=1)
print(df)
> label unigrams counts
0 ham [ive, searching, right, word, thank, breather,... 9
1 spam [free, entry, 2, wkly, comp, win, fa, cup, fin... 12
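If the per-word frequencies are needed as well, np.unique can return them alongside the distinct values through its return_counts parameter. A minimal sketch on a made-up token list (the list and the variable names are hypothetical, not taken from the question):

```python
import numpy as np

# Hypothetical token list with repeated words, to show how duplicates are handled.
words = ["free", "entry", "free", "win", "cup", "win", "free"]

# np.unique returns the sorted distinct values; return_counts=True adds their frequencies.
values, counts = np.unique(words, return_counts=True)

n_distinct = len(values)  # number of distinct words
occurrences = dict(zip(values.tolist(), counts.tolist()))  # word -> number of occurrences

print(n_distinct)
print(occurrences)
```

Here n_distinct is 4 and occurrences maps free to 3, win to 2, and the other two words to 1.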
unigramCount = len(set(eval(unigramCorpus.loc["ham", "unigrams"])))
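The eval call above appears to assume that the cell holds the text form of a list (as can happen after a round trip through CSV) and not a real list object. Under that assumption, ast.literal_eval is a safer way to parse it, since it accepts only Python literals. A minimal sketch with a hypothetical string value:

```python
import ast

# Assumption: the cell holds the text form of a list, not a real list object.
raw = "['ive', 'searching', 'right', 'word', 'ive']"

# literal_eval parses Python literals only, so it cannot execute arbitrary code.
words = ast.literal_eval(raw)

# The repeated 'ive' is counted once.
unigram_count = len(set(words))
print(unigram_count)
```

If the column already contains real lists, neither eval nor literal_eval is needed and len(set(...)) can be applied directly.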
Your question is not very clear, but this may help:
df['count'] = df['unigrams'].map(lambda x: len(x))