Count distinct values in a Python list

Question

I have a DataFrame like the one below:

label  unigrams
ham    [ive, searching, right, word, thank, breather, i, promise, wont]
spam   [free, entry, 2, wkly, comp, win, fa, cup, final, tkts, 21st, may]

I want to count the distinct (unique) ham unigrams and the distinct spam unigrams.

I can count the distinct values in a column using df.unigrams.nunique(). I can count the number of occurrences of a given unigram in ham using unigramCount = unigramCorpus.loc["ham", "unigrams"].count('ive').

But how can I count the number of distinct values in the list for a given label? Ex: ["ham", "spam"]

Expected output: ham = 9, spam = 12

Answer 1

You need:

df.assign(count = df.unigrams.apply(lambda x: len(set(x))))

   label    unigrams                                          count
0   ham     [ive, searching, right, word, thank, breather,...]  9
1   spam    [free, entry, 2, wkly, comp, win, fa, cup, fin...]  12
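As a self-contained sketch of the approach above, the following rebuilds the two rows shown in the question and applies len(set(x)) to each list. The DataFrame contents are copied from the post as displayed, and the name distinct_by_label is only illustrative, not part of the original answer.

```python
import pandas as pd

# Reconstruction of the question's data as displayed in the post;
# the asker's real DataFrame may differ.
df = pd.DataFrame({
    "label": ["ham", "spam"],
    "unigrams": [
        ["ive", "searching", "right", "word", "thank",
         "breather", "i", "promise", "wont"],
        ["free", "entry", "2", "wkly", "comp", "win",
         "fa", "cup", "final", "tkts", "21st", "may"],
    ],
})

# set(x) drops duplicates inside each list, len() counts what is left.
result = df.assign(count=df["unigrams"].apply(lambda x: len(set(x))))

# Illustrative helper mapping: label -> number of distinct unigrams.
distinct_by_label = dict(zip(result["label"], result["count"]))
print(distinct_by_label)
```

With the sample rows this gives 9 for ham and 12 for spam, matching the expected output in the question. Note that assign returns a new DataFrame and leaves df unchanged.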

Answer 2

Using np.unique (this counts only the distinct words in each list of unigrams, so duplicates are ignored):

import numpy as np

df['counts'] = df.apply(lambda x: len(np.unique(x['unigrams'])), axis=1)
print(df)

   label  unigrams                                            counts
0  ham    [ive, searching, right, word, thank, breather,...   9
1  spam   [free, entry, 2, wkly, comp, win, fa, cup, fin...   12
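To make the "duplicates will be ignored" remark concrete, here is a small sketch on a single list. The list below is a made-up example with repeated words, not data from the question; return_counts is an optional np.unique parameter that also reports how often each distinct value occurs.

```python
import numpy as np

# Hypothetical unigram list containing repeated words.
unigrams = ["free", "entry", "free", "win", "entry", "cup"]

# np.unique returns the sorted distinct values; with return_counts=True
# it also returns the number of occurrences of each value.
values, counts = np.unique(unigrams, return_counts=True)

distinct_count = len(values)
occurrences = dict(zip(values.tolist(), counts.tolist()))
print(distinct_count)
print(occurrences)
```

Here the six words collapse to four distinct values. For plain Python lists, len(set(x)) gives the same count without NumPy; np.unique is mainly useful when the per-word counts are wanted too.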

Answer 3

unigramCount = len(set(eval(unigramCorpus.loc["ham", "unigrams"])))
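The eval call only makes sense if the cell holds the list as a string rather than as an actual list, which the question does not confirm. Under that assumption, a possible variant uses ast.literal_eval from the standard library, which parses Python literals without executing arbitrary code. The string below is a made-up stand-in for such a cell.

```python
import ast

# Assumption: the cell stores the list as text, for example after
# the DataFrame was written to CSV and read back.
raw_cell = "['ive', 'searching', 'right', 'ive']"

# literal_eval accepts only literals (lists, strings, numbers, ...),
# so it is a safer substitute for eval on untrusted text.
unigram_list = ast.literal_eval(raw_cell)

unigram_count = len(set(unigram_list))
print(unigram_count)
```

If the column already contains real lists, neither eval nor literal_eval is needed and len(set(...)) can be applied directly.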

Answer 4

Your question is not very clear, but this might help:

df['count'] = df['unigrams'].map(lambda x: len(x))
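Note that len(x) counts every item in the list, repeats included, so it matches the distinct count only when the list has no duplicates (as happens to be the case for the two sample rows). A quick sketch of the difference, using a made-up list:

```python
# Hypothetical unigram list in which "win" appears twice.
unigrams = ["win", "cup", "win", "final"]

total_count = len(unigrams)          # counts all items, repeats included
distinct_count = len(set(unigrams))  # counts each word once

print(total_count, distinct_count)
```

If distinct values are what is needed, swapping len(x) for len(set(x)) in the lambda above is one way to get them.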

Answer 1: score 2, accepted, 2018-08-06 19:06:51

Answer 2: score 1, 2018-08-06 19:14:08

Answer 3: score 0, 2018-08-06 19:08:45

Answer 4: score 0, 2018-08-06 19:09:11