Remove all elements that occur in fewer than 1% and more than 60% of a list

Question

If I have this list of strings:

['fsuy3,fsddj4,fsdg3,hfdh6,gfdgd6,gfdf5',
'fsuy3,fsuy3,fdfs4,sdgsdj4,fhfh4,sds22,hhgj6,xfsd4a,asr3']

(a big list)

How can I remove all words that occur in fewer than 1% or in more than 60% of the strings?

Answer 1

You can use collections.Counter:

from collections import Counter
counts = Counter(mylist)

Then:

newlist = [s for s in mylist if 0.01 < counts[s]/len(mylist) < 0.60]

(In Python 2.x, use float(counts[s])/len(mylist) to avoid integer division.)
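As a minimal sketch of the same idea, the two steps can be wrapped in a helper and tried on a small list. The function name filter_by_frequency, its low and high parameters, and the sample data are assumptions made for illustration; they do not come from the answer.

```python
from collections import Counter


def filter_by_frequency(items, low=0.01, high=0.60):
    """Keep items whose share of the list lies strictly between low and high."""
    counts = Counter(items)
    total = len(items)
    # float() avoids integer division on Python 2; it is harmless on Python 3.
    return [x for x in items if low < float(counts[x]) / total < high]


# Made-up sample: 'a' is 70% of the list, 'b' is 20%, 'c' is 10%.
sample = ['a'] * 7 + ['b'] * 2 + ['c']
kept = filter_by_frequency(sample, low=0.15, high=0.60)
print(kept)
```

With these illustrative thresholds only 'b' falls inside the band, so both of its occurrences are kept and everything else is dropped.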

If you are talking about the comma-separated words, you can use a similar approach:

words = [l.split(',') for l in mylist]

counts = Counter(word for l in words for word in l)

newlist = [[s for s in l if 0.01 < counts[s]/len(mylist) < 0.60] for l in words]
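One thing worth checking with this approach: the Counter is built over every word occurrence, so a word repeated inside one string is counted more than once, and the ratio against len(mylist) can exceed 1. The sketch below uses the two strings from the question to compare that count with a per-string count; the names total_counts and string_counts are assumptions introduced here.

```python
from collections import Counter

mylist = [
    'fsuy3,fsddj4,fsdg3,hfdh6,gfdgd6,gfdf5',
    'fsuy3,fsuy3,fdfs4,sdgsdj4,fhfh4,sds22,hhgj6,xfsd4a,asr3',
]
words = [line.split(',') for line in mylist]

# Every occurrence counts, including repeats within one string.
total_counts = Counter(word for line in words for word in line)

# Each word counts at most once per string (set() drops the repeats).
string_counts = Counter(word for line in words for word in set(line))

ratio_total = float(total_counts['fsuy3']) / len(mylist)
ratio_strings = float(string_counts['fsuy3']) / len(mylist)
print(ratio_total, ratio_strings)
```

Which of the two counts is appropriate depends on whether "60% of the strings" is meant per string or per occurrence; the question does not say explicitly.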

Answer 2

A straightforward solution:

occurrences = dict()
for word in words:
  if word not in occurrences:
     occurrences[word] = 1
  else:
     occurrences[word] += 1

result = [word for word in words if 0.01 <= occurrences[word] / len(words) <= 0.6]

Answer 3

I guess you want this:

from collections import Counter

# Split on ',' and remove duplicate words within each line
st = [set(s.split(',')) for s in mylist]

# Count the number of lines in which each word occurs
count = Counter(word for line in st for word in line)

# Work out which words are allowed (float() avoids integer division on Python 2)
allowed = set(s for s in count if 0.01 < float(count[s]) / len(mylist) < 0.60)

# For each row in the original list, keep only the allowed words
result = [[w for w in s.split(',') if w in allowed] for s in mylist]

print(result)
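The per-line counting above could be packaged as a reusable function, sketched here under some assumptions: the name filter_words_by_string_share, its low and high parameters, and the five-string sample are hypothetical and chosen only so that the thresholds visibly remove something.

```python
from collections import Counter


def filter_words_by_string_share(lines, low=0.01, high=0.60):
    """Drop words that occur in too few or too many of the given strings."""
    split_lines = [line.split(',') for line in lines]
    # Count each word once per string, however often it repeats there.
    in_how_many = Counter(w for words in split_lines for w in set(words))
    n = float(len(lines))
    allowed = {w for w, c in in_how_many.items() if low < c / n < high}
    # Preserve the original order and repeats of the surviving words.
    return [[w for w in words if w in allowed] for words in split_lines]


# Made-up sample: 'a' is in 5 of 5 strings, 'b' in 3, 'c' in 2, 'd' and 'e' in 1.
sample = ['a,b,c', 'a,b', 'a,c,d', 'a,b,b', 'a,e']
filtered = filter_words_by_string_share(sample, low=0.3, high=0.7)
print(filtered)
```

Keeping the allowed words in a set makes each membership test constant time, which matters for the big list mentioned in the question.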

3 answers

Answer 1
8 2013-08-08 15:36:59

Answer 2
1 2013-08-08 15:37:46

Answer 3
0 2013-08-08 16:06:44
