Remove all elements which occur in less than 1% and more than 60% of the list
If I have this list of strings:
['fsuy3,fsddj4,fsdg3,hfdh6,gfdgd6,gfdf5',
'fsuy3,fsuy3,fdfs4,sdgsdj4,fhfh4,sds22,hhgj6,xfsd4a,asr3']
(a large list)
How can I remove all the words that occur in fewer than 1% or more than 60% of the strings?
You can use collections.Counter:
from collections import Counter
counts = Counter(mylist)
Then:
newlist = [s for s in mylist if 0.01 < counts[s]/len(mylist) < 0.60]
(In Python 2.x, use float(counts[s])/len(mylist) so that the division is not truncated to an integer.)
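As a minimal sketch of the Counter approach above, the filter can be wrapped in a function and tried on a small list. The function name `filter_by_frequency`, its `low`/`high` parameters, and the sample data are assumptions made for illustration; they are not part of the original answer.

```python
from collections import Counter

def filter_by_frequency(items, low=0.01, high=0.60):
    """Keep items whose relative frequency lies strictly between low and high."""
    counts = Counter(items)
    total = len(items)
    # counts[s] / total is true division in Python 3
    return [s for s in items if low < counts[s] / total < high]

# Hypothetical sample: "a" makes up 70% of the list, "b" 20%, "c" 10%
mylist = ["a"] * 7 + ["b"] * 2 + ["c"]
filtered = filter_by_frequency(mylist)
print(filtered)
```

Here "a" is dropped because it exceeds the 60% ceiling, while "b" and "c" survive.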
If you are talking about the comma-separated words, you can use a similar approach:
words = [l.split(',') for l in mylist]
counts = Counter(word for l in words for word in l)
newlist = [[s for s in l if 0.01 < counts[s]/len(mylist) < 0.60] for l in words]
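To see what this variant computes, here is a sketch that runs it on the two sample strings from the question. The intermediate `ratios` dictionary is an addition for illustration only. Note that the ratio is total occurrences of a word divided by the number of strings, so a word repeated within one string can score above 1.

```python
from collections import Counter

mylist = [
    "fsuy3,fsddj4,fsdg3,hfdh6,gfdgd6,gfdf5",
    "fsuy3,fsuy3,fdfs4,sdgsdj4,fhfh4,sds22,hhgj6,xfsd4a,asr3",
]

# Split every string into its comma-separated words
words = [line.split(",") for line in mylist]
# Count every occurrence of every word across all strings
counts = Counter(word for line in words for word in line)

# Ratio = occurrences of the word / number of strings (can exceed 1)
ratios = {word: counts[word] / len(mylist) for word in counts}

newlist = [[w for w in line if 0.01 < ratios[w] < 0.60] for line in words]
print(newlist)
```

With only two strings, "fsuy3" occurs three times and is removed, while every other word has a ratio of 0.5 and is kept. If each word should count at most once per string, deduplicating each line with `set` before counting is one option.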
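A straightforward solution:
# words is a flat list of all the words
occurrences = dict()
for word in words:
    if word not in occurrences:
        occurrences[word] = 1
    else:
        occurrences[word] += 1
# Keep the words whose relative frequency is between 1% and 60% (inclusive)
result = [word for word in words if 0.01 <= occurrences[word] / len(words) <= 0.6]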
I guess you want this:
from collections import Counter
# Split on ',' and remove duplicate words within each line
st = [set(s.split(',')) for s in mylist]
# Count the number of lines each word appears in
count = Counter(word for line in st for word in line)
# Work out which words are allowed
allowed = {s for s in count if 0.01 < count[s] / float(len(mylist)) < 0.60}
# For each row in the original list, keep only the allowed words
result = [[w for w in s.split(',') if w in allowed] for s in mylist]
print(result)
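The per-line counting idea above can be sketched as a reusable function. The name `filter_by_document_frequency`, its parameters, and the sample input are hypothetical and chosen only for this example. Each word is counted once per line, so the ratio is the fraction of lines containing the word.

```python
from collections import Counter

def filter_by_document_frequency(lines, low=0.01, high=0.60, sep=","):
    """Drop words found in too few or too many lines, preserving line structure."""
    # set() ensures a word is counted at most once per line
    doc_counts = Counter(word for line in lines for word in set(line.split(sep)))
    n = len(lines)
    # A set gives constant-time membership tests in the final pass
    allowed = {w for w, c in doc_counts.items() if low < c / n < high}
    return [[w for w in line.split(sep) if w in allowed] for line in lines]

# Hypothetical sample: "a" is in all 4 lines, "b" in 2, the rest in 1 each
lines = ["a,b,c", "a,b,b", "a,d", "a,e"]
result = filter_by_document_frequency(lines)
print(result)
```

In this sample "a" appears in 100% of the lines and is removed. The repeated "b" in the second line is kept twice because only the counting step deduplicates, not the output.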