I have a CSV file with one column and 4000 rows. I want to write a script that prints each unique word in that CSV along with the percentage of rows it accounts for.
Example:
Trojan
Trojan
redirects
Exploits
Trojan
Expected output: Trojan: 60% Redirects: 20% Exploits: 20%
What is the simplest way to do this?
Here is an image of the data I have:
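import csv
myDict = {}
total = 0  # number of rows read; len() does not work on a file object
with open('export.csv', 'rb') as csvfile:
    for word in csvfile:
        word = word.strip()  # drop the trailing newline
        total += 1
        if word in myDict:
            myDict[word] += 1
        else:
            myDict[word] = 1
for word in myDict:
    print word, float(myDict[word])/total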
You can use set to get all unique values and list.count to get the number of occurrences of each. Dividing each count by the length of the list and multiplying by 100 gives the percentage:
text = ['a', 'a', 'b', 'c']
[(i, text.count(i) * 100. / len(text)) for i in set(text)]
resulting in:
[('a', 50.0), ('b', 25.0), ('c', 25.0)]
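Note that text.count(i) rescans the whole list once per unique value. As a rough alternative, here is a minimal Python 3 sketch that counts everything in a single pass with collections.Counter. The function name percentages is made up for illustration and is not part of the answer above.

```python
from collections import Counter


def percentages(words):
    """Map each unique word to its share of `words`, in percent."""
    counts = Counter(words)  # single pass over the list
    total = len(words)
    # For an empty list, counts is empty too, so no division happens.
    return {word: count * 100.0 / total for word, count in counts.items()}


text = ['a', 'a', 'b', 'c']
print(percentages(text))
```

For a list of a few thousand words either approach is fast enough; the difference only starts to matter when there are many distinct values.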
You can use a dictionary, as below:
import csv
myDict = {}
row_number = 0
with open('some.csv', 'rb') as f:
    reader = csv.reader(f, delimiter=' ')
    for row in reader:
        row_number += 1
        if row[0] in myDict:
            myDict[row[0]] += 1
        else:
            myDict[row[0]] = 1
for word in myDict:
    print word, float(myDict[word])/row_number
Running it on the example data prints:
>>> ================================ RESTART ================================
>>>
Trojan 0.6
Exploits 0.2
redirects 0.2
>>>
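The code above is written for Python 2 (print statement, 'rb' mode) and prints fractions rather than percentages. A possible Python 3 adaptation is sketched below, under a few assumptions: the file has a single column, blank rows should be ignored, and the result should be formatted like the "Trojan: 60%" line in the question. The function name word_percentages is hypothetical.

```python
import csv
import io
from collections import Counter


def word_percentages(lines):
    """Return {word: percentage} for the first column of CSV input.

    `lines` is any iterable of text lines, such as a file opened with
    open(path, newline="") or an io.StringIO.
    """
    counts = Counter()
    for row in csv.reader(lines):
        # Skip blank lines and rows whose first cell is empty.
        if row and row[0].strip():
            counts[row[0].strip()] += 1
    total = sum(counts.values())
    return {word: 100.0 * n / total for word, n in counts.items()}


# Demo on the sample data from the question (in memory instead of a file).
sample = io.StringIO("Trojan\nTrojan\nredirects\nExploits\nTrojan\n")
ranked = sorted(word_percentages(sample).items(), key=lambda kv: -kv[1])
for word, pct in ranked:
    print("{}: {:.0f}%".format(word, pct))
```

To run it against the real file, pass an open file object instead of the in-memory sample, for example with open('export.csv', newline='') as f: word_percentages(f).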