I have a big tab-delimited CSV file: the first tab (column) holds emotion words, the second holds one of eight basic emotions plus the values positive and negative, and the last is a boolean value (1 or 0) saying whether the second tab's value fits the word in the first.
A snippet from the file:
snarl anger 1
snarl anticipation 0
snarl disgust 1
snarl fear 0
snarl joy 0
snarl negative 1
snarl positive 0
snarl sadness 0
snarl surprise 0
snarl trust 0
snarling anger 1
snarling anticipation 0
snarling disgust 0
snarling fear 0
snarling joy 0
snarling negative 1
snarling positive 0
snarling sadness 0
snarling surprise 0
snarling trust 0
My code so far to do this:
import csv
from pprint import pprint
from itertools import groupby
l = list(csv.reader(open('NRC-Emotion-Lexicon-Wordlevel-v0.92.txt')))
f = lambda x: x[-1] #manipulate number to see different results
{k:[tuple(x[0:1]) for x in v] for k,v in groupby(sorted(l[1:], key=f), f)}
pprint(l)
My current output is not that good looking:
['asylum\tanger\t0'],
['asylum\tanticipation\t0'],
['asylum\tdisgust\t0'],
['asylum\tfear\t1'],
['asylum\tjoy\t0'],
['asylum\tnegative\t1'],
['asylum\tpositive\t0'],
['asylum\tsadness\t0'],
['asylum\tsurprise\t0'],
['asylum\ttrust\t0'],
My question is: How do I create a dictionary of lists with one unique key for each of the repeated emotion words (reducing 10 repetitions to 1, each) and only include the second tab elements in the list of that dictionary key, when they have the boolean value of 1?
Any kind of help would be appreciated!
EDIT: as one of the replies stated, an example of the desired output would look like this:
{'snarl': ['anger', 'disgust'], #included in list due to having '1', ignoring 'positive' and 'negative'
'snarling': ['anger'], #etc...
}
EDIT 2:
The first and last lines of the file are empty, as I mentioned in comments on the answers.
This is one approach, using defaultdict. For example:
import csv
from collections import defaultdict
d = defaultdict(list)
with open(filename) as infile:
    reader = csv.reader(infile, delimiter="\t")
    for row in reader:
        if row[2] == '1':
            d[row[0]].append(row[1])
print(d)
Edit, as per the comment (this version skips empty lines):
from collections import defaultdict
d = defaultdict(list)
with open(filename) as infile:
    for row in infile:
        if row.strip():
            val = row.split()
            if val[2] == '1':
                d[val[0]].append(val[1])
print(d)
You can use collections.defaultdict and update a dictionary of lists while iterating a csv.reader object.
Your criterion is added in an if statement, taking care to convert the number to an integer via int.
import csv
from collections import defaultdict
from io import StringIO
x = StringIO("""snarl anger 1
snarl anticipation 0
...
snarling surprise 0
snarling trust 0""")
d = defaultdict(list)
# replace x with open('file.csv', 'r')
with x as fin:
    # filter(None, ...) drops the empty rows that csv.reader yields for blank lines
    reader = filter(None, csv.reader(fin, delimiter=' ', skipinitialspace=True))
    # or, reader = filter(None, csv.reader(fin, delimiter='\t'))
    for word, emotion, num in reader:
        if int(num):
            d[word].append(emotion)
Result:
print(d)
defaultdict(list,
{'snarl': ['anger', 'disgust', 'negative'],
'snarling': ['anger', 'negative']})
You were close to the answer. However, when you called csv.reader you didn't specify a delimiter, so it defaulted to a comma and each tab-separated line was read as a single field.
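To make the delimiter point concrete, here is a minimal sketch that parses two rows from the question's snippet, once with the default delimiter and once with a tab. The in-memory `sample` string stands in for the real file and is an assumption for illustration only.

```python
import csv
from io import StringIO

# Two rows from the question's snippet, tab-separated (illustrative stand-in
# for the real lexicon file).
sample = "snarl\tanger\t1\nsnarl\tanticipation\t0\n"

# Default delimiter is a comma, so each line stays one unsplit field.
default_rows = list(csv.reader(StringIO(sample)))

# With delimiter='\t' each line splits into word, emotion and flag.
tab_rows = list(csv.reader(StringIO(sample), delimiter="\t"))

print(default_rows)  # [['snarl\tanger\t1'], ['snarl\tanticipation\t0']]
print(tab_rows)      # [['snarl', 'anger', '1'], ['snarl', 'anticipation', '0']]
```

The first result has the same one-string-per-row shape as the output shown in the question.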
>>> from itertools import groupby
>>> l = map(str.split, open('NRC-Emotion-Lexicon-Wordlevel-v0.92.txt').readlines())
>>> f = lambda x: x[1]
>>> {k:set(e[0] for e in v) for k,v in groupby(sorted(filter(bool, l), key=f), f)}
{'anger': {'snarling', 'snarl'}, 'anticipation': {'snarling', 'snarl'}, 'disgust': {'snarling', 'snarl'}, 'fear': {'snarling', 'snarl'}, 'joy': {'snarling', 'snarl'}, 'negative': {'snarling', 'snarl'}, 'positive': {'snarling', 'snarl'}, 'sadness': {'snarling', 'snarl'}, 'surprise': {'snarling', 'snarl'}, 'trust': {'snarling', 'snarl'}}
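The grouping above is keyed on the emotion column and keeps every row regardless of its flag, so it does not yet match the output the question asks for. A sketch of the same groupby idea, keyed on the word and filtered on the flag, might look like the following. The in-memory `data` is an assumed stand-in for the file, with blank first and last lines as the question describes, and `word_of` is a helper name introduced here for illustration.

```python
import csv
from io import StringIO
from itertools import groupby

# Assumed stand-in for the lexicon file, with blank first and last lines.
data = StringIO(
    "\n"
    "snarl\tanger\t1\n"
    "snarl\tdisgust\t1\n"
    "snarl\tjoy\t0\n"
    "snarling\tanger\t1\n"
    "snarling\tfear\t0\n"
    "\n"
)

# csv.reader yields an empty list for a blank line, so drop those rows.
rows = [row for row in csv.reader(data, delimiter="\t") if row]

def word_of(row):
    """Grouping key: the emotion word in the first column."""
    return row[0]

# groupby only merges adjacent items, so sort by the same key first.
by_word = {
    word: [emotion for _, emotion, flag in group if flag == "1"]
    for word, group in groupby(sorted(rows, key=word_of), key=word_of)
}
print(by_word)  # {'snarl': ['anger', 'disgust'], 'snarling': ['anger']}
```

Unlike the defaultdict versions, this keeps a key with an empty list for any word that has no row flagged 1.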
Here's how I would do it. You could also use collections.defaultdict if you wished (instead of setdefault):
import csv
with open('NRC-Emotion-Lexicon-Wordlevel-v0.92.txt', newline='') as file:
    l = [row[:-1] for row in csv.reader(file, delimiter='\t')
         if row and row[-1] == '1']  # Not empty and last elem is true.
d = {}
for e_word, basic in l:
    d.setdefault(e_word, []).append(basic)
print('dictionary d:\n', d)
Output:
dictionary d:
{'snarl': ['anger', 'disgust', 'negative'], 'snarling': ['anger', 'negative']}
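The outputs above still list 'negative', whereas the desired output in the question leaves out 'positive' and 'negative'. One possible way to drop them is sketched below. The names `SENTIMENTS` and `emotions_by_word` are hypothetical, and the in-memory `sample` is an assumed stand-in for the real file.

```python
import csv
from collections import defaultdict
from io import StringIO

# Labels treated as sentiment rather than emotion (an assumption taken from
# the question's desired output).
SENTIMENTS = {"positive", "negative"}

def emotions_by_word(lines):
    """Map each word to its flagged emotions, skipping sentiment labels."""
    result = defaultdict(list)
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        word, emotion, flag = row
        if flag == "1" and emotion not in SENTIMENTS:
            result[word].append(emotion)
    return dict(result)

# Assumed sample with blank first and last lines, as in the question.
sample = StringIO(
    "\n"
    "snarl\tanger\t1\n"
    "snarl\tdisgust\t1\n"
    "snarl\tnegative\t1\n"
    "snarling\tanger\t1\n"
    "snarling\tnegative\t1\n"
    "\n"
)
print(emotions_by_word(sample))  # {'snarl': ['anger', 'disgust'], 'snarling': ['anger']}
```

To run it on the real file, pass the open file object instead of `sample`, for example `open('NRC-Emotion-Lexicon-Wordlevel-v0.92.txt', newline='')`.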