def unpack_dict(matrix, map_index_to_word):
    # table[i] is the word whose column index is i
    table = sorted(map_index_to_word, key=map_index_to_word.get)
    # CSR storage: non-zero values, their column (word) ids, and row pointers
    data = matrix.data
    indices = matrix.indices
    indptr = matrix.indptr
    num_doc = matrix.shape[0]
    # One {word: tf-idf value} dict per row (document)
    return [{k: v for k, v in zip([table[word_id] for word_id in
                                   indices[indptr[i]:indptr[i + 1]]],
                                  data[indptr[i]:indptr[i + 1]].tolist())}
            for i in range(num_doc)]

wiki['tf_idf'] = unpack_dict(tf_idf, map_index_to_word)
map_index_to_word is a dictionary mapping word to index for a few thousand words, tf_idf is a sparse TF-IDF matrix, and wiki is a DataFrame.
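The first line of unpack_dict, table = sorted(map_index_to_word, key=map_index_to_word.get), sorts the words by their index, which turns the word-to-index dictionary into a list where position i holds the word with index i. A minimal sketch with a made-up three-word vocabulary; it assumes the indices run from 0 to n-1 with no gaps, otherwise list positions and word ids would not line up.

```python
# Hypothetical vocabulary: word -> column index, like map_index_to_word
map_index_to_word = {"cherry": 2, "apple": 0, "banana": 1}

# Sort the keys (words) by their value (index): position i now holds
# the word whose index is i, i.e. the inverse mapping as a list.
table = sorted(map_index_to_word, key=map_index_to_word.get)
print(table)  # ['apple', 'banana', 'cherry']

# Equivalent explicit inversion (assumes indices are exactly 0..n-1)
inverse = [None] * len(map_index_to_word)
for word, idx in map_index_to_word.items():
    inverse[idx] = word
assert inverse == table
```

A list is used rather than an inverted dict because word ids are small consecutive integers, so table[word_id] is a plain positional lookup.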
[{k: v for k, v in zip([table[word_id] for word_id in indices[indptr[i]:indptr[i + 1]]],data[indptr[i]:indptr[i + 1]].tolist())} for i in range(num_doc)]
is the same as:
final_list = []
for i in range(num_doc):
    new_list = []
    for word_id in indices[indptr[i]:indptr[i + 1]]:
        new_list.append(table[word_id])
    new_dict = {}
    for k, v in zip(new_list, data[indptr[i]:indptr[i + 1]].tolist()):
        new_dict[k] = v
    final_list.append(new_dict)
Is that right?
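One way to check the equivalence is to run both forms on a tiny hand-built CSR matrix and compare the results. In this sketch the arrays are plain Python lists with made-up values (with a real scipy.sparse CSR matrix they would be matrix.data, matrix.indices and matrix.indptr), so list() stands in for numpy's .tolist().

```python
# Hypothetical toy CSR arrays: 2 documents over a 3-word vocabulary.
table = ["apple", "banana", "cherry"]
data = [0.5, 1.2, 0.8]   # non-zero values, row by row
indices = [0, 2, 1]      # word id of each value
indptr = [0, 2, 3]       # row i occupies positions indptr[i]:indptr[i+1]
num_doc = len(indptr) - 1

# Comprehension version
comp = [{k: v for k, v in zip([table[word_id] for word_id in indices[indptr[i]:indptr[i + 1]]],
                              list(data[indptr[i]:indptr[i + 1]]))}
        for i in range(num_doc)]

# Explicit-loop version from the question
final_list = []
for i in range(num_doc):
    new_list = []
    for word_id in indices[indptr[i]:indptr[i + 1]]:
        new_list.append(table[word_id])
    new_dict = {}
    for k, v in zip(new_list, list(data[indptr[i]:indptr[i + 1]])):
        new_dict[k] = v
    final_list.append(new_dict)

print(comp == final_list)  # True
print(final_list)          # [{'apple': 0.5, 'cherry': 1.2}, {'banana': 0.8}]
```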
[{k: v for k, v in zip([table[word_id] for word_id in
                        indices[indptr[i]:indptr[i + 1]]],
                       data[indptr[i]:indptr[i + 1]].tolist())}
 for i in range(num_doc)]
The outer comprehension is
[... for i in range(num_doc)]
which is just a simple loop that runs num_doc times, once per document.
Inside that is a dictionary comprehension:
{k: v for k, v in zip(...)}
The zip pairs each key k, taken from
[table[word_id] for word_id in indices[indptr[i]:indptr[i+1]]]
with a value v, taken from
data[indptr[i]:indptr[i+1]].tolist()
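For a single row, the two sequences being zipped can be printed separately. A minimal sketch for row 0 of a hypothetical toy matrix, using plain lists in place of the numpy arrays:

```python
# Hypothetical row 0 of a tiny CSR matrix
table = ["apple", "banana", "cherry"]
data = [0.5, 1.2, 0.8]
indices = [0, 2, 1]
indptr = [0, 2, 3]
i = 0

# Keys: the words for this row's word ids
keys = [table[word_id] for word_id in indices[indptr[i]:indptr[i + 1]]]
# Values: the matching slice of data
values = data[indptr[i]:indptr[i + 1]]

print(keys)                                  # ['apple', 'cherry']
print(values)                                # [0.5, 1.2]
print(list(zip(keys, values)))               # [('apple', 0.5), ('cherry', 1.2)]
print({k: v for k, v in zip(keys, values)})  # {'apple': 0.5, 'cherry': 1.2}
```

The last line could equally be written dict(zip(keys, values)); the comprehension form just spells out the key/value pairing.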
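So the outer variable i sets the slice range indptr[i]:indptr[i+1]. In CSR storage, the non-zero entries of row i sit at positions indptr[i] through indptr[i+1] - 1 of both indices and data, so this one slice selects exactly one document's word ids and values.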
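To see the row boundaries, the slice for each row can be printed directly. A small sketch with a hypothetical 4-document row pointer in which document 2 has no non-zero entries; its slice is empty, so unpack_dict would produce an empty dict for that document.

```python
# Hypothetical CSR arrays for 4 documents; document 2 is empty
indptr = [0, 2, 3, 3, 5]
indices = [0, 2, 1, 0, 1]
num_doc = len(indptr) - 1   # for a CSR matrix this equals matrix.shape[0]

for i in range(num_doc):
    start, stop = indptr[i], indptr[i + 1]
    print(i, (start, stop), indices[start:stop])
# 0 (0, 2) [0, 2]
# 1 (2, 3) [1]
# 2 (3, 3) []
# 3 (3, 5) [0, 1]
```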
So it builds a list of dictionaries, one per document. Each dictionary's keys are table[word_id] for every word_id in that row's slice of indices, and the values are the matching slice of data.
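The same list of dictionaries can be built without nesting three comprehensions. This is an alternative sketch, not the original code: unpack_rows is a hypothetical name, it takes the three CSR arrays directly, and it walks consecutive (start, stop) pairs with zip(indptr, indptr[1:]) instead of indexing indptr[i] and indptr[i+1].

```python
def unpack_rows(data, indices, indptr, table):
    """Return one {word: value} dict per row of a CSR matrix.

    data, indices, indptr: the CSR arrays (for a scipy.sparse CSR matrix,
    matrix.data, matrix.indices and matrix.indptr).
    table: list where table[word_id] is the word for that column.
    """
    rows = []
    # zip(indptr, indptr[1:]) yields each row's (start, stop) bounds in turn
    for start, stop in zip(indptr, indptr[1:]):
        words = [table[word_id] for word_id in indices[start:stop]]
        rows.append(dict(zip(words, data[start:stop])))
    return rows


print(unpack_rows([0.5, 1.2, 0.8], [0, 2, 1], [0, 2, 3],
                  ["apple", "banana", "cherry"]))
# [{'apple': 0.5, 'cherry': 1.2}, {'banana': 0.8}]
```

If data is a numpy array, the values come out as numpy scalars; calling .tolist() on the data slice, as the original does, converts them to plain Python floats.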