I am generating a text file from the TF-IDF value calculated for each word.
I am using this code to write the file:
w_writer = open("tf_idf_vectors_stops_2.txt", "w")
for x in xrange(0, len(listPatient)):
    patientId = listPatient[x]  # current patient ID
    for words in tdDict_final[patientId]:
        w_writer.write(patientId + "," + str(multiListTokens.index(words[0])) + "," + str(words[2]))
        w_writer.write("\n")
w_writer.close()
listPatient is a sorted list of patient IDs:
listPatient = ['001', '002', '003', '004']
tdDict_final is a dictionary that maps each patient ID to a list of (word, ':', value) tuples.
In the code, words[0] is the word and words[2] is its value, because words[1] is always ":". The format of tdDict_final looks like this:
{'001': [('dog', ':', '0.2534879'), ('cat', ':', '0.0133487')],
 '002': [('floor', ':', '0.047589'), ('board', ':', '0.099345')],
 '003': [('key', ':', '0.04993')],
 '004': [('thanks', ':', '0.01479')]}
tdDict_final contains every patient in listPatient.
multiListTokens is a list of all the distinct vocabulary words (tokens) found in tdDict_final.
The problem is that the code above is extremely slow when writing the file out.
Is there any way to improve the efficiency of writing to a text file with this code?
Thank you very much
with open("tf_idf_vectors_stops_2.txt", "w") as w_writer:
    for patientId in listPatient:
        for words in tdDict_final[patientId]:
            w_writer.write("%s,%s,%s\n" % (patientId, multiListTokens.index(words[0]), words[2]))
1st | You should use a with statement instead of opening the file and then closing it manually. The with statement relies on a Python context manager: it opens the file as w_writer and closes it automatically when the block finishes, even if an exception is raised.
2nd | There is no need for the xrange above, because the only place you use x is to look up patientId (patientId = listPatient[x]). You can iterate over listPatient directly and take patientId from it.
3rd | Building strings by chaining + creates intermediate strings and can be slow in Python, especially inside a loop. Concatenating with the join method or with % format placeholders (as I have) is generally more efficient. You also do not need to call write twice, because you can include the "\n" in the first write call.
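One more cost that may outweigh the string handling: multiListTokens.index(words[0]) scans the list from the beginning on every call, so the lookup time grows with the size of the vocabulary. Below is a minimal sketch of building a token-to-position dictionary once and reusing it. It assumes every word in tdDict_final appears in multiListTokens. The names write_vectors, token_index and demo_path, along with the small sample data, are hypothetical and only for illustration.

```python
import os
import tempfile


def write_vectors(path, list_patient, td_dict, tokens):
    """Write one 'patientId,tokenIndex,value' line per (word, ':', value) tuple."""
    # Build the token -> position map once. A dict lookup takes roughly
    # constant time, whereas list.index scans the list on every call.
    token_index = {}
    for position, token in enumerate(tokens):
        # setdefault keeps the first position, matching list.index
        # if the list happens to contain duplicates.
        token_index.setdefault(token, position)

    with open(path, "w") as w_writer:
        for patient_id in list_patient:
            for word, _separator, value in td_dict[patient_id]:
                fields = (patient_id, str(token_index[word]), str(value))
                w_writer.write(",".join(fields) + "\n")


# Hypothetical sample data in the same shape as the question's structures.
listPatient = ['001', '002']
tdDict_final = {
    '001': [('dog', ':', '0.2534879'), ('cat', ':', '0.0133487')],
    '002': [('floor', ':', '0.047589')],
}
multiListTokens = ['cat', 'dog', 'floor']

demo_path = os.path.join(tempfile.gettempdir(), "tf_idf_vectors_demo.txt")
write_vectors(demo_path, listPatient, tdDict_final, multiListTokens)
```

With a large vocabulary, this replaces a linear search per written line with a single pass over multiListTokens up front. Whether it is the dominant cost in your case is worth confirming by timing both versions.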