How can I improve the performance of importing data into MongoDB? I have 17,700 txt files, and to import them I first turn each one into a dictionary and then insert it into Mongo, but looping over the files this way is really slow. Any suggestions? Thank you. This is my code:
from bson.objectid import ObjectId

def txt_dict(x):
    d = {}
    with open(x, 'r') as inf:
        conta = 0
        for line in inf:
            if conta == 0:
                movie_id = line.replace(":", "")
                conta = conta + 1
            else:
                d['user_id'] = line.split(sep=',')[0]
                d['rating'] = int(line.split(sep=',')[1])
                d['date'] = line.split(sep=',')[2]
                d['_id'] = ObjectId()
                d['movie_id'] = movie_id
                collection.insert(d)
import os

directory = r"/Users/lorenzofamiglini/Desktop/Data_Science/training_set"
for filename in os.listdir(directory):
    if filename.endswith('.txt'):
        txt_dict(directory + "/" + filename)
        #print(str(directory + "/" + filename))
Two ways to improve performance.

First, batch your writes. `collection.insert(d)` sends one document per round trip (and is deprecated in modern PyMongo in favour of `insert_one`/`insert_many`). Any database is constrained by disk write speed on a single insert but is very efficient at batching multiple insert operations together, so accumulate documents in a list and write them with `insert_many`. Second, parallelize your loading: by parsing and inserting several files concurrently you can saturate the disk.
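A minimal sketch of the batching idea, assuming the file layout from the question (first line `movie_id:`, then `user_id,rating,date` rows); `parse_movie_file` and `batched` are hypothetical helper names, and the collection setup in the comments is an assumption:

```python
from itertools import islice

def parse_movie_file(path):
    """Parse one ratings file into a list of documents (no inserts here)."""
    docs = []
    with open(path) as inf:
        # First line is assumed to look like "123:"
        movie_id = next(inf).strip().rstrip(":")
        for line in inf:
            user_id, rating, date = line.strip().split(",")
            docs.append({
                "user_id": user_id,
                "rating": int(rating),
                "date": date,
                "movie_id": movie_id,
            })
    return docs

def batched(iterable, size):
    """Yield lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

# Hypothetical usage -- requires a running MongoDB and pymongo:
# from pymongo import MongoClient
# collection = MongoClient()["netflix"]["ratings"]
# for path in paths:
#     for chunk in batched(parse_movie_file(path), 1000):
#         collection.insert_many(chunk, ordered=False)
```

`ordered=False` lets the server continue past an individual failed document instead of aborting the rest of the batch, which is usually what you want in a bulk load.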
In short, it will run faster. Beyond that you are into spreading your writes across multiple disk drives and using SSDs.
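The parallel loading could be sketched like this with a process pool, where each worker handles whole files independently. The database/collection names are assumptions, and each worker opens its own client because PyMongo's `MongoClient` is not fork-safe:

```python
import os
from multiprocessing import Pool

def list_txt_files(directory):
    """All .txt paths in `directory` (e.g. the 17,700 files in training_set)."""
    return [os.path.join(directory, f)
            for f in sorted(os.listdir(directory)) if f.endswith(".txt")]

def load_one(path):
    """Worker: parse one file and bulk-insert its documents.

    A fresh MongoClient per worker process; connection details here
    are placeholders.
    """
    from pymongo import MongoClient
    coll = MongoClient()["netflix"]["ratings"]
    docs = []
    with open(path) as inf:
        movie_id = next(inf).strip().rstrip(":")
        for line in inf:
            user_id, rating, date = line.strip().split(",")
            docs.append({"user_id": user_id, "rating": int(rating),
                         "date": date, "movie_id": movie_id})
    if docs:
        coll.insert_many(docs, ordered=False)

# Hypothetical usage -- requires a running MongoDB:
# if __name__ == "__main__":
#     with Pool(4) as pool:
#         pool.map(load_one, list_txt_files("training_set"))
```

Four workers is a starting point; tune the pool size until the disk (not the CPU) is the bottleneck.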
With MongoDB Atlas you can turn up the IOPS rate (Input/Output Operations Per Second) during data loads and dial it down afterwards. Always an option if you are in the cloud.