
MongoDB import is very slow, how can I improve the performance?

I was wondering how I can improve the import performance of data in MongoDB. I have 17,700 txt files, and to import them I first have to turn each one into a dictionary and then insert it into Mongo, but doing this in a loop is far too slow. Any suggestions? Thank you. This is my code:

import os

from bson.objectid import ObjectId

# `collection` is a pymongo Collection created elsewhere (not shown in the post).


def txt_dict(x):
    with open(x, 'r') as inf:
        conta = 0
        for line in inf:
            if conta == 0:
                # The first line holds the movie id followed by a colon.
                movie_id = line.replace(":", "").strip()
                conta = conta + 1
            else:
                fields = line.strip().split(sep=',')
                d = {}
                d['user_id'] = fields[0]
                d['rating'] = int(fields[1])
                d['date'] = fields[2]
                d['_id'] = ObjectId()
                d['movie_id'] = movie_id
                # One round trip to the server per rating line: this is the slow part.
                collection.insert_one(d)


directory = r"/Users/lorenzofamiglini/Desktop/Data_Science/training_set"
for filename in os.listdir(directory):
    if filename.endswith('.txt'):
        txt_dict(directory + "/" + filename)
        # print(str(directory + "/" + filename))

There are two ways to improve performance.

  1. Use insert_many to insert records in bulk (I recommend batches of 1000).
  2. Process files in parallel, either by running multiple instances of your program at the same time or by using multiprocessing.
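As a rough illustration of the first point, here is a minimal sketch of batched inserts. The helper names (parse_ratings, batches, load_file) are made up for this example, and `collection` is assumed to be a pymongo Collection that you have already created. The file layout (a movie id line followed by user,rating,date lines) is inferred from the code in the question. The `_id` field is left out on purpose, since pymongo adds an ObjectId to any document that lacks one.

```python
from itertools import islice


def parse_ratings(path):
    """Yield one document per rating line of a training-set file."""
    with open(path, "r") as inf:
        # The first line holds the movie id followed by a colon.
        movie_id = inf.readline().replace(":", "").strip()
        for line in inf:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            user_id, rating, date = line.split(",")
            yield {
                "movie_id": movie_id,
                "user_id": user_id,
                "rating": int(rating),
                "date": date,
            }


def batches(iterable, size):
    """Yield lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def load_file(collection, path, batch_size=1000):
    """Insert every rating in `path`, one insert_many call per batch."""
    inserted = 0
    for chunk in batches(parse_ratings(path), batch_size):
        # ordered=False lets the server keep going past a failed document.
        collection.insert_many(chunk, ordered=False)
        inserted += len(chunk)
    return inserted
```

Calling load_file(collection, path) for each file replaces the per-line insert in the question's loop, so a file with N ratings costs roughly N / 1000 round trips instead of N.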

Any database is constrained by disk write speed when it handles one insert at a time, but it is very efficient at batching multiple insert operations together. By parallelizing your loading you can saturate the disk.
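The parallel loading could look something like the sketch below, which combines batching with a multiprocessing Pool. Everything here is an assumption for illustration: the function names, the `uri`, `db_name` and `coll_name` parameters, and the file layout inferred from the question. Each worker process opens its own MongoClient in a Pool initializer, because a client created in the parent process should not be shared across processes.

```python
import os
from itertools import islice
from multiprocessing import Pool

_collection = None  # one pymongo Collection per worker process


def init_worker(uri, db_name, coll_name):
    """Pool initializer: open a MongoClient inside each worker process."""
    global _collection
    from pymongo import MongoClient  # requires pymongo; imported in the worker
    _collection = MongoClient(uri)[db_name][coll_name]


def txt_files(directory):
    """Return the sorted full paths of the .txt files in `directory`."""
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.endswith(".txt")
    )


def load_one(path, batch_size=1000, collection=None):
    """Parse one file and insert its ratings in batches; return the count."""
    coll = collection if collection is not None else _collection
    inserted = 0
    with open(path, "r") as inf:
        # The first line holds the movie id followed by a colon.
        movie_id = inf.readline().replace(":", "").strip()
        rows = (line.strip().split(",") for line in inf if line.strip())
        while True:
            batch = [
                {"movie_id": movie_id, "user_id": u, "rating": int(r), "date": d}
                for u, r, d in islice(rows, batch_size)
            ]
            if not batch:
                return inserted
            coll.insert_many(batch, ordered=False)
            inserted += len(batch)


def load_directory(directory, uri, db_name, coll_name, processes=4):
    """Load every .txt file in `directory` using a pool of worker processes."""
    with Pool(processes, initializer=init_worker,
              initargs=(uri, db_name, coll_name)) as pool:
        # imap_unordered hands the next file to whichever worker is free.
        return sum(pool.imap_unordered(load_one, txt_files(directory)))
```

Call load_directory from under an `if __name__ == "__main__":` guard, which multiprocessing needs on platforms that start workers by spawning a fresh interpreter. The right number of processes depends on your disk and CPU, so it is worth trying a few values.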

In short, it will run faster. Beyond that, you are into parallelizing your writes across multiple disk drives and using SSDs.

With MongoDB Atlas you can turn up the IOPS rate (input/output operations per second) during data loads and dial it down afterwards. That is always an option if you are in the cloud.


 