Where should I use sharding in MongoDB or run multiple instances of MongoDB?

Issue

I have at least 10 text files (CSV), each around 5 GB in size. There is no issue when I import the first text file. But when I start importing the second text file, it fails with the maximum size limit (16MB) error.
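Assuming the import is done with mongoimport, a typical command for one of these files would look like this (the database, collection, and file names below are placeholders):

    mongoimport --db customersdb --collection customers \
        --type csv --headerline --file customers1.csv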

My primary purpose for using the database is to search for customers by the customer_id index.
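Since the only access pattern is a lookup by customer_id, a single collection with an index on that field is normally enough. A minimal mongosh sketch (the example id value is made up):

    // build the index once, then query by customer_id
    db.customers.createIndex({ customer_id: 1 })
    db.customers.find({ customer_id: "C-12345" })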

Given below are the details of one CSV file after import.

Collection Name | Documents | Avg. Document Size | Total Document Size | Num. Indexes | Total Index Size | Properties
Customers | 8,874,412 | 1.8 KB | 15.7 GB | 3 | 262.0 MB |

To overcome this, the MongoDB community recommends GridFS, but the problem with GridFS is that the data is stored as bytes, and it is not possible to query for a specific index in the text file.

I don't know if it's possible to query for a specific index in a text file when using GridFS. If someone knows, any help is appreciated.

The other solution I thought about was creating multiple instances of MongoDB running on different ports. Is this method feasible?

  1. But a lot of the tutorials on multiple instances show how to create a replica set, thereby storing the same data in the PRIMARY and the SECONDARY.
  2. The SECONDARY instances don't allow writes; they only allow reading data.

Is it possible to create multiple instances of MongoDB without creating a replica set, with both read and write operations on each of them? If yes, how? Can this method overcome the 16MB limit?
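Running several independent mongod processes is just a matter of giving each one its own port and data directory. A minimal sketch, with made-up ports and paths (note that each instance still enforces the same 16MB per-document limit):

    # two independent, writable mongod instances (hypothetical ports and paths)
    mongod --port 27017 --dbpath /data/db1
    mongod --port 27018 --dbpath /data/db2

With this setup your application has to decide by itself which instance holds a given customer, which amounts to manual sharding.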

The second solution I thought about was creating shards of the collections, or simply sharding. Can this method overcome the 16MB limit? If yes, any help regarding this is appreciated.
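For reference, sharding the collection on customer_id would look roughly like this in mongosh (the cluster setup itself is omitted, and the database name is a placeholder). Note that the 16MB limit is a per-document limit, not a per-collection one, so sharding by itself does not lift it:

    // run against a mongos router of an existing sharded cluster
    sh.enableSharding("customersdb")
    sh.shardCollection("customersdb.customers", { customer_id: "hashed" })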

Of the two solutions, which is more efficient (in terms of speed) for searching data? As I mentioned earlier, I just want to search for customers in this database.

Error

The error message shows exactly where the problem is: entry #8437: line 13530, column 627

Have a look at the file at that position and correct it there.

The error extraneous " in field... is quite clear: somewhere in your CSV file you have an opening quote " that is never closed, i.e. the rest of the entire file is considered as one single field.
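As a quick way to check this (the file name below is a placeholder), you can print the reported line and count the double quotes up to it; an odd total means one quote is never closed:

    # show line 13530 of the CSV
    sed -n '13530p' customers.csv
    # count double quotes up to that line; an odd total means one is unclosed
    head -n 13530 customers.csv | grep -o '"' | wc -l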
