Our current Python pipeline scrapes data from the web and stores it in MongoDB. After that, we load the data into an analysis algorithm. This works well on a local computer, since mongod can locate the database, but I want to upload the database to a file-sharing platform such as Google Drive so that other users can use the data without having to run the scraper again.
I know that MongoDB stores its data in /data/db by default, so could I upload the entire /data/db directory to Google Drive?
Another option seems to be exporting the MongoDB data to JSON or CSV, but our current implementation of the analysis algorithm already loads directly from MongoDB.
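Yes, you can upload the /data/db directory; copying it is one way to back up the database. Stop mongod before copying (or block writes with db.fsyncLock() and release them with db.fsyncUnlock() afterwards), otherwise the copied files may be inconsistent, and whoever uses the copy needs a compatible MongoDB version. You can also use mongodump with --gzip, or mongoexport as you pointed out yourself.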
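To share either a copied data directory or a mongodump output folder through Google Drive, it is convenient to upload a single archive. The following is a minimal Python sketch (the function name archive_for_upload is hypothetical) that packs a directory into a .zip file with the standard library's shutil.make_archive. It does not talk to Google Drive itself, so the upload remains a manual step or a separate script.

```python
import shutil
from pathlib import Path


def archive_for_upload(source_dir: str, archive_base: str) -> str:
    """Pack a directory (e.g. mongodump's output folder) into a single .zip.

    `archive_base` is the archive path without extension; shutil appends
    ".zip". Returns the path of the archive that was written.
    """
    source = Path(source_dir).resolve()
    if not source.is_dir():
        raise FileNotFoundError(f"not a directory: {source}")
    # root_dir/base_dir keep the top-level folder name (e.g. "dump/") inside
    # the archive, so unzipping recreates the same layout.
    return shutil.make_archive(
        archive_base, "zip", root_dir=str(source.parent), base_dir=source.name
    )
```

For example, archive_for_upload("dump", "mongo_backup") would write mongo_backup.zip in the current directory; the recipient unzips it and runs mongorestore on the extracted dump folder.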
If you want to back up regularly, you can cp/rsync the /data/db directory on a schedule. You can also write a bash script around mongodump/mongorestore or mongoexport/mongoimport to back up the database regularly, or use MongoLab as recommended in other answers.
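If you go the scripted route, a small Python wrapper that you schedule with cron (or Task Scheduler on Windows) might look like the sketch below. It assumes mongodump 3.2 or later (needed for --gzip) is on PATH and that the server is the default local instance; the function names and the timestamped folder layout are hypothetical choices, not something mongodump requires.

```python
import shutil
import subprocess
from datetime import datetime, timezone
from pathlib import Path
from typing import List, Optional


def build_dump_command(backup_root: str, db: Optional[str] = None,
                       now: Optional[datetime] = None) -> List[str]:
    """Return a mongodump argument list that writes a gzipped dump into a
    timestamped subfolder of backup_root, e.g. backups/2024-01-31T02-00-00Z."""
    now = now or datetime.now(timezone.utc)
    # No colons in the folder name, so the same layout also works on Windows.
    out_dir = Path(backup_root) / now.strftime("%Y-%m-%dT%H-%M-%SZ")
    cmd = ["mongodump", "--gzip", f"--out={out_dir}"]
    if db is not None:
        cmd.append(f"--db={db}")  # dump a single database instead of all
    return cmd


def run_backup(backup_root: str, db: Optional[str] = None) -> None:
    """Run mongodump against the default local instance (localhost:27017)."""
    if shutil.which("mongodump") is None:
        raise RuntimeError("mongodump was not found on PATH")
    # check=True raises CalledProcessError if mongodump exits with an error.
    subprocess.run(build_dump_command(backup_root, db), check=True)
```

Writing each run into its own timestamped folder keeps older backups intact, so a bad run never overwrites the last good dump; pruning old folders would be a separate step.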
So you have three options:
1. mongodump and mongorestore
2. copying the /data/db directory (cp/rsync it if you want regular backups)
3. mongoexport and mongoimport (read the MongoDB documentation on their limitations before relying on them)
Using mongodump and mongorestore
In version 3.x you simply run the following, which dumps the default MongoDB instance on the default port (localhost:27017):
mongodump
Versions before 3.0 could also read the data files directly, without a running mongod, via the --dbpath option.
The mongodump command above creates a directory named dump in the current working directory, containing one subdirectory per database. To dump a single collection (here a collection named collection in the test database), run something like:
mongodump --db test --collection collection
You can add --gzip (available from MongoDB 3.2) and other command-line options; see the mongodump documentation for details and the full list of options.
You can restore a dumped database with mongorestore:
mongorestore --dir <path>
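As with mongodump, you can specify a hostname, a port number (if it is not the default), a database name, and so on; see the mongorestore documentation for more information. If the dump was created with --gzip, pass --gzip to mongorestore as well.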
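If you script the restore from Python, building the argument list explicitly keeps the host, port and --gzip flag in one place. This is a hedged sketch: build_restore_command is a hypothetical helper, the --nsInclude filter assumes MongoDB 3.4 or later, and the returned list would be passed to subprocess.run(cmd, check=True).

```python
from typing import List, Optional


def build_restore_command(dump_dir: str, host: str = "localhost",
                          port: int = 27017, db: Optional[str] = None,
                          gzip: bool = False, drop: bool = False) -> List[str]:
    """Return a mongorestore argument list for restoring a mongodump folder."""
    cmd = ["mongorestore", f"--host={host}", f"--port={port}"]
    if gzip:
        cmd.append("--gzip")  # needed when the dump was made with --gzip
    if drop:
        cmd.append("--drop")  # drop each collection before restoring it
    if db is not None:
        # Restore only the namespaces of one database (MongoDB 3.4+).
        cmd.append(f"--nsInclude={db}.*")
    cmd.append(f"--dir={dump_dir}")
    return cmd
```

Leaving --drop off by default is the cautious choice: without it, mongorestore inserts into existing collections rather than replacing them.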
Using mongoexport and mongoimport
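These tools export and import data in JSON or CSV format. They are not recommended for full backups of production databases, because JSON cannot represent every BSON data type; see the MongoDB documentation for details. To export, run a command with one or more options, specifying the database and collection you want to back up. The output is JSON by default; to export CSV instead, add --type=csv together with --fields listing the fields to include: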
mongoexport --db threads --collection messages --out messages.json
You can import a backed-up collection into MongoDB with mongoimport:
mongoimport --db threads --collection messages --file messages.json
See the mongoexport documentation for more options, especially if you want to export the result of a query (the --query option).
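Regarding the concern in the question that the analysis code loads directly from MongoDB: a mongoexport file can also be read straight from Python, so recipients do not strictly need a running mongod. A minimal sketch, assuming the default mongoexport output of one JSON document per line (that is, the export was made without --jsonArray); the function name is hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterator


def iter_exported_documents(path: str) -> Iterator[Dict[str, Any]]:
    """Yield documents from a mongoexport file, one JSON document per line.

    Values are MongoDB Extended JSON, so an ObjectId arrives as a plain
    dict like {"$oid": "..."} rather than as a bson.ObjectId.
    """
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)
```

If pymongo is installed, bson.json_util.loads can be used in place of json.loads to convert Extended JSON values back to native BSON types, which keeps the documents closer to what the analysis code currently receives from MongoDB.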
You could create a small REST API for your database, with a unique key for each user, so that everyone on your team can use it.
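A minimal sketch of such an API, using only Python's standard library, follows. Everything in it is an assumption for illustration: the /documents route, the X-API-Key header, and the in-memory DOCUMENTS list, which stands in for a query against MongoDB (for example through pymongo). It is read-only and has no TLS, so it is a starting point rather than something to expose on the internet as-is.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Any, Dict, List

# Hypothetical stand-ins: a real service would query MongoDB here and load
# the keys from a config file or secret store.
DOCUMENTS: List[Dict[str, Any]] = [{"id": 1, "text": "hello"}, {"id": 2, "text": "world"}]
API_KEYS = {"alice-key", "bob-key"}


class ReadOnlyApi(BaseHTTPRequestHandler):
    def _send_json(self, status: int, payload: Any) -> None:
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self) -> None:
        # Each team member sends their own key in an X-API-Key header.
        if self.headers.get("X-API-Key") not in API_KEYS:
            self._send_json(401, {"error": "missing or unknown API key"})
        elif self.path == "/documents":
            self._send_json(200, DOCUMENTS)
        else:
            self._send_json(404, {"error": "not found"})

    def log_message(self, format: str, *args: Any) -> None:
        pass  # silence the default per-request logging to stderr


def make_server(host: str = "127.0.0.1", port: int = 8000) -> ThreadingHTTPServer:
    return ThreadingHTTPServer((host, port), ReadOnlyApi)
```

Calling make_server().serve_forever() would then serve http://127.0.0.1:8000/documents to clients that send a valid key.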
If you only need to export the data once, just export it to JSON.
You could run a MongoDB instance in the cloud. For example, you could use MongoLab ( https://mongolab.com/ ), or install your own instance on a VM with a cloud provider such as Microsoft Azure, Amazon AWS, or Google Compute Engine. Alternatively, you could create a REST API as JRazor proposed; however, this requires more development work.
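With a hosted instance or a cloud VM, the analysis code mainly needs a different connection string. A minimal sketch, assuming username/password authentication; the helper name and all example values are hypothetical. The user name and password are percent-escaped with urllib.parse.quote_plus, as the PyMongo documentation recommends, because characters such as @, : or / would otherwise break the URI.

```python
from urllib.parse import quote_plus


def build_mongo_uri(user: str, password: str, host: str, db: str,
                    port: int = 27017) -> str:
    """Build a mongodb:// connection string for a remote instance.

    Credentials are percent-escaped so reserved characters in them cannot
    be mistaken for URI delimiters.
    """
    return f"mongodb://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/{db}"
```

The result can be passed to pymongo.MongoClient(uri); in practice the credentials would be read from environment variables rather than hard-coded. If the user was created in the admin database, append ?authSource=admin to the URI.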