
Amazon EC2 & S3 when using Python / SQLite?

Suppose that I have a huge SQLite file (say, 500 MB) stored in Amazon S3. Can a Python script running on a small EC2 instance directly access and modify that SQLite file, or must I first copy the file to the EC2 instance, change it there, and then copy it back to S3?

Will the I/O be efficient?
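For concreteness, the copy-down / copy-back-up workflow in question would look roughly like the sketch below, here using boto3; the bucket name, key, and table/column names are placeholders:

```python
# Sketch of the "copy down, modify, copy back" workflow using boto3.
# Bucket, key, and table/column names are placeholders.
import sqlite3

import boto3

s3 = boto3.client("s3")
BUCKET = "my-bucket"          # placeholder
KEY = "data/big.sqlite"       # placeholder
LOCAL = "/tmp/big.sqlite"

# 1. Copy the SQLite file from S3 to local instance storage.
s3.download_file(BUCKET, KEY, LOCAL)

# 2. Open and modify it locally with the standard sqlite3 module.
conn = sqlite3.connect(LOCAL)
conn.execute("UPDATE items SET processed = 1 WHERE processed = 0")
conn.commit()
conn.close()

# 3. Copy the modified file back to S3 (overwrites the old object).
s3.upload_file(LOCAL, BUCKET, KEY)
```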

Here's what I am trying to do. As I wrote, I have a 500 MB SQLite file in S3. I'd like to start, say, 10 different Amazon EC2 instances that will each read a subset of the file and do some processing (every instance will handle a different subset of the 500 MB SQLite file). Then, once processing is done, every instance will update only the subset of the data it dealt with (as explained, there will be no overlap of data among processes).

For example, suppose the SQLite file has, say, 1M rows (a rough sketch of one such worker follows the list):

instance 1 will deal with (and update) rows 0 - 100000

instance 2 will deal with (and update) rows 100001 - 200000

.........................

instance 10 will deal with (and update) rows 900001 - 1000000
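A minimal sketch of what one worker instance might run, assuming the table has an integer primary key `id` and each instance is handed its own row range; the table and column names are made up for illustration:

```python
# Hypothetical worker for one EC2 instance: processes and updates
# only its assigned row range. Table/column names are illustrative.
import sqlite3

def process_range(db_path, lo, hi):
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        "SELECT id, value FROM items WHERE id BETWEEN ? AND ?", (lo, hi)
    ).fetchall()

    for row_id, value in rows:
        result = value * 2  # stand-in for the real processing
        conn.execute("UPDATE items SET result = ? WHERE id = ?", (result, row_id))

    conn.commit()
    conn.close()

# e.g. instance 1 of 10:
process_range("/tmp/big.sqlite", 0, 100000)
```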


Is it at all possible? Does it sound OK? Any suggestions / ideas are welcome.

I'd like to start, say, 10 different Amazon EC2 instances that will each read a subset of the file and do some processing (every instance will handle a different subset of the 500 MB SQLite file)

You cannot do this with SQLite, on Amazon infrastructure or otherwise. SQLite performs database-level write locking; unless all ten nodes are performing reads exclusively, you will not attain any kind of concurrency. Even the SQLite website says so:

Situations Where Another RDBMS May Work Better

  • Client/Server Applications
  • High-volume Websites
  • Very large datasets
  • High Concurrency

Have you considered PostgreSQL?
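To make the locking behaviour concrete, here is a minimal sketch (a throwaway database path and table are assumed) showing a second connection failing to write while the first one holds the write lock:

```python
# Demonstrates SQLite's database-level write lock: while one connection
# holds an open write transaction, a second writer gets "database is locked".
import sqlite3

db = "/tmp/demo.sqlite"   # throwaway database, placeholder path

setup = sqlite3.connect(db)
setup.execute("CREATE TABLE IF NOT EXISTS t (x INTEGER)")
setup.commit()
setup.close()

writer1 = sqlite3.connect(db, isolation_level=None)
writer1.execute("BEGIN IMMEDIATE")           # takes the write lock
writer1.execute("INSERT INTO t VALUES (1)")

writer2 = sqlite3.connect(db, timeout=1)     # give up after ~1 second
try:
    writer2.execute("INSERT INTO t VALUES (2)")
except sqlite3.OperationalError as e:
    print(e)                                 # "database is locked"

writer1.execute("ROLLBACK")
```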

Since S3 cannot be mounted directly, your best bet is to create an EBS volume containing the SQLite file and work directly with the EBS volume from another (controller) instance. You can then create snapshots of the volume and archive them in S3. Using a tool like boto (the Python API for AWS), you can automate the creation of snapshots and the process of moving the backups into S3.
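As a sketch of that automation, here is roughly what creating a snapshot looks like with boto3 (rather than the original boto); the region and volume ID are placeholders:

```python
# Sketch of automating EBS snapshots with boto3. The region and
# volume ID are placeholders; EBS snapshots are stored by AWS in S3.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.create_snapshot(
    VolumeId="vol-0123456789abcdef0",        # placeholder volume ID
    Description="backup of the SQLite EBS volume",
)
print("started snapshot:", response["SnapshotId"])
```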

If your DB structure is simple, why not just use AWS SimpleDB? Or run MySQL (or another DB) on one of your instances.

You can mount an S3 bucket on your Linux machine. See below:

s3fs - http://code.google.com/p/s3fs/wiki/InstallationNotes - this did work for me. It uses a FUSE file system + rsync to sync the files with S3. It keeps a copy of all filenames in the local system and makes them look like ordinary files/folders.

This is good if the system is already in place and running with a huge collection of data. But if you are building this from scratch, I would suggest you put the SQLite file on an EBS volume and use this script to create snapshots of that volume:

https://github.com/rakesh-sankar/Tools/blob/master/AmazonAWS/EBS/ebs-snapshot.sh

Amazon EFS can be shared among EC2 instances. It's a managed NFS share. SQLite will still lock the whole DB on write.

The SQLite website does not recommend NFS shares, though. But depending on the application, you can share the DB read-only among several EC2 instances, store the results of your processing somewhere else, and then concatenate the results in the next step.
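For the read-only sharing approach, each instance can open the database in read-only mode via an SQLite URI; in this sketch the EFS mount path and the table/column names are placeholders:

```python
# Each instance opens the shared SQLite file read-only via a URI
# (/mnt/efs is a placeholder for the EFS mount point) and writes its
# results somewhere else instead of back into this DB.
import sqlite3

conn = sqlite3.connect("file:/mnt/efs/big.sqlite?mode=ro", uri=True)
rows = conn.execute(
    "SELECT id, value FROM items WHERE id BETWEEN ? AND ?", (0, 100000)
).fetchall()
conn.close()
```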
