
Best way to work with a large dataset in Python

I am working with a large financial dataset (15 GB now, but it will grow to about 200 GB later). What is the best way to work with it? In particular, I want to run some statistical tests and produce graphs from millisecond-resolution data. So far I have used sqlite3 for the sake of simplicity, but it does not seem able to handle a file of this size. I am using PyCharm (not sure if that matters).

SQLite is not a good option if you want to manage large amounts of data (actually, I wouldn't use SQLite for anything other than prototyping or running tests).

You can try using Amazon RDS to store the database (http://aws.amazon.com/es/rds/) and choose one of the database engines that Amazon offers.
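
As a rough sketch of what that might look like from Python, here is one way to connect to an RDS-hosted PostgreSQL instance with SQLAlchemy. The hostname, credentials, and database name below are placeholders, not real values:

```python
# Minimal sketch: connect to a PostgreSQL database (e.g. hosted on Amazon RDS).
# The endpoint, user, password and database name are placeholders.
from sqlalchemy import create_engine, text

engine = create_engine(
    "postgresql+psycopg2://user:password@mydb.xxxxxx.us-east-1.rds.amazonaws.com:5432/finance"
)

# Quick sanity check that the connection works.
with engine.connect() as conn:
    print(conn.execute(text("SELECT version()")).scalar())
```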

As for Python, I think you should let the DB engine handle the queries and just use Python to produce the graphs, along the lines of the sketch below.
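
For example, something like the following pushes the heavy aggregation into SQL and only pulls the reduced result into pandas for plotting. The `ticks` table and its `ts`/`price` columns are made up for illustration, as is the connection string:

```python
# Sketch: aggregate millisecond data in the database, plot the small result in Python.
# Table name, column names and connection details are assumed for illustration.
import pandas as pd
import matplotlib.pyplot as plt
from sqlalchemy import create_engine

engine = create_engine(
    "postgresql+psycopg2://user:password@mydb.xxxxxx.us-east-1.rds.amazonaws.com:5432/finance"
)

# Let the DB engine resample millisecond ticks to one-minute averages,
# so only a small result set travels over the network.
query = """
    SELECT date_trunc('minute', ts) AS minute,
           AVG(price)               AS avg_price
    FROM ticks
    WHERE ts >= '2020-01-01' AND ts < '2020-01-02'
    GROUP BY 1
    ORDER BY 1
"""
df = pd.read_sql_query(query, engine, parse_dates=["minute"])

# Plot the aggregated series; the raw millisecond rows never leave the database.
df.plot(x="minute", y="avg_price", figsize=(10, 4))
plt.title("Average price per minute (aggregated in the database)")
plt.tight_layout()
plt.show()
```

The same pattern works for statistical tests: compute counts, averages, or other summaries in SQL, then run the test in Python on the (much smaller) aggregated result.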
