
Best approach to working with a 4 GB .csv file

I work in data science. I have a .csv file with 5 million records, about 3.9 GB in size. What is the best practice for dealing with it? I normally use VS Code or Jupyter, and even when I set the maximum memory to 10 GB, operations like loading take too long to complete.

What do you recommend to improve my work?

Notebook: Lenovo S145, 20 GB RAM, i7-8565U, Ubuntu

Thanks

If you want to bring a CSV into a database for reporting, one fairly quick and easy option is an external table. Its CREATE TABLE definition uses access parameters similar to SQL*Loader (SQLLDR) control-file syntax. Once the table is defined, the latest saved CSV data is immediately available as a table in the database.
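As a sketch of what such a definition might look like in Oracle (the table name, columns, file name, and directory object below are assumptions for illustration, not from the thread):

```sql
-- Hypothetical example: expose a CSV as an Oracle external table.
-- Assumes a DIRECTORY object (data_dir) pointing at the folder holding the CSV.
CREATE TABLE sales_ext (
  id      NUMBER,
  amount  NUMBER,
  sold_at DATE
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY data_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    SKIP 1                           -- skip the header row
    FIELDS TERMINATED BY ','
    MISSING FIELD VALUES ARE NULL
    (id, amount, sold_at DATE 'YYYY-MM-DD')
  )
  LOCATION ('data.csv')
)
REJECT LIMIT UNLIMITED;
```

After this, `SELECT ... FROM sales_ext` reads the current contents of the CSV directly; replacing the file updates what queries see, with no reload step.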

