I have a large CSV file (~25 GB, 8,529,090 rows), and when I run the following the kernel dies. I'm on a MacBook Pro with 16 GB of RAM.
import dask.dataframe as dd
ddf = dd.read_csv('data/cleaned_news_data.csv')
ddf = ddf[(ddf.type != 'none')].compute()
Any ideas for working around this?
Thanks for the help.
As noted in the comments, calling .compute() materializes the result as a single in-memory pandas DataFrame, so if the result doesn't fit in memory you're out of luck.
Typically, people either compute smaller results (for example, the inputs to a plot) or write very large results to disk instead.