I have a large CSV file (~25 GB, 8,529,090 rows), and when I run the following the kernel dies. I'm on a MacBook Pro with 16 GB of RAM.
import dask.dataframe as dd
ddf = dd.read_csv('data/cleaned_news_data.csv')
ddf = ddf[(ddf.type != 'none')].compute()
Any ideas for working around this?
Thanks for the help.
As noted in the comments, calling .compute() materializes the result as a single in-memory pandas DataFrame, so if the result doesn't fit in memory you're out of luck.
Typically, people either compute smaller results (for example, the inputs to a plot) or write very large results to disk instead.