
How to figure out if a modin dataframe is going to fit in RAM?

I'm learning how to work with large datasets, so I'm using modin.pandas. I'm doing some aggregation, after which a 50GB dataset will hopefully shrink to something closer to 5GB - and now I need to check: if the df is small enough to fit in RAM, I want to cast it to pandas and enjoy a bug-free, reliable library.

So, naturally, the question is: how do I check that? .memory_usage(deep=True).sum() tells me how much the whole df uses, but I can't tell from that one number how much of it is in RAM and how much is in swap - in other words, how much space I need in order to cast the df to pandas. Are there other ways? Am I even right to assume that some partitions live in RAM while others live in swap? How do I calculate how much data will flood into RAM when I call ._to_pandas()? Is there a hidden .__memory_usage_in_swap_that_needs_to_fit_in_ram() of some sort?
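For reference, the kind of naive host-level check I have in mind looks something like this (assuming psutil is installed; the 1.2 safety factor is an arbitrary margin I picked, not anything Modin recommends):

    # Sketch: compare the frame's reported size to the host's available RAM.
    import psutil
    import modin.pandas as mpd

    def fits_in_ram(df, safety_factor: float = 1.2) -> bool:
        needed = df.memory_usage(deep=True).sum() * safety_factor
        available = psutil.virtual_memory().available  # bytes of free host RAM
        return needed <= available

    # df = mpd.read_csv("big.csv")
    # if fits_in_ram(df):
    #     pdf = df._to_pandas()

But that only compares the frame's own accounting against free RAM; it says nothing about where the partitions currently live.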

Am I even right to assume that some partitions live in RAM while others live in swap?

Modin doesn't specify whether data should be in RAM or swap.

On Ray, Modin uses ray.put to store partitions. ray.put doesn't give any guarantees about where the data will go. Note that Ray spills objects to disk when they are too large for its in-memory object store. You can use the ray memory CLI command to get a summary of how much of each storage tier Ray is using.
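A rough sketch of inspecting that from Python (this assumes a recent Ray version where cluster_resources() reports "object_store_memory" in bytes, and that the ray CLI is on your PATH - check your version's docs):

    # Sketch: report object-store capacity and dump Ray's own memory summary.
    import subprocess
    import ray

    ray.init(ignore_reinit_error=True)

    # Total object-store capacity as Ray sees it (key name may vary by version).
    total_store = ray.cluster_resources().get("object_store_memory", 0)
    print(f"object store capacity: {total_store / 1e9:.1f} GB")

    # The `ray memory` CLI prints per-object usage and spill statistics.
    report = subprocess.run(["ray", "memory"], capture_output=True, text=True, check=True)
    print(report.stdout)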

On Dask, Modin uses dask.distributed.Client.scatter to store partition data; it likewise gives no guarantees about where the data will end up. I don't know of any way to figure out how much of the stored data is really in RAM.
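At best you can ask the scheduler what the workers report about themselves - a sketch, assuming Modin has already started a Dask client in this process and that your distributed version exposes "memory_limit" and metrics["memory"] in scheduler_info() (key names differ between releases, so treat this as illustrative):

    # Sketch: print each worker's reported memory usage vs. its configured limit.
    from distributed import get_client

    client = get_client()            # the client Modin's Dask engine is using
    info = client.scheduler_info()

    for addr, worker in info["workers"].items():
        limit = worker.get("memory_limit")
        used = worker.get("metrics", {}).get("memory")
        print(f"{addr}: {used} of {limit} bytes in use")

That tells you how much process memory each worker holds, not whether a given partition is resident in RAM or swapped out.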
