Problem: I'm currently trying to write a function that filters some rows of a disk.frame object using regular expressions. I, unfortunately, run into ...
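A minimal sketch of such a filter, assuming hypothetical columns id and name and a hypothetical pattern; dplyr verbs on a disk.frame run lazily chunk-by-chunk, and grepl() works row-wise, so a per-chunk filter is safe:

    library(disk.frame)
    library(dplyr)
    setup_disk.frame()

    df <- disk.frame("data.df")        # hypothetical existing disk.frame folder

    matched <- df %>%
      srckeep(c("id", "name")) %>%     # only read the columns the filter needs
      filter(grepl("^foo", name)) %>%  # regex filter, applied chunk-by-chunk
      collect()                        # bind the matching rows into memory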
I am producing very big datasets (>120 GB), which are actually lists of named (100x100x3) matrices. These are very large lists (millions of records). Th ...
Problem: I am trying to perform a correlation test on a large dataset: the data.table can exist in memory, but operating on it with Hmisc::rcorr() or ...
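One hedged out-of-core alternative to calling Hmisc::rcorr() on the whole table: accumulate column sums and cross-products chunk-by-chunk, then build the exact Pearson correlation matrix from them. The column names x, y, z and the folder big.df are hypothetical, and the sketch assumes no missing values:

    library(disk.frame)
    library(data.table)
    setup_disk.frame()

    df   <- disk.frame("big.df")
    cols <- c("x", "y", "z")

    n <- 0
    s <- rep(0, length(cols))                    # running column sums
    C <- matrix(0, length(cols), length(cols))   # running crossproduct X'X

    for (i in seq_len(nchunks(df))) {
      X <- as.matrix(get_chunk(df, i)[, ..cols]) # chunks are data.tables
      n <- n + nrow(X)
      s <- s + colSums(X)
      C <- C + crossprod(X)
    }

    covmat <- (C - tcrossprod(s) / n) / (n - 1)  # sample covariance matrix
    cormat <- cov2cor(covmat)                    # exact Pearson correlations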
According to the article https://diskframe.com/articles/ingesting-data.html a good use case for inmapfn as part of csv_to_disk.frame(...) is for date ...
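A sketch of that use case, assuming a hypothetical events.csv with a character column date_str; inmapfn runs on each chunk as it is read, before the chunk is written to disk:

    library(disk.frame)
    setup_disk.frame()

    df <- csv_to_disk.frame(
      "events.csv",
      outdir  = "events.df",
      inmapfn = function(chunk) {
        # chunks arrive as data.tables, so := adds the parsed column in place
        chunk[, date := as.Date(date_str, format = "%Y-%m-%d")]
        chunk
      }
    )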
I have around 15 GB of zipped data in 30-minute packages. Unzipping and reading them with either unzip and readr or fread works just fine, but the RAM- ...
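A hedged sketch for keeping RAM flat: unzip one package at a time and let csv_to_disk.frame() stream it in fixed-size pieces via in_chunk_size (the paths and chunk size are assumptions; disk.frame also ships a zip_to_disk.frame() helper worth checking):

    library(disk.frame)
    setup_disk.frame()

    zips <- list.files("zips", pattern = "\\.zip$", full.names = TRUE)

    for (i in seq_along(zips)) {
      tmp   <- tempfile(); dir.create(tmp)
      files <- unzip(zips[i], exdir = tmp)      # extract one package only
      csv_to_disk.frame(files,
                        outdir = file.path("out", paste0("part", i)),
                        in_chunk_size = 1e6)    # read 1M rows at a time
      unlink(tmp, recursive = TRUE)             # reclaim disk before the next zip
    }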
This question is a follow-up from this thread. I'd like to perform three actions on a disk frame: Count the distinct values of the field id grouped ...
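For the distinct-count part, a sketch assuming a hypothetical grouping column grp; since disk.frame v0.3.0 the one-stage group_by/summarize supports n_distinct():

    library(disk.frame)
    library(dplyr)
    setup_disk.frame()

    df <- disk.frame("big.df")        # hypothetical disk.frame folder

    df %>%
      group_by(grp) %>%
      summarize(n_ids = n_distinct(id)) %>%
      collect()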
It's set to 50 by default on a scale of 1 to 100. I have an especially large disk frame and I'm considering using a high number. What are the import ...
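The setting is passed through to fst's compression level when chunks are written. A sketch of rewriting a disk.frame at the maximum level; higher values trade write-time CPU for smaller files on disk (folder names are hypothetical):

    library(disk.frame)
    setup_disk.frame()

    df <- disk.frame("big.df")
    df_small <- write_disk.frame(df, outdir = "big_compressed.df", compress = 100)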
I have a disk frame that I've saved into a file. It's made up of ten chunks. I coded every one of the columns as a character because I intend on comb ...
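If the types need fixing later, one hedged approach is to convert chunk-by-chunk with cmap() and persist the result as a new disk.frame; the column name amount and the folders are hypothetical:

    library(disk.frame)
    library(dplyr)
    setup_disk.frame()

    df <- disk.frame("chunks.df")

    df2 <- df %>%
      cmap(function(chunk) {
        chunk[, amount := as.numeric(amount)]   # chunks are data.tables
        chunk
      }) %>%
      write_disk.frame(outdir = "chunks_typed.df")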
I'm getting this error when trying to import CSVs using this code: some.df = csv_to_disk.frame(list.files("some/path")) Error in split_every_nlin ...
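One common cause of this pattern failing, offered as a hedged guess: list.files() returns bare file names by default, so the reader cannot locate the files. Passing full.names = TRUE (and restricting to CSVs) yields usable paths:

    library(disk.frame)
    setup_disk.frame()

    some.df <- csv_to_disk.frame(
      list.files("some/path", pattern = "\\.csv$", full.names = TRUE),
      outdir = "some.df"
    )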
I'm running n_distinct on a large file (>30GB) and it doesn't appear to produce an exact result. I have another reference point for the data, and ...
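A plausible explanation, hedged: summing per-chunk distinct counts over-counts any value that appears in more than one chunk. Sharding by the column first makes the per-chunk counts disjoint, so their sum is exact (column and folder names are hypothetical):

    library(disk.frame)
    library(dplyr)
    setup_disk.frame()

    df <- disk.frame("big.df")

    # all rows sharing an id now land in the same chunk
    df_by_id <- shard(df, shardby = "id", outdir = "big_by_id.df",
                      nchunks = nchunks(df))

    df_by_id %>%
      chunk_summarize(n = n_distinct(id)) %>%  # disjoint per-chunk counts...
      collect() %>%
      summarize(n = sum(n))                    # ...so the sum is exact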
I'm working with disk frame and it's great so far. One piece that confuses me is the chunk size. I sense that a small chunk might create too many tas ...
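A small sketch of inspecting and changing the chunk count; a common rule of thumb is a small multiple of the worker count, so every worker stays busy without any single chunk blowing out memory (the counts here are assumptions):

    library(disk.frame)
    setup_disk.frame()

    df <- disk.frame("big.df")
    nchunks(df)                          # current number of chunks

    df2 <- rechunk(df, nchunks = 64)     # rewrite as more, smaller chunks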
I'm trying to perform a group by on a disk frame and it's getting this error: Error in serialize(data, node$con) : error writing to connection with ...
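This error often points at a parallel worker running out of memory while shipping a chunk back over its connection; a hedged mitigation is to run fewer workers, or to drop to sequential evaluation for the heavy step:

    library(disk.frame)

    setup_disk.frame(workers = 2)        # fewer chunks in flight at once

    # or, for the one expensive operation only:
    future::plan(future::sequential)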
I ran a group by on a large dataset (>20GB) and it doesn't appear to be working quite right. This is my code. It returned this error: Warning ...
I have a disk frame with these columns. Say the disk frame is 200M rows and I'd like to group it by key_b. Additionally, I want to keep the underlyi ...
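A sketch of one way to do this, assuming hypothetical columns key_a and key_b: shard by key_b so that every group lives wholly inside one chunk, after which per-group logic that needs all underlying rows can run chunk-by-chunk:

    library(disk.frame)
    library(dplyr)
    setup_disk.frame()

    df   <- disk.frame("rows.df")
    by_b <- shard(df, shardby = "key_b", outdir = "rows_by_b.df")

    # each chunk now holds complete key_b groups
    res <- by_b %>%
      cmap(function(chunk) {
        chunk[, .(n = .N, first_a = key_a[1]), by = key_b]
      }) %>%
      collect()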
I saved a disk frame to its output directory and then restarted my R session. I'd like to read the existing disk frame instead of recreating it elsew ...
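Re-attaching a saved disk.frame is a matter of pointing disk.frame() at the existing output directory rather than rebuilding it (the path is hypothetical):

    library(disk.frame)
    setup_disk.frame()

    df <- disk.frame("path/to/outdir")   # attach the folder saved earlier
    nchunks(df)                          # sanity-check it loaded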
I have two disk frames, each about 20GB worth of files. It's too big to merge as data tables because the process requires more than the memory ...
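A hedged sketch of an out-of-core join: shard both disk.frames by the join key with the same chunk count, so matching rows share a chunk id, then join with merge_by_chunk_id = TRUE (key and folder names are hypothetical):

    library(disk.frame)
    library(dplyr)
    setup_disk.frame()

    a <- shard(disk.frame("a.df"), shardby = "id",
               outdir = "a_by_id.df", nchunks = 32)
    b <- shard(disk.frame("b.df"), shardby = "id",
               outdir = "b_by_id.df", nchunks = 32)

    joined <- left_join(a, b, by = "id", merge_by_chunk_id = TRUE)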
I'm looking through the docs and I don't see a function for writing to CSV. It appears there's a function for writing the disk frame, but it's unclea ...
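One workaround, offered as a sketch rather than an official API: stream the chunks out with data.table::fwrite() in append mode, so only one chunk is in memory at a time (fwrite writes the header only on the first, non-append call):

    library(disk.frame)
    library(data.table)
    setup_disk.frame()

    df <- disk.frame("big.df")           # hypothetical folder

    for (i in seq_len(nchunks(df))) {
      fwrite(get_chunk(df, i), "big.csv", append = (i > 1))
    }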
I'd like to convert a data frame to a disk frame and then count the first column. It's not counting the number of unique values of the column when I t ...
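A small runnable sketch of an exact distinct count, using mtcars as a stand-in for the data frame: take the distinct values inside each chunk first, then count distinct values across the collected result (naively summing per-chunk counts would over-count values that span chunks):

    library(disk.frame)
    library(dplyr)
    setup_disk.frame()

    df <- as.disk.frame(mtcars, outdir = "mtcars.df", overwrite = TRUE)

    per_chunk <- df %>%
      cmap(function(chunk) unique(chunk[, .(mpg)])) %>%  # distinct per chunk
      collect()
    n_distinct(per_chunk$mpg)                            # exact overall count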
I am using the disk.frame package and I wanted to know how many workers disk.frame is using to perform the operations. I looked through the disk.frame doc ...
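disk.frame's parallelism runs on the future package, so the active worker count can be read from there; setup_disk.frame() typically defaults to one worker per available core, and an explicit count can be pinned:

    library(disk.frame)

    setup_disk.frame()                 # default worker count
    future::nbrOfWorkers()             # how many workers are active

    setup_disk.frame(workers = 4)      # or pin an explicit number
    future::nbrOfWorkers()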