Problem: I'm currently trying to write a function that filters some rows of a disk.frame object using regular expressions. I, unfortunately, run into ...
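A minimal sketch of such a filter, assuming hypothetical columns id and name and a hypothetical pattern; dplyr verbs on a disk.frame run lazily chunk-by-chunk, and grepl() works row-wise, so a per-chunk filter is safe:

    library(disk.frame)
    library(dplyr)
    setup_disk.frame()

    df <- disk.frame("data.df")        # hypothetical existing disk.frame folder

    matched <- df %>%
      srckeep(c("id", "name")) %>%     # only read the columns the filter needs
      filter(grepl("^foo", name)) %>%  # regex filter, applied chunk-by-chunk
      collect()                        # bind the matching rows into memory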
I am producing very big datasets (>120 GB), which are actually lists of named (100x100x3) matrices. These are very large lists (millions of records). Th ...
Problem: I am trying to perform a correlation test on a large dataset: the data.table can exist in memory, but operating on it with Hmisc::rcorr() or ...
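One hedged out-of-core alternative to calling Hmisc::rcorr() on the whole table: accumulate column sums and cross-products chunk-by-chunk, then build the exact Pearson correlation matrix from them. The column names x, y, z and the folder big.df are hypothetical, and the sketch assumes no missing values:

    library(disk.frame)
    library(data.table)
    setup_disk.frame()

    df   <- disk.frame("big.df")
    cols <- c("x", "y", "z")

    n <- 0
    s <- rep(0, length(cols))                    # running column sums
    C <- matrix(0, length(cols), length(cols))   # running crossproduct X'X

    for (i in seq_len(nchunks(df))) {
      X <- as.matrix(get_chunk(df, i)[, ..cols]) # chunks are data.tables
      n <- n + nrow(X)
      s <- s + colSums(X)
      C <- C + crossprod(X)
    }

    covmat <- (C - tcrossprod(s) / n) / (n - 1)  # sample covariance matrix
    cormat <- cov2cor(covmat)                    # exact Pearson correlations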
According to the article https://diskframe.com/articles/ingesting-data.html a good use case for inmapfn as part of csv_to_disk.frame(...) is for date ...
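A sketch of that use case, assuming a hypothetical events.csv with a character column date_str; inmapfn runs on each chunk as it is read, before the chunk is written to disk:

    library(disk.frame)
    setup_disk.frame()

    df <- csv_to_disk.frame(
      "events.csv",
      outdir  = "events.df",
      inmapfn = function(chunk) {
        # chunks arrive as data.tables, so := adds the parsed column in place
        chunk[, date := as.Date(date_str, format = "%Y-%m-%d")]
        chunk
      }
    )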
I have around 15 GB of zipped data in 30-minute packages. Unzipping and reading them with either unzip and readr or fread works just fine, but the RAM- ...
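A hedged sketch for keeping RAM flat: unzip one package at a time and let csv_to_disk.frame() stream it in fixed-size pieces via in_chunk_size (the paths and chunk size are assumptions; disk.frame also ships a zip_to_disk.frame() helper worth checking):

    library(disk.frame)
    setup_disk.frame()

    zips <- list.files("zips", pattern = "\\.zip$", full.names = TRUE)

    for (i in seq_along(zips)) {
      tmp   <- tempfile(); dir.create(tmp)
      files <- unzip(zips[i], exdir = tmp)      # extract one package only
      csv_to_disk.frame(files,
                        outdir = file.path("out", paste0("part", i)),
                        in_chunk_size = 1e6)    # read 1M rows at a time
      unlink(tmp, recursive = TRUE)             # reclaim disk before the next zip
    }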
This question is a follow-up from this thread. I'd like to perform three actions on a disk frame: Count the distinct values of the field id grouped ...
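For the distinct-count part, a sketch assuming a hypothetical grouping column grp; since disk.frame v0.3.0 the one-stage group_by/summarize supports n_distinct():

    library(disk.frame)
    library(dplyr)
    setup_disk.frame()

    df <- disk.frame("big.df")        # hypothetical disk.frame folder

    df %>%
      group_by(grp) %>%
      summarize(n_ids = n_distinct(id)) %>%
      collect()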
It's set to 50 by default on a scale of 1 to 100. I have an especially large disk frame and I'm considering using a high number. What are the import ...
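The setting is passed through to fst's compression level when chunks are written. A sketch of rewriting a disk.frame at the maximum level; higher values trade write-time CPU for smaller files on disk (folder names are hypothetical):

    library(disk.frame)
    setup_disk.frame()

    df <- disk.frame("big.df")
    df_small <- write_disk.frame(df, outdir = "big_compressed.df", compress = 100)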
I have a disk frame that I've saved into a file. It's made up of ten chunks. I coded every one of the columns as a character because I intend on comb ...
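If the types need fixing later, one hedged approach is to convert chunk-by-chunk with cmap() and persist the result as a new disk.frame; the column name amount and the folders are hypothetical:

    library(disk.frame)
    library(dplyr)
    setup_disk.frame()

    df <- disk.frame("chunks.df")

    df2 <- df %>%
      cmap(function(chunk) {
        chunk[, amount := as.numeric(amount)]   # chunks are data.tables
        chunk
      }) %>%
      write_disk.frame(outdir = "chunks_typed.df")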
I'm getting this error when trying to import CSVs using this code: some.df = csv_to_disk.frame(list.files("some/path")) Error in split_every_nlin ...
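One common cause of this pattern failing, offered as a hedged guess: list.files() returns bare file names by default, so the reader cannot locate the files. Passing full.names = TRUE (and restricting to CSVs) yields usable paths:

    library(disk.frame)
    setup_disk.frame()

    some.df <- csv_to_disk.frame(
      list.files("some/path", pattern = "\\.csv$", full.names = TRUE),
      outdir = "some.df"
    )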
I'm running n_distinct on a large file (>30GB) and it doesn't appear to produce an exact result. I have another reference point for the data, and ...
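A plausible explanation, hedged: summing per-chunk distinct counts over-counts any value that appears in more than one chunk. Sharding by the column first makes the per-chunk counts disjoint, so their sum is exact (column and folder names are hypothetical):

    library(disk.frame)
    library(dplyr)
    setup_disk.frame()

    df <- disk.frame("big.df")

    # all rows sharing an id now land in the same chunk
    df_by_id <- shard(df, shardby = "id", outdir = "big_by_id.df",
                      nchunks = nchunks(df))

    df_by_id %>%
      chunk_summarize(n = n_distinct(id)) %>%  # disjoint per-chunk counts...
      collect() %>%
      summarize(n = sum(n))                    # ...so the sum is exact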
I'm working with disk frame and it's great so far. One piece that confuses me is the chunk size. I sense that a small chunk might create too many tas ...
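A small sketch of inspecting and changing the chunk count; a common rule of thumb is a small multiple of the worker count, so every worker stays busy without any single chunk blowing out memory (the counts here are assumptions):

    library(disk.frame)
    setup_disk.frame()

    df <- disk.frame("big.df")
    nchunks(df)                          # current number of chunks

    df2 <- rechunk(df, nchunks = 64)     # rewrite as more, smaller chunks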
I'm trying to perform a group by on a disk frame and it's getting this error: Error in serialize(data, node$con) : error writing to connection with ...
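This error often points at a parallel worker running out of memory while shipping a chunk back over its connection; a hedged mitigation is to run fewer workers, or to drop to sequential evaluation for the heavy step:

    library(disk.frame)

    setup_disk.frame(workers = 2)        # fewer chunks in flight at once

    # or, for the one expensive operation only:
    future::plan(future::sequential)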
I ran a group by on a large dataset (>20GB) and it doesn't appear to be working quite right. This is my code. It returned this error: Warning ...
I have a disk frame with these columns. Say the disk frame is 200M rows and I'd like to group it by key_b. Additionally, I want to keep the underlyi ...
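A sketch of one way to do this, assuming hypothetical columns key_a and key_b: shard by key_b so that every group lives wholly inside one chunk, after which per-group logic that needs all underlying rows can run chunk-by-chunk:

    library(disk.frame)
    library(dplyr)
    setup_disk.frame()

    df   <- disk.frame("rows.df")
    by_b <- shard(df, shardby = "key_b", outdir = "rows_by_b.df")

    # each chunk now holds complete key_b groups
    res <- by_b %>%
      cmap(function(chunk) {
        chunk[, .(n = .N, first_a = key_a[1]), by = key_b]
      }) %>%
      collect()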
I saved a disk frame to its output directory and then restarted my R session. I'd like to read the existing disk frame instead of recreating it elsew ...
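Re-attaching a saved disk.frame is a matter of pointing disk.frame() at the existing output directory rather than rebuilding it (the path is hypothetical):

    library(disk.frame)
    setup_disk.frame()

    df <- disk.frame("path/to/outdir")   # attach the folder saved earlier
    nchunks(df)                          # sanity-check it loaded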
I have two disk frames, each about 20GB worth of files. It's too big to merge as data tables because the process requires more than the memory ...
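A hedged sketch of an out-of-core join: shard both disk.frames by the join key with the same chunk count, so matching rows share a chunk id, then join with merge_by_chunk_id = TRUE (key and folder names are hypothetical):

    library(disk.frame)
    library(dplyr)
    setup_disk.frame()

    a <- shard(disk.frame("a.df"), shardby = "id",
               outdir = "a_by_id.df", nchunks = 32)
    b <- shard(disk.frame("b.df"), shardby = "id",
               outdir = "b_by_id.df", nchunks = 32)

    joined <- left_join(a, b, by = "id", merge_by_chunk_id = TRUE)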
I'm looking through the docs and I don't see a function for writing to CSV. It appears there's a function for writing the disk frame, but it's unclea ...
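One workaround, offered as a sketch rather than an official API: stream the chunks out with data.table::fwrite() in append mode, so only one chunk is in memory at a time (fwrite writes the header only on the first, non-append call):

    library(disk.frame)
    library(data.table)
    setup_disk.frame()

    df <- disk.frame("big.df")           # hypothetical folder

    for (i in seq_len(nchunks(df))) {
      fwrite(get_chunk(df, i), "big.csv", append = (i > 1))
    }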
I'd like to convert a data frame to a disk frame and then count the first column. It's not counting the number of unique values of the column when I t ...
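A small runnable sketch of an exact distinct count, using mtcars as a stand-in for the data frame: take the distinct values inside each chunk first, then count distinct values across the collected result (naively summing per-chunk counts would over-count values that span chunks):

    library(disk.frame)
    library(dplyr)
    setup_disk.frame()

    df <- as.disk.frame(mtcars, outdir = "mtcars.df", overwrite = TRUE)

    per_chunk <- df %>%
      cmap(function(chunk) unique(chunk[, .(mpg)])) %>%  # distinct per chunk
      collect()
    n_distinct(per_chunk$mpg)                            # exact overall count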
I am using the disk.frame package and I wanted to know how many workers disk.frame is using to perform the operations. I looked through the disk.frame doc ...
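disk.frame's parallelism runs on the future package, so the active worker count can be read from there; setup_disk.frame() typically defaults to one worker per available core, and an explicit count can be pinned:

    library(disk.frame)

    setup_disk.frame()                 # default worker count
    future::nbrOfWorkers()             # how many workers are active

    setup_disk.frame(workers = 4)      # or pin an explicit number
    future::nbrOfWorkers()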