.rds file internal format

Question

I lost a .rds file when the device it was written to (let's call it volume 1) filled up. Usually when that happens R throws an error and stops, and I still have a safe copy on a different volume (volume 2). This time, however, R wrote the file to volume 1 without an error and copied it over to volume 2. Now the file can no longer be opened with readRDS, which fails with "error reading from connection".

The file contains a data.table and is stored uncompressed, and infoRDS can still read its metadata:

> infoRDS('corrupt.rds')
$version
[1] 3

$writer_version
[1] "3.6.3"

$min_reader_version
[1] "3.5.0"

$format
[1] "xdr"

$native_encoding
[1] "UTF-8"

Also, hexView::readRaw can read the file and shows the names of the columns of the data table.

Using

readRaw('corrupt.rds', endian = 'big', human = 'real', width = 8, offset = 5)

I can see many of the numbers I need to recover. However, this seems a very tedious approach, since I don't understand the internal format of the .rds file.

I also looked into xmlDeserializeHook, but I don't understand how to use it. Of course, the C code behind readRDS (unserializeFromConn) contains all the information about the structure used, but higher-level documentation would be helpful.

Is there an easier way than to dive into that C code or pick up the numbers manually one by one?

Answer 1

The R Internals manual contains documentation of the serialisation format. Unless somebody has published a more detailed description on the mailing list, that is probably the best available. At a glance, though, it looks fairly comprehensive, especially when read together with the implementation.
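
To give a feel for what that documentation describes, here is a minimal sketch in base R that decodes the header of an uncompressed version 3 XDR file and then one plain numeric vector. It is an illustration under stated assumptions, not a recovery tool: `int_at`, `unpack_version` and `read_real_at` are hypothetical helper names, the layout is my reading of the format (magic bytes, three version integers, the encoding name, then items that each start with a flags word whose low 8 bits give the type), and it handles neither long vectors nor attributes nor ALTREP-encoded columns.

```r
# Write a small uncompressed file so the sketch is self-contained.
path <- tempfile(fileext = ".rds")
saveRDS(c(1.5, 2.5, 3.5), path, version = 3, compress = FALSE)
bytes <- readBin(path, "raw", n = file.size(path))

# Hypothetical helper: read `n` big-endian 32-bit integers starting
# `offset` bytes into the raw vector.
int_at <- function(bytes, offset, n = 1) {
  readBin(bytes[(offset + 1):length(bytes)], "integer",
          n = n, size = 4, endian = "big")
}

# Header: 2 magic bytes ("X\n" for XDR), then format version,
# writer version and minimum reader version as 32-bit integers.
magic    <- rawToChar(bytes[1:2])
header   <- int_at(bytes, 2, n = 3)
# Version 3 only: length of the native encoding name, then the name itself.
enc_len  <- int_at(bytes, 14)
encoding <- rawToChar(bytes[19:(18 + enc_len)])

# R versions are packed as major * 65536 + minor * 256 + patch.
unpack_version <- function(v) {
  paste(v %/% 65536L, (v %% 65536L) %/% 256L, v %% 256L, sep = ".")
}

# Hypothetical helper: decode one plain numeric vector (REALSXP, type 14)
# whose item starts `offset` bytes into the file.
read_real_at <- function(bytes, offset) {
  item <- int_at(bytes, offset, n = 2)   # flags word, then element count
  type <- bitwAnd(item[1], 255L)         # low 8 bits hold the type code
  if (type != 14L) stop("not a REALSXP at this offset (type ", type, ")")
  len <- item[2]
  if (len < 0L) stop("long vectors are not handled in this sketch")
  if (len == 0L) return(numeric(0))
  # Data follows as `len` big-endian 8-byte doubles.
  readBin(bytes[(offset + 9):length(bytes)], "double",
          n = len, size = 8, endian = "big")
}

first_item <- 18 + enc_len               # first byte after the header
recovered  <- read_real_at(bytes, first_item)

print(unpack_version(header[2]))         # writer version
print(recovered)
```

For a data.table the top-level item is a list rather than a numeric vector, so the offsets of the individual columns would have to be found first, for example by walking the items one at a time or by locating them in the hexView output. Since `n` in `readBin` is documented as a maximum, a column cut off by truncation should yield the values that are still present.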

solution1
1 2020-09-08 09:48:04
