
How to convert .rdata files to Parquet in Azure Data Lake using Databricks?

So I have a few large .rdata files that were generated with the R programming language. I have uploaded them to Azure Data Lake using Azure Storage Explorer, but I need to convert these .rdata files to Parquet format and then write them back into the data lake. How would I go about doing this? I can't seem to find any information about converting from .rdata to Parquet.

If you can use Python, there are libraries such as pyreadr that load .rdata files as pandas DataFrames. You can then write to Parquet directly with pandas, or convert to a PySpark DataFrame first. Something like this:

import pyreadr

# read_r returns an OrderedDict mapping each R object name to a pandas DataFrame
result = pyreadr.read_r('input.rdata')

print(result.keys())   # check which object names the file contains
df = result["object"]  # extract the pandas DataFrame for that object name

# convert to a Spark DataFrame (the `spark` session is predefined in Databricks notebooks)
sdf = spark.createDataFrame(df)

sdf.write.parquet("output")
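If you don't need Spark at all, pandas can also write Parquet directly through the pyarrow engine. A minimal sketch, assuming pyarrow is installed on the cluster (the object name and output filename are placeholders):

import pyreadr

result = pyreadr.read_r('input.rdata')
df = result["object"]  # placeholder object name

# pandas delegates Parquet writing to pyarrow (or fastparquet)
df.to_parquet('output.parquet', engine='pyarrow')

To land the converted file back in the data lake from Databricks, you can point the Spark writer at an abfss:// URI instead of a local path. The container, account, and folder names below are placeholders, and this assumes the cluster is already configured with access to the storage account:

sdf.write.parquet("abfss://<container>@<account>.dfs.core.windows.net/converted/output")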
