
How to convert .rdata files to Parquet in Azure Data Lake using Databricks?

So I have a few large .rdata files that were generated with the R programming language. I have uploaded them to Azure Data Lake using Azure Storage Explorer, but I need to convert these .rdata files to Parquet format and then write them back into the data lake. How would I go about doing this? I can't seem to find any information about converting from .rdata to Parquet.

If you can use Python, there are libraries such as pyreadr that load .rdata files as pandas DataFrames. You can then write to Parquet directly with pandas, or convert to a PySpark DataFrame first. Something like this:

import pyreadr

# read_r returns an OrderedDict mapping each R object name to a pandas DataFrame
result = pyreadr.read_r('input.rdata')

print(result.keys())   # check which object names the file contains
df = result["object"]  # extract the pandas DataFrame for that object name

# convert to a Spark DataFrame (the `spark` session is predefined in Databricks notebooks)
sdf = spark.createDataFrame(df)

sdf.write.parquet("output")
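If you don't need Spark at all, pandas can also write Parquet directly through the pyarrow engine. A minimal sketch, assuming pyarrow is installed on the cluster (the object name and output filename are placeholders):

import pyreadr

result = pyreadr.read_r('input.rdata')
df = result["object"]  # placeholder object name

# pandas delegates Parquet writing to pyarrow (or fastparquet)
df.to_parquet('output.parquet', engine='pyarrow')

To land the converted file back in the data lake from Databricks, you can point the Spark writer at an abfss:// URI instead of a local path. The container, account, and folder names below are placeholders, and this assumes the cluster is already configured with access to the storage account:

sdf.write.parquet("abfss://<container>@<account>.dfs.core.windows.net/converted/output")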
