I have a few large .rdata files that were generated with the R programming language. I have uploaded them to Azure Data Lake using Azure Storage Explorer, but I need to convert these .rdata files to Parquet format and then reinsert them into the data lake. How would I go about doing this? I can't find any information about converting from .rdata to Parquet.
If you can use Python, there are libraries such as pyreadr that can load .rdata files as pandas DataFrames. You can then write to Parquet with pandas directly, or convert to a PySpark DataFrame first. Something like this:
import pyreadr

# Read the .rdata file; returns a dict of {R object name: pandas DataFrame}
result = pyreadr.read_r('input.rdata')
print(result.keys())   # inspect the R object names stored in the file

df = result["object"]  # extract the pandas DataFrame by its object name

# Convert to a Spark DataFrame and write Parquet
# (assumes an active SparkSession bound to the name `spark`)
sdf = spark.createDataFrame(df)
sdf.write.parquet("output")