How to convert .rdata file to parquet in Azure Data Lake using Databricks?
So I have a few large .rdata files that were generated through use of the R programming language. I currently have uploaded them to Azure Data Lake using Azure Storage Explorer. But I have to convert these rdata files to parquet format and then reinsert them into the data lake. How would I go about doing this? I can't seem to find any information about converting from rdata to parquet.
If you can use Python, there are libraries such as pyreadr that load rdata files as pandas dataframes. You can then write to parquet directly with pandas, or convert to a PySpark dataframe and write from Spark. Something like this:
import pyreadr
result = pyreadr.read_r('input.rdata')
print(result.keys()) # check the object name
df = result["object"] # extract the pandas data frame for object name
sdf = spark.createDataFrame(df)
sdf.write.parquet("output")
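To get the converted data back into the data lake rather than a local path, you can either write parquet straight from pandas or point the Spark writer at an ADLS location. Here is a minimal sketch; the mount path, storage account, container, and the R object name "object" are all placeholders you would replace with your own:

import pyreadr

# pandas-only route: read the .rdata file and write parquet directly
# (requires pyarrow or fastparquet to be installed on the cluster)
result = pyreadr.read_r("/dbfs/mnt/lake/input.rdata")  # hypothetical mount path
df = result["object"]  # replace "object" with the actual R object name
df.to_parquet("/dbfs/mnt/lake/output.parquet", index=False)

# Spark route: write back to the lake via an abfss:// URI
# (storage account and container names below are placeholders)
sdf = spark.createDataFrame(df)
sdf.write.mode("overwrite").parquet(
    "abfss://mycontainer@mystorageaccount.dfs.core.windows.net/converted/output"
)

Note that Spark writes a directory of parquet part files rather than a single file, which is normal and still readable by Spark, pandas (via pyarrow), and most lake tooling.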