How to convert .rdata files to Parquet in Azure Data Lake using Databricks?

Question

So I have a few large .rdata files that were generated with the R programming language. I have uploaded them to Azure Data Lake using Azure Storage Explorer. Now I need to convert these .rdata files to Parquet format and then write them back into the data lake. How would I go about doing this? I can't seem to find any information about converting from .rdata to Parquet.

Answer 1

If you can use Python, there are libraries such as pyreadr that load .rdata files as pandas DataFrames. You can then write to Parquet with pandas, or convert to a PySpark DataFrame and write from there. Something like this:

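import pyreadr

# pyreadr reads from a local file path; on Databricks a mounted container is visible under /dbfs/mnt/...
result = pyreadr.read_r('input.rdata')  # OrderedDict: R object name -> pandas DataFrame

print(result.keys())  # check the object name(s) stored in the file
df = result["object"]  # replace "object" with the name printed above

# `spark` is the SparkSession that Databricks notebooks provide
sdf = spark.createDataFrame(df)

sdf.write.parquet("output")  # writes a folder of Parquet part files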
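The snippet above handles a single object whose name you already know. An .rdata file can hold several R objects, and the question asks for the Parquet output to land back in the data lake, so a slightly more general sketch may help. It assumes ADLS Gen2 (hence the `abfss://` URI scheme), a cluster that already has access to the storage account (through a mount or configured credentials), and an input file readable under a local path such as `/dbfs/mnt/...`. The helper names `adls_uri`, `safe_name` and `rdata_to_parquet`, and the one-folder-per-object layout, are hypothetical choices for illustration, not part of pyreadr or Databricks.

```python
import re
from typing import Any, List, Optional


def adls_uri(container: str, account: str, path: str) -> str:
    """Build an ADLS Gen2 URI: abfss://<container>@<account>.dfs.core.windows.net/<path>."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


def safe_name(r_name: Optional[str]) -> str:
    """Turn an R object name into a folder-safe name (hypothetical convention).

    pyreadr uses None as the key when it reads a single-object .rds file.
    """
    if not r_name:
        return "object"
    # Replace anything outside letters, digits, dot, underscore and dash with "_".
    return re.sub(r"[^A-Za-z0-9._-]", "_", r_name)


def rdata_to_parquet(spark: Any, rdata_path: str, out_base: str) -> List[str]:
    """Write every data frame stored in an .rdata file to its own Parquet folder."""
    import pyreadr  # imported here so the helpers above work without pyreadr installed

    # read_r needs a local path (e.g. under /dbfs/mnt/...); it returns an
    # OrderedDict mapping R object names to pandas DataFrames.
    objects = pyreadr.read_r(rdata_path)
    written = []
    for name, pdf in objects.items():
        target = f"{out_base.rstrip('/')}/{safe_name(name)}"
        # Convert to a Spark DataFrame and let Spark write the Parquet part files.
        spark.createDataFrame(pdf).write.mode("overwrite").parquet(target)
        written.append(target)
    return written
```

In a Databricks notebook this might be called as `rdata_to_parquet(spark, "/dbfs/mnt/<mount>/input.rdata", adls_uri("<container>", "<account>", "converted/input"))`, with the placeholders filled in for your own storage. If a data frame fits comfortably in driver memory, pandas' `pdf.to_parquet(path, index=False)` (with pyarrow installed) writes a single Parquet file without going through Spark.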

Answer 1 (accepted, 2021-02-05 20:03:16)

如何使用 databricks 在 Azure 数据湖中将.rdata 文件转换为镶木地板？

问题描述

1 个解决方案

解决方案1 1 已采纳 2021-02-05 20:03:16

解决方案1
1 已采纳 2021-02-05 20:03:16