sparklyr :: Error reading parquet file using Arrow library in R
I am trying to read a parquet file from the Databricks FileStore:
library(sparklyr)
parquet_dir <- "/dbfs/FileStore/test/flc_next.parquet"
filenames <- dir(parquet_dir, full.names = TRUE)
[1] "/dbfs/FileStore/test/flc_next.parquet/_committed_6244562942368589642"
[2] "/dbfs/FileStore/test/flc_next.parquet/_started_6244562942368589642"
[3] "/dbfs/FileStore/test/flc_next.parquet/_SUCCESS"
[4] "/dbfs/FileStore/test/flc_next.parquet/part-00000-tid-6244562942368589642-0edceedf-7157-4cce-a084-0f2a4a6769e6-925-1-c000.snappy.parquet"
data_frame(
filename = basename(filenames),
size_bytes = file.size(filenames)
)
Warning: `data_frame()` was deprecated in tibble 1.1.0.
Please use `tibble()` instead.
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated.
# A tibble: 4 × 2
filename size_bytes
<chr> <dbl>
1 _committed_6244562942368589642 124
2 _started_6244562942368589642 0
3 _SUCCESS 0
4 part-00000-tid-6244562942368589642-0edceedf-7157-4cce-a084-0f2a4a6… 248643
timbre_tbl <- spark_read_parquet("flc_next.parquet", parquet_dir)
Error: $ operator is invalid for atomic vectors
I would appreciate any help/suggestion. Thanks in advance.
The first argument of spark_read_parquet expects a Spark connection; see sparklyr::spark_connect. If you are running the code in Databricks, this should work:
sc <- spark_connect(method = "databricks")
timbre_tbl <- spark_read_parquet(sc, "flc_next.parquet", parquet_dir)
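Alternatively, since the question mentions the Arrow library and the path uses the DBFS FUSE mount, here is a minimal sketch of reading the parquet directory directly with the arrow package, bypassing Spark entirely. This assumes the arrow package is installed and that `/dbfs` is visible to the driver node (as it is on standard Databricks clusters):

```r
# A sketch, not the accepted fix: read the parquet directory with arrow.
# open_dataset() treats the directory as a dataset and ignores the
# non-parquet marker files (_SUCCESS, _committed_*, _started_*).
library(arrow)
library(dplyr)

ds <- open_dataset("/dbfs/FileStore/test/flc_next.parquet")
flc <- ds %>% collect()  # materialize the dataset as a tibble in R memory
```

This avoids the Spark connection altogether, which can be convenient for small files, but for large data you would normally keep the work in Spark via `spark_read_parquet(sc, ...)`.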