sparklyr :: Error reading parquet file using Arrow library in R
I am trying to read a parquet file from the Databricks FileStore:
library(sparklyr)
parquet_dir <- "/dbfs/FileStore/test/flc_next.parquet"
filenames <- dir(parquet_dir, full.names = TRUE)
[1] "/dbfs/FileStore/test/flc_next.parquet/_committed_6244562942368589642"
[2] "/dbfs/FileStore/test/flc_next.parquet/_started_6244562942368589642"
[3] "/dbfs/FileStore/test/flc_next.parquet/_SUCCESS"
[4] "/dbfs/FileStore/test/flc_next.parquet/part-00000-tid-6244562942368589642-0edceedf-7157-4cce-a084-0f2a4a6769e6-925-1-c000.snappy.parquet"
data_frame(
filename = basename(filenames),
size_bytes = file.size(filenames)
)
Warning: `data_frame()` was deprecated in tibble 1.1.0.
Please use `tibble()` instead.
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated.
# A tibble: 4 × 2
filename size_bytes
<chr> <dbl>
1 _committed_6244562942368589642 124
2 _started_6244562942368589642 0
3 _SUCCESS 0
4 part-00000-tid-6244562942368589642-0edceedf-7157-4cce-a084-0f2a4a6… 248643
timbre_tbl <- spark_read_parquet("flc_next.parquet", parquet_dir)
Error: $ operator is invalid for atomic vectors
I would appreciate any help/suggestion. Thanks in advance.
The first argument of spark_read_parquet expects a Spark connection; see sparklyr::spark_connect. If you are running the code in Databricks, this should work:
sc <- spark_connect(method = "databricks")
timbre_tbl <- spark_read_parquet(sc, "flc_next.parquet", parquet_dir)
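Alternatively, since the question mentions the Arrow library and the path uses the DBFS FUSE mount, here is a minimal sketch of reading the parquet directory directly with the arrow package, bypassing Spark entirely. This assumes the arrow package is installed and that `/dbfs` is visible to the driver node (as it is on standard Databricks clusters):

```r
# A sketch, not the accepted fix: read the parquet directory with arrow.
# open_dataset() treats the directory as a dataset and ignores the
# non-parquet marker files (_SUCCESS, _committed_*, _started_*).
library(arrow)
library(dplyr)

ds <- open_dataset("/dbfs/FileStore/test/flc_next.parquet")
flc <- ds %>% collect()  # materialize the dataset as a tibble in R memory
```

This avoids the Spark connection altogether, which can be convenient for small files, but for large data you would normally keep the work in Spark via `spark_read_parquet(sc, ...)`.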