Py4JJavaError: An error occurred while calling o389.csv

Question

I'm new to PySpark and I'm running it on Databricks. My data is stored in Azure Data Lake Storage (ADLS), and I'm trying to read a CSV file from ADLS into a PySpark DataFrame. So I wrote the following code:

import pyspark
from pyspark import SparkContext 
from pyspark import SparkFiles

df = sqlContext.read.csv(SparkFiles.get("dbfs:mycsv path in ADSL/Data.csv"), 
   header=True, inferSchema= True)

But I'm getting this error message:

Py4JJavaError: An error occurred while calling o389.csv.

Can you suggest how to fix this error?

Answer 1

The SparkFiles class is intended for accessing files shipped as part of the Spark job (for example, files added with SparkContext.addFile). If you just need to read a CSV file that is available on ADLS, use spark.read.csv directly, like:

df = spark.read.csv("dbfs:mycsv path in ADSL/Data.csv", 
  header=True, inferSchema=True)
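
To see why wrapping the path in SparkFiles.get changes what the reader receives, here is a rough sketch. It assumes, from my reading of the PySpark source, that SparkFiles.get essentially joins the job's local root directory with the name you pass. The root directory below is made up for illustration, and resolve_like_sparkfiles is a hypothetical stand-in, not a PySpark function.

```python
import os

# Hypothetical local directory; in a real job SparkFiles.getRootDirectory()
# would supply this value.
FAKE_ROOT = "/local_disk0/spark-files"

def resolve_like_sparkfiles(filename, root=FAKE_ROOT):
    """Approximate SparkFiles.get: build a local path under the job's root directory."""
    return os.path.abspath(os.path.join(root, filename))

original = "dbfs:mycsv path in ADSL/Data.csv"
mangled = resolve_like_sparkfiles(original)

# The dbfs: prefix is now buried inside a local-looking path, so the CSV
# reader is no longer given the URI that was intended.
print(mangled)
```

Passing the path string straight to spark.read.csv avoids this rewriting entirely.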

It's better not to use sqlContext; it is kept only for compatibility reasons.
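
In a Databricks notebook the spark session is normally already defined. Elsewhere, a minimal sketch for obtaining it and reading through it might look like the following. The names get_spark and read_csv_with_header are hypothetical helpers of mine, and the pyspark import is kept inside the function so the snippet still loads where pyspark is not installed.

```python
def get_spark():
    """Return the active SparkSession, creating one if none exists."""
    from pyspark.sql import SparkSession  # lazy import; requires pyspark
    return SparkSession.builder.getOrCreate()

def read_csv_with_header(spark, path):
    """Read a CSV through the SparkSession reader with the same options as above."""
    return spark.read.csv(path, header=True, inferSchema=True)
```

Usage would then be something like df = read_csv_with_header(get_spark(), "dbfs:mycsv path in ADSL/Data.csv"), with the placeholder replaced by your real path.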

Answer 1
0 2020-10-06 14:12:54
