
How to query HDFS from a Spark cluster (2.1) running on Kubernetes?

I was trying to access HDFS files from a Spark cluster which is running inside Kubernetes containers.

However, I keep getting the error: AnalysisException: 'The ORC data source must be used with Hive support enabled;'

What am I missing here?

Have you created your SparkSession with enableHiveSupport()?
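A minimal sketch of what that looks like, assuming PySpark 2.1; the app name and HDFS URI are placeholders, so substitute your own namenode and path:

```python
from pyspark.sql import SparkSession

# enableHiveSupport() must be called on the builder before getOrCreate();
# without it, Spark 2.1 raises the AnalysisException above when reading ORC.
spark = (
    SparkSession.builder
    .appName("hdfs-orc-read")  # hypothetical app name
    .enableHiveSupport()
    .getOrCreate()
)

# Read ORC data from HDFS now that Hive support is enabled.
df = spark.read.orc("hdfs://namenode:8020/path/to/data.orc")
df.show()
```

In Spark 2.1 the ORC data source is backed by Hive classes (a native ORC reader only arrived later, in Spark 2.3), so your Spark build must also include Hive support for this to work.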

Similar issue: Spark can access Hive table from pyspark but not from spark-submit


 