
ClassNotFoundException: Failed to find data source: bigquery

I'm trying to load data from Google BigQuery into Spark running on Google Dataproc (I'm using Java). I tried to follow the instructions here: https://cloud.google.com/dataproc/docs/tutorials/bigquery-connector-spark-example

I get the error: "ClassNotFoundException: Failed to find data source: bigquery."

My pom.xml looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.virtualpairprogrammers</groupId>
    <artifactId>learningSpark</artifactId>
    <version>0.0.3-SNAPSHOT</version>
    <packaging>jar</packaging>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
        <java.version>1.8</java.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>2.3.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.11</artifactId>
            <version>2.3.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>2.2.0</version>
        </dependency>
        <dependency>
            <groupId>com.google.cloud.spark</groupId>
            <artifactId>spark-bigquery_2.11</artifactId>
            <version>0.9.1-beta</version>
            <classifier>shaded</classifier>
        </dependency>

    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.5.1</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>
            <plugin>
                <artifactId>maven-jar-plugin</artifactId>
                <version>3.0.2</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                    <archive>
                        <manifest>
                            <mainClass>com.virtualpairprogrammers.Main</mainClass>
                        </manifest>
                    </archive>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>

After adding the dependency to my pom.xml, Maven downloaded a lot of artifacts to build the .jar, so I think I have the correct dependency? However, Eclipse also warns me that "The import com.google.cloud.spark.bigquery is never used".

This is the part of my code where I get the error:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import com.google.cloud.spark.bigquery.*;

public class Main {

    public static void main(String[] args) {

        SparkSession spark = SparkSession.builder()
                .appName("testingSql")
                .getOrCreate();

        Dataset<Row> data = spark.read().format("bigquery")
                .option("table", "project.dataset.tablename")
                .load()
                .cache();
    }
}

I think you only added the BigQuery connector as a compile-time dependency, but it is missing at runtime. You need to either build an uber jar that bundles the connector into your job jar (the doc needs to be updated), or include it when you submit the job:

gcloud dataproc jobs submit spark --properties spark.jars.packages=com.google.cloud.spark:spark-bigquery_2.11:0.9.1-beta
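
If you go the uber-jar route with Maven, a minimal maven-shade-plugin configuration along these lines could be added to the <plugins> section of the pom.xml above. This is a sketch, not from the original post; the plugin version is illustrative, and the ServicesResourceTransformer is included because Spark discovers data sources via META-INF/services files, which need to be merged into the shaded jar:

    <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>3.2.1</version>
        <executions>
            <execution>
                <phase>package</phase>
                <goals>
                    <goal>shade</goal>
                </goals>
                <configuration>
                    <transformers>
                        <!-- keep the Main-Class entry in the shaded jar's manifest -->
                        <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                            <mainClass>com.virtualpairprogrammers.Main</mainClass>
                        </transformer>
                        <!-- merge META-INF/services files so Spark can find the bigquery data source -->
                        <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                    </transformers>
                </configuration>
            </execution>
        </executions>
    </plugin>

With that in place, mvn package produces a single shaded jar containing the connector, which can then be submitted as the job jar.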

I faced the same issue and updated the format from "bigquery" to "com.google.cloud.spark.bigquery" and that worked for me.
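
Applied to the code in the question, that change would look like this (a sketch using the same placeholder table name as above):

    // same read as in the question, but with the fully qualified data source name
    Dataset<Row> data = spark.read().format("com.google.cloud.spark.bigquery")
            .option("table", "project.dataset.tablename")
            .load()
            .cache();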

Specifying the dependency in build.sbt and using "com.google.cloud.spark.bigquery" as the format, as Peter suggested, resolved the issue for me.
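
For an sbt build, the dependency declaration might look roughly like this. Treat it as a sketch, not a verified build file; the versions simply mirror the Maven coordinates used earlier in this question:

    // build.sbt (sketch)
    scalaVersion := "2.11.12"

    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-sql" % "2.3.2" % "provided",
      "com.google.cloud.spark" %% "spark-bigquery" % "0.9.1-beta"
    )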
