簡體   English   中英

蜂巢:無法從Google Cloud Platform讀取文件

[英]Hive : Cannot read file from Google Cloud Platform

我正在嘗試在配置單元查詢中讀取GCP存儲桶中存在的文件。

基本上,我要做的就是

import com.google.cloud.storage.Storage;
import com.google.cloud.storage.Blob;
import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.StorageOptions;

Storage storage = StorageOptions.getDefaultInstance().getService();
Blob blob = storage.get(BlobId.of(bucketName, srcFilename));
String fileContent = new String(blob.getContent());
return fileContent;

現在,當我在Mac上運行此程序時,它可以工作(我以可以訪問存儲桶的方式進行了gcloud設置)

現在,我希望具有相同的功能,但要在蜂巢udf中使用。 所以,我建立了一個非常簡單的罐子

import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;     
import org.apache.hadoop.hive.ql.udf.UDFType;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.Blob;
import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.StorageOptions;

@UDFType(deterministic = true)
public class MyAwesomeUDF extends GenericUDF{

@Override
    public String process(String srcFilename, String bucketName) throws IOException {
        Storage storage = StorageOptions.getDefaultInstance().getService();
    Blob blob = storage.get(BlobId.of(bucketName, srcFilename));
    String fileContent = new String(blob.getContent());
    return fileContent;
    }

}

這是我的pom.xml

<dependencies>
        <!-- https://mvnrepository.com/artifact/com.google.cloud/google-cloud-storage -->
        <dependency>
            <groupId>com.google.cloud</groupId>
            <artifactId>google-cloud-storage</artifactId>
            <version>1.71.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-serde</artifactId>
            <version>1.2.1</version>
            <exclusions>
                <exclusion>
                    <groupId>log4j</groupId>
                    <artifactId>log4j</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>log4j</groupId>
                    <artifactId>apache-log4j-extras</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
    </dependencies>
    <plugins>
<plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>2.4.1</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <finalName>hive-exe-jar-with-dependencies</finalName>
                            <filters>
                                <filter>
                                    <artifact>*:*</artifact>
                                    <excludes>
                                        <exclude>META-INF/*.SF</exclude>
                                        <exclude>META-INF/*.DSA</exclude>
                                        <exclude>META-INF/*.RSA</exclude>
                                    </excludes>
                                </filter>
                            </filters>
                            <relocations>
                                <relocation>
                                    <pattern>com.google.common</pattern>
                                    <shadedPattern>repackaged.com.google.common</shadedPattern>
                                </relocation>
                            </relocations>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
    </plugins>

接下來,我構建了這個jar,並且能夠在VM上運行它。

最后,這是我要運行的配置單元查詢

add jar /path/to/my/awesome/jar;
use myDb;

create temporary function awesome_fun as 'package.path.to.my.MyAwesomeUDF';

        select
            awesome_fun('bucketName','srcFileName');

但是我在這里

Exception in thread "main" java.lang.NoSuchMethodError: com.google.api.services.storage.Storage$Objects$Get.setUserProject(Ljava/lang/String;)Lcom/google/api/services/storage/Storage$Objects$Get;
    at com.google.cloud.storage.spi.v1.HttpStorageRpc.getCall(HttpStorageRpc.java:403)
    at com.google.cloud.storage.spi.v1.HttpStorageRpc.get(HttpStorageRpc.java:411)
    at com.google.cloud.storage.StorageImpl$5.call(StorageImpl.java:198)
    at com.google.cloud.storage.StorageImpl$5.call(StorageImpl.java:195)
    at com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:89)
    at com.google.cloud.RetryHelper.run(RetryHelper.java:74)
    at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:51)
    at com.google.cloud.storage.StorageImpl.get(StorageImpl.java:195)
    at com.google.cloud.storage.StorageImpl.get(StorageImpl.java:209)

錯誤發生在

Storage storage = StorageOptions.getDefaultInstance().getService();

而且,在構建jar之后,我可以看到(使用jar -tfcom.google.api.services.storage.Storage$Objects$Get存在。

我究竟做錯了什么?

問題是缺少一種方法,請在編譯或驗證編譯后的類和庫的版本相同時,確保更新了您實際運行的類文件。

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM