
Reading from jar file in local filesystem using hadoop FileSystem

We have a Maven project with some files in the resources directory, which get copied into the root of the jar file. The following bit of code works fine during JUnit testing but stops working once I try to execute it from the jar:

        Configuration configuration = new Configuration();
        String pathString = MainClass.class.getClassLoader().getResource("dir").getPath();
        Path path = new Path(pathString);

        logger.debug(path);
        FileSystem fs = path.getFileSystem(configuration);
        if (fs.exists(path)) {
            logger.debug("WOOOOO");
        } else {
            logger.debug("BOOOOO");
        }

While testing, the output is:

DEBUG: /path/to/project/target/test-classes/dir
DEBUG: WOOOOO

While running from the jar, I get:

DEBUG file:/path/to/jar/project.jar!/dir
DEBUG BOOOOO

Needless to say, the jar file is in the correct location and dir is in the root of that jar.

In case you're wondering why we're doing this, the second half is a little test excerpt that mimics what NaiveBayesModel.materialize() in Mahout does. We just need to be able to create a path that Mahout will understand.

The exception java.io.IOException: No FileSystem for scheme: jar means that you can't create a File object or open an FSDataInputStream (which is what Mahout does) with a URI that references something inside a jar file.
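To make the cause easier to see, here is a small JDK-only sketch of what a jar resource URL looks like. The class name JarUrlAnatomy and its helper method are hypothetical, and the jar location is the placeholder path from the question. The protocol of such a URL is jar, and getPath() returns the nested file: URL plus the !/dir suffix, which matches the path logged when running from the jar. A Path built from that string presumably gets resolved by the local file system as a literal file name, which does not exist.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

final class JarUrlAnatomy {

    /** Returns the protocol and the path component of the given URL string. */
    static String[] describe(String spec) throws MalformedURLException {
        URL url = URI.create(spec).toURL();
        return new String[] { url.getProtocol(), url.getPath() };
    }

    public static void main(String[] args) throws MalformedURLException {
        // Placeholder location taken from the question's log output.
        String[] parts = describe("jar:file:/path/to/jar/project.jar!/dir");
        System.out.println("protocol = " + parts[0]);
        System.out.println("path     = " + parts[1]);
    }
}
```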

The file and hdfs schemes have FileSystem implementations. Hence, I guess the only solution for your case, since you want to call NaiveBayesModel.materialize(), is to dump the files inside the dir directory of your jar into one of those two file systems and then create a Path from the copy.
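A minimal sketch of the "dump it to the local file system" step, using only the JDK, could look like the following. The class name JarResourceExtractor and the method extractToTempDir are made up for illustration, and the code assumes a plain (non-nested) jar URL such as the one returned by getResource("dir"). It is not taken from Mahout or Hadoop.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class JarResourceExtractor {

    /** Copies the directory behind a classpath URL (jar: or file:) into a new temp directory. */
    static Path extractToTempDir(URL resource) throws IOException, URISyntaxException {
        Path target = Files.createTempDirectory("extracted-resources");
        URI uri = resource.toURI();
        if ("jar".equals(uri.getScheme())) {
            // Split "jar:file:/.../project.jar!/dir" into the jar URI and the entry path.
            String[] parts = uri.toString().split("!", 2);
            // Mount the jar as a zip file system for the duration of the copy.
            try (FileSystem jarFs = FileSystems.newFileSystem(
                    URI.create(parts[0]), Collections.<String, Object>emptyMap())) {
                copyTree(jarFs.getPath(parts[1]), target);
            }
        } else {
            // Already a regular directory, e.g. target/test-classes/dir during tests.
            copyTree(Paths.get(uri), target);
        }
        return target;
    }

    private static void copyTree(Path source, Path target) throws IOException {
        List<Path> entries;
        try (Stream<Path> walk = Files.walk(source)) {
            entries = walk.collect(Collectors.toList());
        }
        for (Path entry : entries) {
            // Go through a String so the relative path can cross file system providers.
            Path dest = target.resolve(source.relativize(entry).toString());
            if (Files.isDirectory(entry)) {
                Files.createDirectories(dest);
            } else {
                Files.copy(entry, dest, StandardCopyOption.REPLACE_EXISTING);
            }
        }
    }
}
```

The returned directory lives on the local file system, so something like new org.apache.hadoop.fs.Path(extracted.toUri()) should then resolve through the file scheme. Whether NaiveBayesModel.materialize() accepts that directory as-is is not verified here.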

On the other hand, you can try to reproduce what Mahout does, which is the instantiation of a NaiveBayesModel.

I don't have experience with Mahout, but I guess it's a good starting point. Hope it helps.
