
Read Avro parquet file from inside JAR

I am trying to read a Parquet file that is bundled as a resource inside a JAR, ideally as a stream.

Does anyone have a working example that does not involve first writing the resource out to a temporary file?

Here is the code I am using to read the files. It works fine in the IDE, before the project is bundled as a JAR:

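import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetReader;
import org.apache.parquet.hadoop.ParquetReader;
import org.apache.parquet.hadoop.util.HadoopInputFile;

import java.io.IOException;
import java.net.URISyntaxException;

try {
    // The resource URL is turned into a Hadoop Path; this is the step that fails inside a JAR
    Path path = new Path(classLoader.getResource(pattern_id).toURI());

    Configuration conf = new Configuration();

    try (ParquetReader<GenericRecord> r = AvroParquetReader.<GenericRecord>builder(
                HadoopInputFile.fromPath(path, conf))
            .disableCompatibility()
            .build()) {
        patternsFound.add(pattern_id);

        GenericRecord record;
        while ((record = r.read()) != null) {
            // Do some work
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
} catch (NullPointerException | URISyntaxException e) {
    // getResource returns null when the resource cannot be found
    e.printStackTrace();
}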

When I run this code from the JAR file, I get this error:

org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "jar"
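
The scheme named in the error can be reproduced with the JDK alone. The sketch below is only an illustration: the two locations are made-up examples of what `getResource` plausibly returns when running from an IDE's classes directory versus from a packaged JAR, and `SchemeDemo` / `schemeOf` are hypothetical names. It shows where the "jar" scheme comes from, which is presumably what Hadoop then fails to map to a FileSystem.

```java
import java.net.URI;

class SchemeDemo {
    // Returns the URI scheme, the part Hadoop uses to pick a FileSystem implementation.
    static String schemeOf(String location) {
        return URI.create(location).getScheme();
    }

    public static void main(String[] args) {
        // Hypothetical location when the resource sits in a plain directory (IDE run)
        System.out.println(schemeOf("file:/project/target/classes/pattern.parquet")); // prints "file"
        // Hypothetical location when the same resource is packed inside a JAR
        System.out.println(schemeOf("jar:file:/project/app.jar!/pattern.parquet"));   // prints "jar"
    }
}
```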

I figured I could get around this by using:

InputStream inputFile = classLoader.getResourceAsStream(pattern_id);

but I don't know how to get AvroParquetReader to work with an input stream.

By adapting the solution given here, I was eventually able to read the Parquet file as a resource stream: https://stackoverflow.com/a/58261488/3112960

import org.apache.commons.io.IOUtils;
import org.apache.parquet.io.DelegatingSeekableInputStream;
import org.apache.parquet.io.InputFile;
import org.apache.parquet.io.SeekableInputStream;

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ParquetStream implements InputFile {
    private final String streamId;
    private final byte[] data;

    private static class SeekableByteArrayInputStream extends ByteArrayInputStream {
        public SeekableByteArrayInputStream(byte[] buf) {
            super(buf);
        }

        public void setPos(int pos) {
            this.pos = pos;
        }

        public int getPos() {
            return this.pos;
        }
    }

    public ParquetStream(String streamId, InputStream stream) throws IOException {
        this.streamId = streamId;

        this.data = IOUtils.toByteArray(stream);
    }

    @Override
    public long getLength()  {
        return this.data.length;
    }

    @Override
    public SeekableInputStream newStream() throws IOException {
        return new DelegatingSeekableInputStream(new SeekableByteArrayInputStream(this.data)) {
            @Override
            public void seek(long newPos) {
                ((SeekableByteArrayInputStream) this.getStream()).setPos((int) newPos);
            }

            @Override
            public long getPos() {
                return ((SeekableByteArrayInputStream) this.getStream()).getPos();
            }
        };
    }

    @Override
    public String toString() {
        return "ParquetStream[" + streamId + "]";
    }
}
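
Some background on why the class above buffers the whole resource and exposes `seek`: a Parquet file keeps its metadata in a footer at the end of the file and carries the 4-byte magic `PAR1` at both ends, so a reader needs the total length and random access, which a plain `InputStream` does not offer. The following JDK-only sketch illustrates the seeking trick in isolation; `SeekDemo`, `SeekableBytes` and `readTrailingMagic` are hypothetical names, and the bounds check is an addition that the class above does not have.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

class SeekDemo {
    // ByteArrayInputStream exposes its read position as the protected field `pos`,
    // so a subclass can move it to emulate random access over the buffered bytes.
    static class SeekableBytes extends ByteArrayInputStream {
        SeekableBytes(byte[] buf) {
            super(buf);
        }

        void seek(long newPos) {
            // `count` is the protected end-of-data index inherited from ByteArrayInputStream
            if (newPos < 0 || newPos > count) {
                throw new IllegalArgumentException("Position out of range: " + newPos);
            }
            pos = (int) newPos;
        }
    }

    // Jumps to the last four bytes, where a Parquet file stores its trailing magic.
    static String readTrailingMagic(byte[] file) {
        SeekableBytes in = new SeekableBytes(file);
        in.seek(file.length - 4L);
        byte[] magic = new byte[4];
        int n = in.read(magic, 0, 4);
        return new String(magic, 0, n, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Stand-in bytes, not a real Parquet file: only the framing magic is imitated
        byte[] fake = "PAR1-body-PAR1".getBytes(StandardCharsets.US_ASCII);
        System.out.println(readTrailingMagic(fake)); // prints "PAR1"
    }
}
```

Because everything is held in a `byte[]`, this approach is limited to resources smaller than 2 GB that fit comfortably in memory, which is usually the case for files bundled in a JAR.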

With that class in place, I can then do this:

try (InputStream in = classLoader.getResourceAsStream(pattern_id)) {
    // Buffers the whole resource in memory so that it can be read with random access
    ParquetStream parquetStream = new ParquetStream(pattern_id, in);

    // try-with-resources closes the reader once all records have been consumed
    try (ParquetReader<GenericRecord> r = AvroParquetReader.<GenericRecord>builder(parquetStream)
            .disableCompatibility()
            .build()) {
        GenericRecord record;
        while ((record = r.read()) != null) {
            // do some work
        }
    }
} catch (IOException e) {
    e.printStackTrace();
}
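
A side note on dependencies: the ParquetStream constructor relies on `IOUtils.toByteArray` from commons-io. On Java 9 or later, `InputStream.readAllBytes()` from the JDK should serve the same purpose. The sketch below assumes that Java version; `ResourceBytes` and `readResource` are hypothetical helper names, not part of any library. It also checks for the `null` that `getResourceAsStream` returns when the resource is missing.

```java
import java.io.IOException;
import java.io.InputStream;

class ResourceBytes {
    // Reads a classpath resource fully into memory using only the JDK (Java 9+).
    static byte[] readResource(ClassLoader loader, String name) throws IOException {
        try (InputStream in = loader.getResourceAsStream(name)) {
            if (in == null) {
                // getResourceAsStream signals a missing resource with null, not an exception
                throw new IOException("Resource not found: " + name);
            }
            return in.readAllBytes();
        }
    }
}
```

The resulting array could then be wrapped in a `ByteArrayInputStream` and handed to the ParquetStream constructor unchanged.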

Maybe this will help someone in the future, as I could not find any straightforward answer.
