How to read data from AVRO file using C++ interface?

I'm attempting to write a simple program to extract some data from a bunch of AVRO files. The schema for each file may be different, so I would like to read the files generically (i.e. without having to pregenerate and then compile in the schema for each) using the C++ interface.

I have been attempting to follow the generic.cc example, but it assumes a separate schema, whereas I would like to read the schema from each AVRO file.

Here is my code:

#include <fstream>
#include <iostream>

#include "Compiler.hh"
#include "DataFile.hh"
#include "Decoder.hh"
#include "Generic.hh"
#include "Stream.hh"

const std::string BOLD("\033[1m");
const std::string ENDC("\033[0m");
const std::string RED("\033[31m");
const std::string YELLOW("\033[33m");

int main(int argc, char**argv)
{
    std::cout << "AVRO Test\n" << std::endl;

    if (argc < 2)
    {
        std::cerr << BOLD << RED << "ERROR: " << ENDC << "please provide an "
                  << "input file\n" << std::endl;
        return -1;
    }

    avro::DataFileReaderBase dataFile(argv[1]);
    auto dataSchema = dataFile.dataSchema();

    // Write out data schema in JSON for grins
    std::ofstream output("data_schema.json");
    dataSchema.toJson(output);
    output.close();

    avro::DecoderPtr decoder = avro::binaryDecoder();
    auto inStream = avro::fileInputStream(argv[1]);
    decoder->init(*inStream);

    avro::GenericDatum datum(dataSchema);
    avro::decode(*decoder, datum);
    std::cout << "Type: " << datum.type() << std::endl;

    return 0;
}

Every time I run the code, no matter what file I use, I get this:

$ ./avrotest twitter.avro
AVRO Test

terminate called after throwing an instance of 'avro::Exception'
  what():  Cannot have negative length: -40
Aborted

In addition to my own data files, I have tried using the data files located here: https://github.com/miguno/avro-cli-examples, with the same result.

I tried using the avrocat utility on all of the same files and it works fine. What am I doing wrong?

(NOTE: outputting the data schema for each file in JSON works as expected.)

After a bunch more fooling around, I figured it out. You're supposed to use DataFileReader templated with GenericDatum. The end result is something like this:

#include <fstream>
#include <iostream>
#include <string>

#include "Compiler.hh"
#include "DataFile.hh"
#include "Decoder.hh"
#include "Generic.hh"
#include "Stream.hh"

const std::string BOLD("\033[1m");
const std::string ENDC("\033[0m");
const std::string RED("\033[31m");
const std::string YELLOW("\033[33m");

int main(int argc, char**argv)
{
    std::cout << "AVRO Test\n" << std::endl;

    if (argc < 2)
    {
        std::cerr << BOLD << RED << "ERROR: " << ENDC << "please provide an "
                  << "input file\n" << std::endl;
        return -1;
    }

    avro::DataFileReader<avro::GenericDatum> reader(argv[1]);
    auto dataSchema = reader.dataSchema();

    // Write out data schema in JSON for grins
    std::ofstream output("data_schema.json");
    dataSchema.toJson(output);
    output.close();

    avro::GenericDatum datum(dataSchema);
    while (reader.read(datum)) 
    {
        std::cout << "Type: " << datum.type() << std::endl;
        if (datum.type() == avro::AVRO_RECORD) 
        {
            const avro::GenericRecord& r = datum.value<avro::GenericRecord>();
            std::cout << "Field-count: " << r.fieldCount() << std::endl;

            // TODO: pull out each field
        }
    }

    return 0;
}

Perhaps an example like this should be included with libavro...
