
Write Avro file in HDFS: append if the file already exists

Currently I'm learning Spark Streaming and Avro. My first example reads a Spark RDD, builds a GenericRecord, and creates an Avro file that should be written to HDFS. How can I open an Avro file that already exists in HDFS and append to it?

This code writes an Avro file, but when I try to add to or append to an existing file, it fails. I am using Java 8.

public static void saveAvro(GenericRecord record, Schema schema) throws IOException {

        DatumWriter<GenericRecord> bdPersonDatumWriter = new GenericDatumWriter<>(schema);
        DataFileWriter<GenericRecord> dataFileWriter = new DataFileWriter<>(bdPersonDatumWriter);

        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://sandbox-hdp.hortonworks.com:8020/tmp/poc/ResultHDFSTest.avro"),
                conf);
        Path F = new Path("hdfs://sandbox-hdp.hortonworks.com:8020/tmp/poc/ResultHDFSTest.avro");
        fs.setReplication(F, (short) 1);

        if (!fs.exists(F)) {
            System.out.println("File not exists.. creating....");
            OutputStream out = fs.create(F, (short) 1);
            System.out.println("OutputStream create.");
            dataFileWriter.create(schema, out);
            System.out.println("dataFileWriter create.");
            dataFileWriter.append(record);
            System.out.println("dataFileWriter append OK {0} .");

        } else {
            // Fails here with "not open": the Avro file stored in HDFS is never opened for this writer.
            System.out.println("File exists....");
            // I want to add information to an existing Avro file.
            dataFileWriter.append(record);
            System.out.println("dataFileWriter append OK {1} .");
        }
        dataFileWriter.close();
        System.out.println("dataFileWriter closed.");

    }
    

Stack trace when appending to the existing Avro file in HDFS:

Exception in thread "main" org.apache.avro.AvroRuntimeException: not open
	at org.apache.avro.file.DataFileWriter.assertOpen(DataFileWriter.java:88)
	at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:311)
	at com.test.avro.App.saveAvro(App.java:83)
	at com.test.avro.App.main(App.java:55)

The DataFileWriter appendTo method only accepts a java.io.File. Is what I am trying to do correct, or is there another way?

Edit 1: I want to add information to an existing file.

The first code snippet shows how I create the Avro file. Here is a fragment of my Spark Streaming code:

JavaStreamingContext jssc = sparkConfigurationBuilder
        .buildJSC(sparkConfigurationBuilder.buildSparkConfiguration());

// Directory for the streaming context's checkpoints
jssc.checkpoint("c:\\tmp");
Map<String, Object> kafkaParams = sparkDriverUtils.getKafkaProperties();
Collection<String> topics = Arrays.asList(sparkDriverUtils.getTopics().trim().split(",")); // one or more topics
LOGGER.warn("Topic list: " + topics.toString());

...

JavaInputDStream<ConsumerRecord<String, String>> stream = KafkaUtils.createDirectStream(jssc,
                LocationStrategies.PreferConsistent(),
                ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams));
// This DStream is the result I want to save as Avro.
JavaDStream<Transactions> transactionsDS = transactions.map(f -> {
    Transactions txn = jsonMapperUtil.rowToTransaction(f);
    LOGGER.warn("Returning: JavaDStream<Transactions>");
    return txn;
});

Now I want to save the transactionsDS result as an Avro file in HDFS. My question: can I get or create a SparkSession from the JavaStreamingContext to work with a Dataset, or should I change how I subscribe to the Kafka broker?

Regards.

DataFileWriter appendTo method only accepts a java.io.File

Correct. Avro has no connection to HDFS Paths.
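The "not open" error itself comes from the else branch of saveAvro: a DataFileWriter has to be opened with create() or appendTo() before append() is called, and that branch does neither. On a local file, appendTo(File) is the missing call. Here is a minimal sketch of the create-or-append logic on local disk, assuming the Avro library is on the classpath; the class name LocalAvroAppend and the one-field Person schema are hypothetical:

```java
import java.io.File;
import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class LocalAvroAppend {

    // Hypothetical one-field schema, used only for this demo.
    public static final Schema SCHEMA = SchemaBuilder.record("Person").fields()
            .requiredString("name")
            .endRecord();

    public static GenericRecord person(String name) {
        GenericRecord r = new GenericData.Record(SCHEMA);
        r.put("name", name);
        return r;
    }

    /** Appends one record, creating the container file on first use. */
    public static void saveAvro(GenericRecord record, File file) throws IOException {
        try (DataFileWriter<GenericRecord> writer =
                     new DataFileWriter<>(new GenericDatumWriter<>(SCHEMA))) {
            if (file.exists()) {
                // Reopen: reads the existing header and positions the writer at the end.
                writer.appendTo(file);
            } else {
                // New file: writes the header (schema, sync marker) first.
                writer.create(SCHEMA, file);
            }
            // Only legal after create() or appendTo(); otherwise "not open".
            writer.append(record);
        }
    }

    /** Reads the file back and counts its records. */
    public static long countRecords(File file) throws IOException {
        long n = 0;
        try (DataFileReader<GenericRecord> reader =
                     new DataFileReader<>(file, new GenericDatumReader<>())) {
            while (reader.hasNext()) {
                reader.next();
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("people", ".avro");
        file.delete(); // start without the file so the first call uses create()
        saveAvro(person("ana"), file);
        saveAvro(person("luis"), file);
        System.out.println(countRecords(file)); // 2
        file.delete();
    }
}
```

Running main should print 2: the first call creates the file and writes the header, the second reopens it with appendTo(File). This works only because the file is on the local filesystem, which is exactly the limitation discussed here.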

In order to "append to HDFS files", you need to download them locally, append to the local copy, then overwrite the whole HDFS file with it.
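A sketch of that round trip, assuming Hadoop 2.x or later and Avro on the classpath; HdfsAvroRewrite, appendViaLocalCopy and countRecords are hypothetical names. It downloads the file with FileSystem.copyToLocalFile, reopens the local copy with appendTo(File), and uploads it back with copyFromLocalFile(delSrc, overwrite, src, dst), replacing the original:

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileStream;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsAvroRewrite {

    /**
     * "Appends" records to an Avro file on a Hadoop FileSystem by downloading it,
     * appending to the local copy, and uploading it over the original.
     * Not atomic: a concurrent writer or a crash between steps can lose data.
     */
    public static void appendViaLocalCopy(FileSystem fs, Path remote, Schema schema,
                                          Iterable<GenericRecord> records) throws IOException {
        File tmpDir = Files.createTempDirectory("avro-append").toFile();
        File local = new File(tmpDir, remote.getName());
        Path localPath = new Path(local.toURI());
        try {
            boolean existed = fs.exists(remote);
            if (existed) {
                // 1. Download. useRawLocalFileSystem=true avoids writing a local .crc
                //    file, which would go stale once Avro appends to the copy.
                fs.copyToLocalFile(false, remote, localPath, true);
            }
            try (DataFileWriter<GenericRecord> writer =
                         new DataFileWriter<>(new GenericDatumWriter<>(schema))) {
                if (existed) {
                    writer.appendTo(local);        // 2a. reopen the downloaded copy
                } else {
                    writer.create(schema, local);  // 2b. first run: new container file
                }
                for (GenericRecord r : records) {
                    writer.append(r);
                }
            }
            // 3. Upload, replacing the whole remote file (delSrc=false, overwrite=true).
            fs.copyFromLocalFile(false, true, localPath, remote);
        } finally {
            local.delete();
            tmpDir.delete();
        }
    }

    /** Counts the records in an Avro file on the given FileSystem. */
    public static long countRecords(FileSystem fs, Path path) throws IOException {
        long n = 0;
        try (DataFileStream<GenericRecord> in =
                     new DataFileStream<>(fs.open(path), new GenericDatumReader<>())) {
            while (in.hasNext()) {
                in.next();
                n++;
            }
        }
        return n;
    }
}
```

For the cluster in the question, fs would come from FileSystem.get(URI.create("hdfs://sandbox-hdp.hortonworks.com:8020/"), conf) and remote would be new Path("/tmp/poc/ResultHDFSTest.avro"). Because every call copies the whole file twice, this gets slower as the file grows, so it is a poor fit for per-record writes from a streaming job.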


Besides this, you mention Spark Streaming, but the saveAvro method shown does not actually use any Spark API call.
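On the Spark side, the Kafka subscription should not need to change: a SparkSession can be obtained inside foreachRDD from each batch's SparkConf, which is the pattern the Spark Streaming programming guide uses for DataFrame operations. A rough sketch, assuming Spark 2.4+ with the external spark-avro module on the classpath and that Transactions is a JavaBean (public no-arg constructor, getters and setters); SparkAvroSink and saveAsAvro are hypothetical names:

```java
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.streaming.api.java.JavaDStream;

public class SparkAvroSink {

    /**
     * Writes every micro-batch of a DStream of JavaBeans as Avro part files
     * under outputDir. Assumes the spark-avro module is available (Spark 2.4+).
     */
    public static <T> void saveAsAvro(JavaDStream<T> stream, Class<T> beanClass, String outputDir) {
        stream.foreachRDD((JavaRDD<T> rdd) -> {
            if (rdd.isEmpty()) {
                return; // skip empty batches instead of writing empty files
            }
            // Reuse (or lazily create) one SparkSession from the batch's SparkConf.
            SparkSession spark = SparkSession.builder()
                    .config(rdd.context().getConf())
                    .getOrCreate();
            // Bean properties become DataFrame columns.
            Dataset<Row> df = spark.createDataFrame(rdd, beanClass);
            // Append mode adds new part files to the directory; existing
            // Avro files are never reopened or rewritten.
            df.write().mode(SaveMode.Append).format("avro").save(outputDir);
        });
    }
}
```

It would be called once, before jssc.start(), for example as SparkAvroSink.saveAsAvro(transactionsDS, Transactions.class, "hdfs://sandbox-hdp.hortonworks.com:8020/tmp/poc/transactions"), where the output directory is only an illustration. Treating a directory of part files as the dataset is how Spark normally "appends", so the single-file appendTo problem does not come up.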
