
How to store complex objects into Hadoop HBase?

I have complex objects with collection fields which need to be stored in Hadoop. I don't want to walk the whole object tree and explicitly store each field. So I am thinking about serializing the complex fields and storing each one as a single big piece, then deserializing it when reading the object back. What is the best way to do this? I thought about using some kind of serialization for that, but I hope Hadoop has means to handle this situation.

Sample object's class to store:

class ComplexClass {

    // <simple fields>

    List<AnotherComplexClassWithCollectionFields> collection;
}

HBase only deals with byte arrays, so you can serialize your object in any way you see fit.
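For context, here is a minimal sketch of putting an already-serialized object into a single HBase cell, using the HBase 1.x+ client API; the table name "objects", family "cf", and qualifier "payload" are made-up names for illustration:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBasePutSketch {
    public static void store(byte[] rowKey, byte[] value) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("objects"))) {
            Put put = new Put(rowKey);
            // the serialized object goes into one cell as an opaque byte[]
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("payload"), value);
            table.put(put);
        }
    }
}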

The standard Hadoop way of serializing objects is to implement the org.apache.hadoop.io.Writable interface. Then you can serialize your object into a byte array using org.apache.hadoop.io.WritableUtils.toByteArray(Writable ... writable).
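As a rough sketch of what that can look like for the class above, assuming AnotherComplexClassWithCollectionFields also implements Writable and has a no-arg constructor, and with a made-up String field standing in for the simple fields:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableUtils;

class ComplexClass implements Writable {

    private String name; // illustrative stand-in for the simple fields
    private List<AnotherComplexClassWithCollectionFields> collection = new ArrayList<>();

    @Override
    public void write(DataOutput out) throws IOException {
        WritableUtils.writeString(out, name);
        // length-prefix the collection, then delegate to the nested Writable
        WritableUtils.writeVInt(out, collection.size());
        for (AnotherComplexClassWithCollectionFields item : collection) {
            item.write(out);
        }
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        name = WritableUtils.readString(in);
        int size = WritableUtils.readVInt(in);
        collection = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            AnotherComplexClassWithCollectionFields item = new AnotherComplexClassWithCollectionFields();
            item.readFields(in);
            collection.add(item);
        }
    }
}

Getting the bytes for the HBase cell is then a one-liner: byte[] value = WritableUtils.toByteArray(obj);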

Also, there are other serialization frameworks that people in the Hadoop community use, like Avro, Protocol Buffers, and Thrift. All have their specific use cases, so do your research. If you're doing something simple, implementing Hadoop's Writable should be good enough.
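For example, Avro's reflect API can derive a schema from the class and serialize instances without hand-written write()/readFields() methods. A rough sketch under the assumption that ComplexClass and its nested classes are reflect-friendly (public or getter-accessible fields, no-arg constructors):

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.reflect.ReflectData;
import org.apache.avro.reflect.ReflectDatumWriter;

public class AvroSketch {
    public static byte[] toBytes(ComplexClass obj) throws IOException {
        // derive the schema from the class via reflection
        Schema schema = ReflectData.get().getSchema(ComplexClass.class);
        ReflectDatumWriter<ComplexClass> writer = new ReflectDatumWriter<>(schema);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        writer.write(obj, encoder);
        encoder.flush();
        return out.toByteArray(); // ready to store as an HBase cell value
    }
}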
