Store base64 encoded string in HBase

I have a very specific requirement: storing PDF data in HBase columns. The source of the data is MongoDB, from which the base64-encoded data is read, and I need to bulk upload it to an HBase table.

I noticed that the base64-encoded string contains a lot of "\\n" characters, which split the entire string into parts. I am not sure whether this is the cause, but when I store the string using a put:

 put.add(Bytes.toBytes(ColFamilyName), Bytes.toBytes(columnName), Bytes.toBytes(data.replaceAll("\n","").toString()));

It stores only the first line of the entire encoded string. For example:

If the actual content was something like this:

 "JVBERi0xLjQKJaqrrK0KNCAwIG9iago8PAovQ3JlYXRvciAoQXBhY2hlIEZPUCBWZXJzaW9uIDEu " +
 "MSkKL1Byb2R1Y2VyIChBcGFjaGUgRk9QIFZlcnNpb24gMS4xKQovQ3JlYXRpb25EYXRlIChEOjIw\\n" +
 "MTUwODIyMTIxMjM1KzAzJzAwJykKPj4KZW5kb2JqCjUgMCBvYmoKPDwKICAvTiAzCiAgL0xlbmd0\\n" +

Only the first line ends up stored in the column:

 JVBERi0xLjQKJaqrrK0KNCAwIG9iago8PAovQ3JlYXRvciAoQXBhY2hlIEZPUCBWZXJzaW9uIDEu

Even after trying to remove the "\\n" manually, the output is the same.

Could someone please point me in the right direction here?

I am currently working on Base64 encoding as well. As far as I understand, you should try the org.apache.hadoop.hbase.util.Base64.encodeBytes(byte[] source, int option) method, where DONT_BREAK_LINES can be passed as the option. Please let me know whether this works.
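To illustrate what an option like DONT_BREAK_LINES is about, here is a minimal sketch that uses the JDK's java.util.Base64 as a stand-in (it is not the HBase utility class named above, and the class name Base64LineBreakDemo is made up for this example). The basic encoder emits one unbroken string, while the MIME encoder inserts a line separator after every 76 characters, which is the kind of output that produces multi-line base64 data.

```java
import java.util.Base64;

// Hypothetical demo class; uses only the JDK, not the HBase Base64 utility.
class Base64LineBreakDemo {
    // Basic encoder: the output contains no line separators at all.
    static String encodeSingleLine(byte[] source) {
        return Base64.getEncoder().encodeToString(source);
    }

    // MIME encoder: inserts "\r\n" after every 76 output characters.
    static String encodeMime(byte[] source) {
        return Base64.getMimeEncoder().encodeToString(source);
    }

    public static void main(String[] args) {
        // Placeholder bytes standing in for real PDF content (assumption).
        byte[] payload = new byte[200];
        System.out.println("single line has breaks: "
                + encodeSingleLine(payload).contains("\n"));
        System.out.println("MIME output has breaks: "
                + encodeMime(payload).contains("\n"));
    }
}
```

If the data is already encoded by the time it is read (as it is here, coming from MongoDB), the encoder option does not apply and the line breaks have to be stripped from the string instead.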

Managed to solve it. The issue was in how the Base64-encoded data was read from the MongoDB source. I now read the data from the MongoDB document (a DBObject) as:

 jsonObj.get("receiptContent").toString().replaceAll("\\n","")

And stored it as such in HBase. I can now see the PDF content even from the Hue HBase browser UI.
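For anyone who wants to check the cleaned string before writing it to HBase, the following is a rough sketch, not the exact code used above. It assumes the value may contain real CR/LF characters or the two-character sequence backslash + n, strips both, and decodes the result with the JDK's java.util.Base64 to confirm it starts with the PDF signature. The class and method names are invented for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper; names are illustrative and not part of HBase or MongoDB.
class Base64Cleaner {
    // Removes real CR/LF characters and literal "\n" (backslash + n) sequences.
    // A backslash is never part of the base64 alphabet, so this cannot
    // damage valid encoded data.
    static String stripLineBreaks(String encoded) {
        return encoded.replace("\r", "")
                      .replace("\n", "")
                      .replace("\\n", "");
    }

    // Decodes the cleaned string and checks for the "%PDF-" file signature.
    static boolean looksLikePdf(String encoded) {
        byte[] decoded = Base64.getDecoder().decode(stripLineBreaks(encoded));
        int n = Math.min(decoded.length, 5);
        String header = new String(decoded, 0, n, StandardCharsets.US_ASCII);
        return header.equals("%PDF-");
    }

    public static void main(String[] args) {
        // "JVBERi0xLjQK" is the start of the sample in the question,
        // split here by a real newline to mimic the multi-line input.
        String raw = "JVBERi0x\nLjQK";
        System.out.println(stripLineBreaks(raw));
        System.out.println(looksLikePdf(raw));
    }
}
```

The cleaned string returned by stripLineBreaks is what would then be passed to Bytes.toBytes in the put shown in the question.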
