Unable to upload PDF files larger than 10 MB to HBase via Python happybase - HDP 3
We are using HDP 3 and are trying to insert PDF files into one column of a particular column family in an HBase table. The development environment is Python 3.6 and the HBase connector is happybase 1.1.0.
We are unable to upload any PDF file larger than 10 MB to HBase.
In HBase we have set the parameters as follows:
We get the following error:
IOError(message=b'org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: org.apache.hadoop.hbase.DoNotRetryIOException: Cell with size 80941994 exceeds limit of 10485760 bytes\\n\\tat org.apache.hadoop.hbase.regionserver.RSRpcServices.checkCellSizeLimit(RSRpcServices.java:937)\\n\\tat org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:1010)\\n\\tat org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicBatchOp(RSRpcServices.java:959)\\n\\tat org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:922)\\n\\tat org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2683)\\n\\tat org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:42014)\\n\\tat org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)\\n\\tat org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:131)\\n\\tat org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)\\n\\tat
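The two numbers that matter in this trace are the rejected cell size and the configured limit. As a minimal sketch, they can be pulled out of the message text with a regular expression. The helper name `parse_cell_limit_error` is hypothetical, and the sketch assumes the message arrives as bytes, as it appears in the `IOError(message=b'...')` output above.

```python
import re

# Matches the text built on the server side:
# "Cell with size <size> exceeds limit of <limit> bytes"
CELL_LIMIT_RE = re.compile(rb"Cell with size (\d+) exceeds limit of (\d+) bytes")


def parse_cell_limit_error(message):
    """Return (cell_size, limit) in bytes, or None if the message does not match.

    Hypothetical helper: accepts bytes or str.
    """
    if isinstance(message, str):
        message = message.encode("utf-8", errors="replace")
    match = CELL_LIMIT_RE.search(message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Shortened copy of the message from the trace above.
sample = (
    b"org.apache.hadoop.hbase.DoNotRetryIOException: "
    b"Cell with size 80941994 exceeds limit of 10485760 bytes"
)
print(parse_cell_limit_error(sample))
```

Here the cell is roughly 77 MiB against a 10 MiB limit, so the rejection comes from a server-side size check rather than from happybase itself.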
You have to check the HBase source code to see what is happening:
    private void checkCellSizeLimit(final HRegion r, final Mutation m) throws IOException {
      if (r.maxCellSize > 0) {
        CellScanner cells = m.cellScanner();
        while (cells.advance()) {
          int size = PrivateCellUtil.estimatedSerializedSizeOf(cells.current());
          if (size > r.maxCellSize) {
            String msg = "Cell with size " + size + " exceeds limit of " + r.maxCellSize + " bytes";
            if (LOG.isDebugEnabled()) {
              LOG.debug(msg);
            }
            throw new DoNotRetryIOException(msg);
          }
        }
      }
    }
Based on the error message, you are exceeding r.maxCellSize.
Note on the above: the function PrivateCellUtil.estimatedSerializedSizeOf is deprecated and will be removed in future versions.
Here is its description:
Estimate based on keyvalue's serialization format in the RPC layer. Note that there is an extra SIZEOF_INT added to the size here that indicates the actual length of the cell for cases where cells are serialized in a contiguous format (e.g. in RPCs).
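Since the server rejects the whole put, it can be useful to fail early on the client. The following is a rough sketch of a pre-flight check in Python that mirrors the comparison in checkCellSizeLimit. It does not reproduce HBase's size estimate: `ASSUMED_OVERHEAD` is a hypothetical safety margin standing in for the key and serialization overhead, and `check_cell_size`, `checked_put` and `CellTooLargeError` are illustrative names. The only happybase call used is `Table.put(row, data)`.

```python
DEFAULT_MAX_CELL_SIZE = 10485760  # default of hbase.server.keyvalue.maxsize
ASSUMED_OVERHEAD = 64  # hypothetical margin in bytes, NOT an HBase constant


class CellTooLargeError(ValueError):
    """Raised locally before a put that the region server would likely reject."""


def check_cell_size(row, column, value, max_cell_size=DEFAULT_MAX_CELL_SIZE):
    """Approximate the server-side check: key parts + value + a fixed margin."""
    if max_cell_size <= 0:
        return  # mirrors the server: zero or less disables the check
    size = len(row) + len(column) + len(value) + ASSUMED_OVERHEAD
    if size > max_cell_size:
        raise CellTooLargeError(
            "Cell with size about %d exceeds limit of %d bytes" % (size, max_cell_size)
        )


def checked_put(table, row, column, value, max_cell_size=DEFAULT_MAX_CELL_SIZE):
    """Validate locally, then store the value with table.put(row, {column: value})."""
    check_cell_size(row, column, value, max_cell_size)
    table.put(row, {column: value})
```

With a happybase table this could be called as `checked_put(connection.table('docs'), b'row1', b'cf:pdf', pdf_bytes)`, where the table name, row key and column are placeholders. If the server limit is raised, pass the same value as `max_cell_size` so the two checks stay in step.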
You have to check where the value is set. First check the "ordinary" values in HRegion.java:
    this.maxCellSize = conf.getLong(HBASE_MAX_CELL_SIZE_KEY, DEFAULT_MAX_CELL_SIZE);
So there is probably a HBASE_MAX_CELL_SIZE_KEY and a DEFAULT_MAX_CELL_SIZE defined somewhere:
    public static final String HBASE_MAX_CELL_SIZE_KEY = "hbase.server.keyvalue.maxsize";
    public static final int DEFAULT_MAX_CELL_SIZE = 10485760;
Here you have the 10485760 limit that shows up in your error message. If you need to, you can try raising this limit to a value that fits your largest files. I recommend testing it properly before going live with it (the limit probably exists for a reason).
Edit: Adding information about how to change the value of hbase.server.keyvalue.maxsize. Check the config files, where you can read:
hbase.client.keyvalue.maxsize (Description)
Specifies the combined maximum allowed size of a KeyValue instance. This is to set an upper boundary for a single entry saved in a storage file. Since they cannot be split it helps avoiding that a region cannot be split any further because the data is too large. It seems wise to set this to a fraction of the maximum region size. Setting it to zero or less disables the check.
Default: 10485760
hbase.server.keyvalue.maxsize (Description)
Maximum allowed size of an individual cell, inclusive of value and all key components. A value of 0 or less disables the check. The default value is 10MB. This is a safety setting to protect the server from OOM situations.
Default: 10485760
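As an illustration, raising the server-side limit would look roughly like the hbase-site.xml fragment below. The value 104857600 (100 MiB) is only an example, chosen because it is larger than the 80941994-byte cell in the error above. It is an assumption that on HDP 3 this file is managed through Ambari and that the region servers need a restart to pick up the change, so verify both on your cluster. Because happybase goes through the HBase Thrift server, hbase.client.keyvalue.maxsize may need the same treatment there; that is also an assumption to check, not something the trace confirms.

```xml
<!-- hbase-site.xml: example value only, size it to your largest expected cell -->
<property>
  <name>hbase.server.keyvalue.maxsize</name>
  <value>104857600</value>
</property>
```

Keep in mind the OOM warning in the description above: large cells increase memory pressure on the region servers, so test with realistic file sizes before rolling this out.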