How to work with UTF-8 strings in Java without allocating a new String object, reading them as part of a byte array instead?

There is a place in my code where I want to read from a binary format that includes UTF-8 strings.

I also don't want any allocations in this place, because they trigger the GC, which pauses the world, and that is bad for me.

I can work perfectly well with most of my primitives and arrays, but not with strings, because Java is an "object-oriented language" that emphasises heavy use of objects (= allocations). It does not provide a standard way of working with UTF-8 strings without allocations, since the only string type it has is an immutable object. So what I need from this thing is to validate the bytes and extract Char values without creating any other objects. That is, I should be able to keep it in a pool or some other place, initialise it with data: Array[Byte], offset: Int and length: X, and make no copy. Something like a CharIterable, with the ability to re-point the same object at another string.
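For illustration, the kind of object described above could be sketched roughly as follows. This is a minimal hand-written sketch, not a standard library class: the name Utf8View and its methods reset, hasNext, nextCodePoint and isValid are all hypothetical. It assumes returning code points as int (rather than Char values) is acceptable, and it signals malformed input with -1 instead of throwing, so that the error path does not allocate either.

```java
/** Hypothetical reusable, copy-free view over UTF-8 bytes inside a larger array. */
public final class Utf8View {
    private byte[] data;
    private int pos;
    private int end;

    /** Re-points this view at data[offset, offset + length) without copying. */
    public Utf8View reset(byte[] data, int offset, int length) {
        this.data = data;
        this.pos = offset;
        this.end = offset + length;
        return this;
    }

    public boolean hasNext() {
        return pos < end;
    }

    /**
     * Decodes the next code point, or returns -1 if the bytes are malformed.
     * After a -1 the position is unspecified; call reset before reusing the view.
     */
    public int nextCodePoint() {
        int b0 = data[pos++] & 0xFF;
        if (b0 < 0x80) return b0;                       // single-byte (ASCII)
        int need, cp, min;
        if (b0 >= 0xC2 && b0 <= 0xDF)      { need = 1; cp = b0 & 0x1F; min = 0x80; }
        else if ((b0 & 0xF0) == 0xE0)      { need = 2; cp = b0 & 0x0F; min = 0x800; }
        else if (b0 >= 0xF0 && b0 <= 0xF4) { need = 3; cp = b0 & 0x07; min = 0x10000; }
        else return -1;                                 // stray continuation or invalid lead byte
        if (end - pos < need) { pos = end; return -1; } // truncated sequence
        for (int i = 0; i < need; i++) {
            int b = data[pos++] & 0xFF;
            if ((b & 0xC0) != 0x80) return -1;          // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }
        // Reject overlong encodings, values above U+10FFFF, and surrogates.
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return -1;
        return cp;
    }

    /** Validates the remaining bytes and restores the current position afterwards. */
    public boolean isValid() {
        int saved = pos;
        boolean ok = true;
        while (pos < end) {
            if (nextCodePoint() < 0) { ok = false; break; }
        }
        pos = saved;
        return ok;
    }
}
```

A pooled instance would be reused by calling reset(data, offset, length) for each string in the binary record and then looping while hasNext() is true.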

So, should I do this by hand, or has someone already done it?

I guess you could try to directly call the low-level libraries that String uses internally, like CharsetDecoder, which can decode from a ByteBuffer into a pre-allocated CharBuffer.
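A minimal sketch of that idea, assuming the whole binary record lives in one byte[] that can be wrapped once up front. The class name ReusableUtf8Decoder and its decode(offset, length) method are made up for illustration; only the java.nio calls are real. It returns null instead of throwing on malformed input or when the pre-allocated CharBuffer is too small. Whether the decode call itself is allocation-free on a given JVM is an assumption worth checking with a profiler.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical wrapper that reuses one decoder and two buffers across calls. */
public final class ReusableUtf8Decoder {
    private final CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
    private final ByteBuffer bytes;  // wraps the caller's array, no copy
    private final CharBuffer chars;  // pre-allocated output, reused on every call

    public ReusableUtf8Decoder(byte[] data, int maxChars) {
        this.bytes = ByteBuffer.wrap(data);
        this.chars = CharBuffer.allocate(maxChars);
    }

    /**
     * Decodes data[offset, offset + length) into the internal CharBuffer.
     * Returns the buffer ready for reading, or null if the bytes are malformed
     * or do not fit. The contents are only valid until the next call.
     */
    public CharBuffer decode(int offset, int length) {
        bytes.clear();                  // position = 0, limit = capacity
        bytes.position(offset);
        bytes.limit(offset + length);
        chars.clear();
        decoder.reset();
        CoderResult result = decoder.decode(bytes, chars, true);
        if (result.isUnderflow()) {
            result = decoder.flush(chars);
        }
        if (!result.isUnderflow()) {
            return null;                // malformed input or output overflow
        }
        chars.flip();
        return chars;
    }
}
```

Since CharBuffer implements CharSequence, the result can be compared or scanned with charAt without ever building a String; calling toString() on it would allocate again.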

But you may be overdoing this; I'd first measure whether using String (and the associated object allocations) is really a problem.
