Java Strings: how memory works with immutable Strings

I have a simple question.

byte[] responseData = ...;
String str = new String(responseData);
String withKey = "{\"Abcd\":" + str + "}";

In the code above, do these three lines take 3X the memory? For example, if responseData is 1 MB, will line 2 take an extra 1 MB in memory, and line 3 an extra 1 MB + xx? Is this true? If not, how does it work? If so, what is the optimal way to fix it? Will StringBuffer help here?

Yes, that sounds about right. Probably even more, because your 1 MB byte array needs to be turned into UTF-16, so depending on the encoding, the String may be even bigger (2 MB if the input was ASCII).
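To make the arithmetic concrete, here is a rough sketch. It assumes 2 bytes per char (UTF-16 storage, as described above) and ignores object headers; newer JVMs may store some strings more compactly, so treat the numbers as estimates. The class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class FootprintEstimate {
    // Payload estimate only: 2 bytes per char, ignoring object headers and
    // any compact-string optimization the runtime may apply.
    static long utf16PayloadBytes(CharSequence s) {
        return 2L * s.length();
    }

    public static void main(String[] args) {
        byte[] responseData = new byte[1024 * 1024];   // stand-in for 1 MB of server data
        Arrays.fill(responseData, (byte) 'a');          // ASCII content

        String str = new String(responseData, StandardCharsets.US_ASCII);
        String withKey = "{\"Abcd\":" + str + "}";

        // All three objects are reachable at this point, so their sizes add up.
        System.out.println("byte[]  : " + responseData.length + " bytes");
        System.out.println("str     : ~" + utf16PayloadBytes(str) + " bytes");
        System.out.println("withKey : ~" + utf16PayloadBytes(withKey) + " bytes");
    }
}
```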

Note that the garbage collector can reclaim memory as soon as the variables that refer to it go out of scope. You could set them to null as early as possible to help it do this in a timely manner (for example, responseData = null; after you have constructed your String).

if yes, then what is the optimal way to fix this

"Fix" implies a problem. If you have enough memory, there is no problem.

the problem is that I am getting OutOfMemoryException as the byte[] data coming from server is quite big,

If you don't, you have to think about a better alternative to keeping a 1 MB string in memory. Maybe you can stream the data off a file? Or work on the byte array directly? What kind of data is this?

The problem is that I am getting OutOfMemoryException as the byte[] data coming from server is quite big, that's why I need to figure out first whether I am doing something wrong ...

Yes. Basically, your fundamental problem is that you are trying to hold the entire string in memory at one time. This is always going to fail for a sufficiently large string, even if you code it in the most memory-efficient fashion possible. (And that would be complicated in itself.)

The ultimate solution (i.e. the one that "scales") is to do one of the following:

  • stream the data to the file system, or

  • process it in such a way that you never need the entire "string" to be represented.


You asked if StringBuffer will help. It might help a bit, provided that you use it correctly. The trick is to make sure that you preallocate the StringBuffer (actually, a StringBuilder is better!) to be big enough to hold all of the characters required. Then copy data into it using a charset decoder (directly or using a Reader pipeline).

But even with optimal coding, you are likely to need a peak of 3 times the size of your input byte[].
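A minimal sketch of the preallocate-then-decode idea, assuming the data is UTF-8 (or another charset where decoding never yields more chars than input bytes). The class name, method name, and the 8 KB chunk size are illustrative choices, not anything from the question. Note that toString() still copies the builder's contents, which is part of why the peak stays high.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class PreallocatedWrap {
    static String wrap(byte[] data, Charset charset, String prefix, String suffix) {
        // Capacity is only a hint: for UTF-8 the decoded char count never
        // exceeds the byte count, so the builder should not need to grow.
        StringBuilder sb = new StringBuilder(prefix.length() + data.length + suffix.length());
        sb.append(prefix);

        char[] chunk = new char[8192];  // small decode buffer, reused for every read
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(data), charset)) {
            int n;
            while ((n = reader.read(chunk)) != -1) {
                sb.append(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }

        sb.append(suffix);
        return sb.toString();           // this makes one more copy of the characters
    }

    public static void main(String[] args) {
        byte[] responseData = "[1,2,3]".getBytes(StandardCharsets.UTF_8); // stand-in data
        System.out.println(wrap(responseData, StandardCharsets.UTF_8, "{\"Abcd\":", "}"));
    }
}
```

This avoids the intermediate String for the decoded body, but the byte[], the builder, and the final String can still all be live at once.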


Note that your OOME problem probably has nothing to do with GC or storage leaks. It is actually about the fundamental space requirements of the data types you are using, and the fact that Java does not offer a "string of bytes" data type.

There is no such OutOfMemoryException in my apidocs. If it's OutOfMemoryError, especially on the server side, you definitely have a problem.

When you receive big requests from clients, those String-related statements are not the first problem. Reducing 3X to 1X is not the solution.

I'm sorry, I can't help further without seeing more code.

Use back-end storage

You should not store the whole request body in a byte[]. You can store it directly on any back-end storage such as a local file, a remote database, or cloud storage.

I would

copy the stream from the request to the back-end using a small chunked buffer
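A chunked copy along those lines could look like the sketch below. The back-end here is just a temporary local file, the request body is simulated with an in-memory stream, and the 8 KB buffer size is an arbitrary choice; all names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class ChunkedCopy {
    // Copies everything from in to out, holding at most one 8 KB chunk in memory.
    static long copy(InputStream in, OutputStream out) {
        byte[] buffer = new byte[8192];
        long total = 0;
        try {
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the real request body stream.
        InputStream requestBody = new ByteArrayInputStream(new byte[100_000]);
        Path backEnd = Files.createTempFile("request-body", ".bin");
        try (OutputStream out = Files.newOutputStream(backEnd)) {
            System.out.println("copied " + copy(requestBody, out) + " bytes to " + backEnd);
        }
        Files.delete(backEnd);
    }
}
```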

Use streams

If you can, use streams, not objects.

I would

response.getWriter().write("{\"Abcd\":");
// copy <your back-end stored data as stream> to response.getWriter() in small chunks
response.getWriter().write("}");
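Spelled out a little more, the idea might look like this. It is only a sketch: a plain Writer stands in for response.getWriter(), a Reader stands in for the stored data, and the class and method names are invented for the example.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

final class StreamingWrap {
    // Writes {"Abcd":<body>} to out without ever building the whole string in memory.
    static void writeWrapped(Reader body, Writer out) {
        try {
            out.write("{\"Abcd\":");
            char[] buffer = new char[8192];   // only this chunk is held in memory
            int n;
            while ((n = body.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            out.write("}");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        StringWriter out = new StringWriter();            // stand-in for the response writer
        writeWrapped(new StringReader("[1,2,3]"), out);   // stand-in for the stored data
        System.out.println(out);
    }
}
```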

Yes, if you use a StringBuffer for the code you have, you would save 1 MB of heap space in the last step. However, considering the size of the data you have, I recommend an external memory algorithm, where you bring only part of your data into memory, process it, and put it back to storage.

As others have mentioned, you should really try not to have such a big object in your mobile app, and streaming should be your best solution.

That said, there are some techniques to reduce the amount of memory your app is using now:

  1. Remove byte[] responseData entirely if possible, so the memory it used can be released ASAP (assuming it is not used anywhere else).
  2. Create the largest String first, and then substring() it. Android uses Apache Harmony for its standard Java library implementation. If you check its String class implementation, you'll see that substring() is implemented simply by creating a new String object with the proper start and end offsets into the original data, and no duplicate copy is created. (This depends on the library: OpenJDK has copied the characters in substring() since Java 7u6.) So doing the following would cut the overall memory consumption by at least 1/3:

    String withKey = new StringBuilder().append("{\"Abcd\":").append(str).append("}").toString();
    str = withKey.substring("{\"Abcd\":".length(), withKey.length() - "}".length());

  3. Never ever use something like "{\"Abcd\":" + str + "}" for large Strings. Under the hood, "string_a" + "string_b" is implemented as new StringBuilder().append("string_a").append("string_b").toString(); so implicitly you are creating two (or at least one, if the compiler is smart) StringBuilders. For large Strings, it's better to take over this process yourself, as you have deep domain knowledge about your program that the compiler doesn't, and you know how to best manipulate the strings.
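As a rough illustration of taking over the concatenation yourself, the sketch below sizes the StringBuilder to the exact final length so its internal array never has to grow and be copied. The helper name and parameters are hypothetical, and toString() still produces one final copy.

```java
final class ExplicitConcat {
    // Builds {"key":body} with a builder sized to the exact final length.
    static String wrap(String key, String body) {
        // 2 chars for {" + key + 2 chars for ": + body + 1 char for }
        int capacity = 2 + key.length() + 2 + body.length() + 1;
        return new StringBuilder(capacity)
                .append("{\"").append(key).append("\":")
                .append(body)
                .append('}')
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(wrap("Abcd", "[1,2,3]"));
    }
}
```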
