
Java Optimization String vs Char Arrays

In a program I am writing I am doing a lot of string manipulation. I am trying to increase performance and am wondering if using char arrays would show a decent performance increase. Any suggestions?

What kind of manipulation are you doing? Can you post a code sample?

You may want to take a look at StringBuilder, which implements CharSequence, to improve performance. I'm not sure you want to roll your own. StringBuilder isn't thread-safe, by the way; if you need thread safety, look at StringBuffer.
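As a minimal sketch of what that looks like in practice (the class and method names here are hypothetical, chosen only for illustration), a single StringBuilder can be appended to and edited in place, and only the final toString() call creates a String:

```java
final class BuilderSketch {
    // Joins the given words with a separator using one mutable buffer.
    static String join(String[] words, String separator) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < words.length; i++) {
            if (i > 0) {
                sb.append(separator);
            }
            sb.append(words[i]);
        }
        return sb.toString();
    }

    // Edits the buffer in place; assumes a non-empty input.
    static String capitalizeAndQuote(String input) {
        StringBuilder sb = new StringBuilder(input);
        sb.setCharAt(0, Character.toUpperCase(sb.charAt(0)));
        sb.insert(0, "> ");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(join(new String[] {"a", "b", "c"}, ", "));
        System.out.println(capitalizeAndQuote("hello"));
    }
}
```

StringBuffer offers the same methods with synchronization, so swapping the type is usually all that is needed if the buffer really is shared between threads.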

String is already implemented as a char array. What are you planning to do differently? Anyway, between that and the fact that GC for ephemeral objects is extremely fast, I would be amazed if you could find a way to increase performance by substituting char arrays.

Michael Borgwardt's advice about small char arrays and using StringBuilder and StringBuffer is very good. But to me the main thing is to try not to guess about what's slow: make measurements, use a profiler, get some definite facts. Because usually our guesses about performance turn out to be wrong.
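To make "make measurements" concrete, here is a deliberately crude timing sketch built on System.nanoTime. The class name, method names, and workload are assumptions made for illustration; a real profiler or a benchmarking harness will give far more trustworthy numbers than this.

```java
final class TimingSketch {
    // Written to by the tasks so the JIT cannot discard the work as unused.
    static volatile int sink;

    static String concatWithPlus(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s += 'x'; // creates a new String on every iteration
        }
        return s;
    }

    static String concatWithBuilder(int n) {
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n; i++) {
            sb.append('x');
        }
        return sb.toString();
    }

    // Runs the task a few times untimed, then returns the fastest timed run in nanoseconds.
    static long bestNanos(Runnable task, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            task.run();
        }
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            best = Math.min(best, System.nanoTime() - start);
        }
        return best;
    }

    public static void main(String[] args) {
        final int n = 20_000;
        long plus = bestNanos(() -> sink = concatWithPlus(n).length(), 5, 10);
        long builder = bestNanos(() -> sink = concatWithBuilder(n).length(), 5, 10);
        System.out.println("String +=     : " + plus + " ns");
        System.out.println("StringBuilder : " + builder + " ns");
    }
}
```

The numbers depend heavily on the JVM, the hardware, and the input size, which is exactly why measuring your own workload matters more than any general rule.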

Here is an excerpt from the full source of the String class from JDK 6.0:

    public final class String implements java.io.Serializable,
            Comparable<String>, CharSequence {
        /** The value is used for character storage. */
        private final char value[];

        /** The offset is the first index of the storage that is used. */
        private final int offset;

        /** The count is the number of characters in the String. */
        private final int count;

        // ... (rest of the class omitted)
    }

As you can see, internally the value is already stored as an array of chars. An array of chars as a data structure has all the limitations of the String class for most string manipulations: Java arrays do not grow, i.e. every time (OK, maybe not every single time) your string needs to grow, you need to allocate a new array and copy the contents.
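To illustrate the "allocate a new array and copy" point, here is a hypothetical hand-rolled growable char buffer (the name GrowableChars and the doubling policy are assumptions made for this sketch). It is the kind of bookkeeping you take on when you replace String with raw char arrays, and StringBuilder already maintains a similar growable buffer for you.

```java
import java.util.Arrays;

final class GrowableChars {
    private char[] value = new char[4]; // small initial capacity to force growth
    private int count = 0;              // number of chars actually in use

    void append(char c) {
        if (count == value.length) {
            // Arrays cannot grow: allocate a bigger one and copy the contents over.
            value = Arrays.copyOf(value, value.length * 2);
        }
        value[count++] = c;
    }

    int capacity() {
        return value.length;
    }

    @Override
    public String toString() {
        return new String(value, 0, count);
    }

    public static void main(String[] args) {
        GrowableChars chars = new GrowableChars();
        for (char c : "hello world".toCharArray()) {
            chars.append(c);
        }
        System.out.println(chars + " (capacity " + chars.capacity() + ")");
    }
}
```

Doubling the capacity keeps the number of copies low, which is the "maybe not every single time" part of the remark above.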

As suggested earlier, it makes sense to use StringBuilder or StringBuffer for most string manipulations.

In fact, the following code:

    String a = "a";
    a = a + "b";
    a = a + "c";

when compiled, is automatically converted to use StringBuilder; this can easily be checked with the help of javap.
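Written out by hand, the conversion looks roughly like the withBuilder method below. This is a sketch of what compilers of the JDK 6 era generate, not a guarantee for every compiler: javac in JDK 9 and later may use a different strategy based on invokedynamic, so check your own bytecode. The class and method names are hypothetical.

```java
final class ConcatSketch {
    // The original form, using the + operator.
    static String withPlus() {
        String a = "a";
        a = a + "b";
        a = a + "c";
        return a;
    }

    // Roughly what an older javac emits: a fresh StringBuilder per statement.
    static String withBuilder() {
        String a = "a";
        a = new StringBuilder().append(a).append("b").toString();
        a = new StringBuilder().append(a).append("c").toString();
        return a;
    }

    public static void main(String[] args) {
        System.out.println(withPlus());
        System.out.println(withBuilder());
    }
}
```

Running `javap -c ConcatSketch` on the compiled class shows what your compiler actually produced for withPlus. Note that each statement still gets its own builder, so inside a loop an explicit, reused StringBuilder is still worth writing yourself.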

As a rule of thumb, it's rarely advisable to spend time trying to improve the performance of the core Java classes unless you're a world-class expert on the matter, simply because this code was written by world-class experts in the first place.

Have you profiled your application? Do you know where the bottlenecks are? That is the first step if the performance is subpar. Well, that and defining what acceptable performance metrics are.

Once you have profiled the application doing some tasks, you will have the percentages of time spent doing things. If you are spending a lot of time manipulating Strings, maybe you can start to cache some of those manipulations? Are you doing some of them repeatedly when doing them only once would suffice (and then reusing that result later when it is needed)? Are you copying Strings when you don't need to? Remember, java.lang.String is immutable, so it cannot be changed directly.
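One possible shape for such a cache is sketched below, assuming the repeated manipulation is a hypothetical normalize step (trim plus lower-casing). The class name, the counter, and the choice of HashMap are illustrative only; this version is neither thread-safe nor bounded in size.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class NormalizeCache {
    private final Map<String, String> cache = new HashMap<>();
    private int computations = 0; // counts how often the real work was done

    String normalize(String raw) {
        String cached = cache.get(raw);
        if (cached == null) {
            computations++;
            cached = raw.trim().toLowerCase(Locale.ROOT);
            cache.put(raw, cached);
        }
        return cached;
    }

    int computations() {
        return computations;
    }

    public static void main(String[] args) {
        NormalizeCache cache = new NormalizeCache();
        cache.normalize("  Hello ");
        cache.normalize("  Hello "); // served from the cache
        System.out.println(cache.normalize("  Hello ") + " computed " + cache.computations() + " time(s)");
    }
}
```

Whether this pays off depends on how often the same inputs recur, which the profiler output should tell you.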

I have found several times, while optimizing and performance-tuning the systems I work on, that I do not instinctively know where the slowness comes from. I have seen others (and, shamefully, myself) spend days optimizing something that shows no gain, because it was not the original bottleneck and in fact accounted for less than 1% of the time spent.

Hope this helps point you in the right direction.

When you have a very large number of short Strings, using char[] instead can save quite a bit of memory, which also means more speed due to fewer cache misses.

But with large Strings, the main thing to look out for is avoiding unnecessary copying resulting from the immutability of String. If you do a lot of concatenating or replacing, using StringBuilder can make a big difference.
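For the replacing case, a sketch along these lines edits one StringBuilder in place instead of producing a new String per replacement. The helper name replaceAll is hypothetical; it only uses StringBuilder's own indexOf and replace methods. Each replace call still shifts the tail of the buffer, so measure before assuming it beats String.replace for your data.

```java
final class ReplaceSketch {
    // Replaces every occurrence of target inside the builder, without creating Strings.
    static void replaceAll(StringBuilder sb, String target, String replacement) {
        if (target.isEmpty()) {
            return; // nothing sensible to search for
        }
        int index = sb.indexOf(target);
        while (index != -1) {
            sb.replace(index, index + target.length(), replacement);
            // Continue after the inserted text so the replacement is never rescanned.
            index = sb.indexOf(target, index + replacement.length());
        }
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("a-b-c");
        replaceAll(sb, "-", "+");
        System.out.println(sb);
    }
}
```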

Note: technical posts on this site are licensed under CC BY-SA 4.0. If you republish this post, please credit this site or the original source.

 