
Better way to check a String?

I have code that checks a string for spaces, commas, and so on. My app will have to check, let's say, a thousand strings, each 14 to 15 characters long, and I am worried that this will affect performance since it runs on Android. Here is the code I use:

private final static char[] undefinedChars = {' ','/','.','<','>','*','!'};

    public static boolean checkMessage(String message){

    if (message == null)
        return false;

    char[] _message = message.toCharArray();

    for (char c : _message) {
         for (int i = 0;i > undefinedChars.length;i++)
                if (c == undefinedChars[i])
                    return true;
    }

    return false;
}

Is this correct, or is there a way to improve it?

There is one change you could make that might make a small difference:

Change

    char[] _message = message.toCharArray();
    for (char c : _message) {

to

    for (int i = 0; i < message.length(); i++) {
        char c = message.charAt(i);

However, I doubt that it will be significant.

Replacing the inner loop with a switch is more likely to be fruitful, though it depends on what the JIT compiler does with the code. (And a switch only works if the set of undefined characters can be hard-wired into the switch statement as compile-time constants.)
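As a rough sketch of the switch idea, the inner loop can be replaced by case labels for the seven characters from the question. The class name SwitchCheck is only for illustration, and whether this is actually faster on a given device is something to measure rather than assume.

```java
final class SwitchCheck {
    // Returns true if the message contains any of the "undefined" characters.
    static boolean checkMessage(String message) {
        if (message == null) {
            return false;
        }
        for (int i = 0; i < message.length(); i++) {
            // The case labels must be compile-time constants.
            switch (message.charAt(i)) {
                case ' ':
                case '/':
                case '.':
                case '<':
                case '>':
                case '*':
                case '!':
                    return true;
                default:
                    break;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(checkMessage("hello world")); // true (contains a space)
        System.out.println(checkMessage("helloworld"));  // false
    }
}
```

The trade-off is that the character set now lives in the code itself, so it cannot be changed at run time.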


"I am worried if it will affect the performance since it is in android."

Don't "worry". Approach the problem scientifically.

  1. Implement the code and then benchmark it.
  2. If the measured performance is a concern, then:
    1. profile the code
    2. look at hotspots, and identify possible improvements
    3. implement and test a possible improvement
    4. rerun the benchmark to see if the improvement actually made any difference
    5. repeat ... until performance is good enough or you run out of options.
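The first step above could start from something as simple as the following timing sketch. It is only a rough harness under stated assumptions: the test data is randomly generated (the real strings may look different), the class and helper names are made up for illustration, and the method under test is the question's code with the loop condition written as `<`. A dedicated benchmarking tool run on a real device would give more trustworthy numbers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class CheckBenchmark {
    private static final char[] UNDEFINED_CHARS = {' ', '/', '.', '<', '>', '*', '!'};

    // Code under test: the question's nested-loop check.
    static boolean checkMessage(String message) {
        if (message == null) return false;
        for (int i = 0; i < message.length(); i++) {
            char c = message.charAt(i);
            for (char u : UNDEFINED_CHARS) {
                if (c == u) return true;
            }
        }
        return false;
    }

    // Builds `count` pseudo-random strings of length 14 or 15 (assumed test data).
    static List<String> makeMessages(int count, long seed) {
        Random random = new Random(seed);
        List<String> messages = new ArrayList<>(count);
        for (int n = 0; n < count; n++) {
            int length = 14 + random.nextInt(2);
            StringBuilder sb = new StringBuilder(length);
            for (int i = 0; i < length; i++) {
                sb.append((char) (' ' + random.nextInt(95))); // printable ASCII
            }
            messages.add(sb.toString());
        }
        return messages;
    }

    // Returns the number of flagged messages so the result of the work is used.
    static int countFlagged(List<String> messages) {
        int flagged = 0;
        for (String m : messages) {
            if (checkMessage(m)) flagged++;
        }
        return flagged;
    }

    public static void main(String[] args) {
        List<String> messages = makeMessages(1000, 42L);
        for (int i = 0; i < 1000; i++) countFlagged(messages); // warm-up runs
        long start = System.nanoTime();
        int flagged = countFlagged(messages);
        long elapsedMicros = (System.nanoTime() - start) / 1000;
        System.out.println(flagged + " flagged in " + elapsedMicros + " us");
    }
}
```

To compare a candidate improvement, swap the body of checkMessage and rerun with the same seed so both versions see identical input.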

The other thing to note is that the same code could well perform differently across different Android platforms. The quality of JIT compilers has (apparently) improved markedly in more recent releases.

I would argue that it is a bad idea to "bend" your code just to get it to run well on old phones. The chances are that the user will upgrade their hardware soon anyway ... and it is conceivable that your optimization for the old platform actually makes your code slower on a new platform ... because your hand-optimizations have made the code too tricky for the JIT compiler's optimizer to deal with.

This is also an argument for NOT trying to make your code go "as fast as possible" ...

First of all, I see a bug there.

for (int i = 0;i > undefinedChars.length;i++)

I think you meant

for (int i = 0;i < undefinedChars.length;i++)

instead?

Anyway, your algorithm runs in O(m*n), where m is the length of the message (at most 15 here) and n is the number of undefined chars (fixed at 7 here). Since both are small and bounded, it should be efficient from a run-time analysis perspective.

I would profile the scenario first and then decide how to improve it. For example, if the message had been sorted upfront somewhere, you would only need to check the first or the last character of the string; but, as I said, only if it has already been sorted elsewhere.

Or maybe think of parallelizing the routine. It should be straightforward.
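One possible way to parallelize, sketched here with Java 8 parallel streams over the list of messages. This assumes streams are available on the target platform (on Android they require a sufficiently recent API level), and the class name ParallelCheck and the countFlagged helper are illustrative only. For about a thousand very short strings the thread hand-off may well cost more than it saves, so this is worth measuring before adopting.

```java
import java.util.Arrays;
import java.util.List;

final class ParallelCheck {
    private static final char[] UNDEFINED_CHARS = {' ', '/', '.', '<', '>', '*', '!'};

    // The per-message check, with the inner loop condition written as `<`.
    static boolean checkMessage(String message) {
        if (message == null) return false;
        for (int i = 0; i < message.length(); i++) {
            char c = message.charAt(i);
            for (char u : UNDEFINED_CHARS) {
                if (c == u) return true;
            }
        }
        return false;
    }

    // Each message is independent, so the list can be split across worker threads.
    static long countFlagged(List<String> messages) {
        return messages.parallelStream()
                .filter(ParallelCheck::checkMessage)
                .count();
    }

    public static void main(String[] args) {
        List<String> messages = Arrays.asList("abc", "a b", "x.y", "ok");
        System.out.println(countFlagged(messages)); // 2 ("a b" and "x.y")
    }
}
```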

Without using extra memory, you're about as fast as you can get. You can trade memory for performance. For example, you can put the characters you want to check for into a HashMap. Then you can loop over the string you're checking and test whether each character is in that map. If the number of characters you want to check for is small, this will be less efficient. If the number is big, it will be more efficient. (Technically this algorithm is O(n) instead of O(n*m), but if m is small then the constants you're usually taught to ignore will matter.)

Another way is to use an array of booleans, with each possible character in the string mapping to an index in that array. Set only the characters you care about to true (and save that array). Then you can avoid the hash calculation above, but at the cost of a lot of memory.
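A minimal sketch of the boolean-array idea, assuming the table covers every possible char value (65,536 entries, typically about 64 KB on common runtimes). The names LookupTableCheck and IS_UNDEFINED are made up for this example.

```java
final class LookupTableCheck {
    // One flag per possible char value; built once and reused for every call.
    private static final boolean[] IS_UNDEFINED = new boolean[Character.MAX_VALUE + 1];

    static {
        for (char c : new char[] {' ', '/', '.', '<', '>', '*', '!'}) {
            IS_UNDEFINED[c] = true;
        }
    }

    static boolean checkMessage(String message) {
        if (message == null) return false;
        for (int i = 0; i < message.length(); i++) {
            // A single array read replaces the inner loop over undefinedChars.
            if (IS_UNDEFINED[message.charAt(i)]) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(checkMessage("no*stars")); // true
        System.out.println(checkMessage("nostars"));  // false
    }
}
```

If the input is known to be ASCII, the table could be shrunk to 128 entries with a bounds check before the lookup, which removes most of the memory cost.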

Really, your original algorithm is likely good enough. But these (especially the hash map) are things you can consider if needed.

Try using a regular expression. I find it very clean and it should not hurt your performance.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static boolean checkMessage(String message)
{
    if (message == null)
        return false;

    // Matches any one of: space . / < > * !
    String regex = " |\\.|/|<|>|\\*|!";
    Matcher matcher = Pattern.compile(regex).matcher(message);
    return matcher.find();
}

For symmetry and possibly some compiler optimization, why not use a for-each style loop for both loops? As an additional benefit, you wouldn't risk a typo like the one pointed out by glaze. Your code would then become:

private final static char[] undefinedChars = {' ','/','.','<','>','*','!'};

public static boolean checkMessage(String message){

    if (message == null)
        return false;

    char[] _message = message.toCharArray();

    for (char c : _message) {
        for (char u : undefinedChars)
            if (c == u)
                return true;
    }

    return false;
}

An additional optimization would be to order the characters in undefinedChars from most to least likely to occur. That way you'll bail out as quickly as possible.

Use a Set to hold your undefinedChars:

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

Set<Character> undefinedChars = new HashSet<Character>(Arrays.asList(' ', '/', '.'));

public boolean hasUndefinedChar(String str) {
  for (int i = 0; i < str.length(); i++) {
    // charAt returns a char, which is autoboxed to Character for the lookup
    if (undefinedChars.contains(str.charAt(i))) {
      return true;
    }
  }
  return false;
}

This method runs in O(n) time and does not significantly affect space complexity. Each contains call on the Set is an O(1) operation, and you make n of these calls in the worst case.

