How to improve regex performance in Java

I have this code to convert all the text before "=" to uppercase:

Matcher m = Pattern.compile("((?:^|\n).*?=)").matcher(conteudo);
while (m.find()) {
  conteudo = conteudo.replaceFirst(m.group(1), m.group(1).toUpperCase());
}

But when the string is very large, this becomes very slow. I want to find a faster way to do it.

Any suggestions?

EDIT

I didn't explain this properly. I have text like this:

field=value
field2=value2
field3=value3

And I want to convert each line like this:

FIELD=value
FIELD2=value2
FIELD3=value3

The fastest way to make regex fast is to not use regex. Regex was never meant to be, and almost never is, a good choice for performance-sensitive operations. (Further reading: Why are regular expressions so controversial?)

Try using String class methods instead, or write a custom method that does what you want. Use a tokenizer or split on '=', and then call .toUpperCase() on the trailing part of each token (what comes after \n). Alternatively, convert to char[] or use charAt() and traverse the text manually, switching characters to uppercase after a newline and back to unchanged after '='.

For example:

public static String changeCase( String s ) {
    // true while we are inside a key, i.e. before '=' on the current line
    boolean capitalize = true;
    int len = s.length();
    char[] output = new char[len];
    for( int i = 0; i < len; i++ ) {
        char input = s.charAt(i);
        if ( input == '\n' ) {
            // a new line starts a new key
            capitalize = true;
            output[i] = input;
        } else if ( input == '=' ) {
            // the rest of the line is the value; leave it unchanged
            capitalize = false;
            output[i] = input;
        } else {
            output[i] = capitalize ? Character.toUpperCase(input) : input;
        }
    }
    return new String(output);
}

Method input:

field=value\n
field2=value2\n
field3=value3

Method output:

FIELD=value\n
FIELD2=value2\n
FIELD3=value3

Try it here: http://ideone.com/k0p67j
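The split-on-'=' suggestion in this answer is only described in prose, so here is a minimal sketch of how it might look. The class name SplitUpperCase and the method name upperCaseKeys are hypothetical, and Locale.ROOT is my own choice to keep the result independent of the default locale. It is meant to behave like changeCase above, not to replace it.

```java
import java.util.Locale;

// Hypothetical helper illustrating the "split on '='" idea; not from the original answer.
final class SplitUpperCase {

    static String upperCaseKeys(String text) {
        // Limit -1 keeps trailing empty tokens, so re-joining on '=' restores the input shape.
        String[] tokens = text.split("=", -1);
        StringBuilder out = new StringBuilder(text.length());

        for (int i = 0; i < tokens.length; i++) {
            String token = tokens[i];
            if (i > 0) {
                out.append('='); // put back the delimiter that split() removed
            }

            // The first token is entirely a key. Every later token looks like
            // "value\nnextKey": the key part starts after the first newline.
            int keyStart = 0;
            if (i > 0) {
                int newline = token.indexOf('\n');
                keyStart = (newline < 0) ? token.length() : newline + 1;
            }

            out.append(token, 0, keyStart); // value part, unchanged
            out.append(token.substring(keyStart).toUpperCase(Locale.ROOT)); // key part
        }
        return out.toString();
    }
}
```

Calling SplitUpperCase.upperCaseKeys(conteudo) on the three-line sample should give the same text as changeCase. Values that themselves contain '=' are left alone, because only the part of a token after a newline is treated as a key.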

PS (by Jamie Zawinski):

Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.

With a multiline regex we can simply match every line separately and replace it :)

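import java.util.regex.Matcher;
import java.util.regex.Pattern;

String conteudo = "field=value\nfield2=value2\nfield3=value3";
Pattern pattern = Pattern.compile("^([^=]+=)(.*)$", Pattern.MULTILINE);
Matcher matcher = pattern.matcher(conteudo);
StringBuffer result = new StringBuffer();

while (matcher.find()) {
    // quoteReplacement keeps '$' and '\' in the text from being read as group references
    matcher.appendReplacement(result,
            Matcher.quoteReplacement(matcher.group(1).toUpperCase() + matcher.group(2)));
}
matcher.appendTail(result); // copy whatever follows the last match
System.out.println(conteudo);
System.out.println(result.toString());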

What about something like this? indexOf should be fast enough.

int equalsIdx = conteudo.indexOf('=');
String result = conteudo.substring(0, equalsIdx).toUpperCase() + conteudo.substring(equalsIdx, conteudo.length());
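
The indexOf snippet above upper-cases only the text before the first '=' in the whole string, and substring throws if the string contains no '=' (indexOf returns -1). As a rough sketch, the same idea could be applied line by line as follows. The names PerLineUpperCase and upperCaseKeys are hypothetical, and Locale.ROOT is my own assumption to keep the output locale-independent.

```java
import java.util.Locale;

// Hypothetical helper: the indexOf idea applied to every line; not from the original answer.
final class PerLineUpperCase {

    static String upperCaseKeys(String text) {
        StringBuilder out = new StringBuilder(text.length());
        int eq = text.indexOf('='); // cached position of the next '='
        int lineStart = 0;

        while (lineStart < text.length()) {
            int lineEnd = text.indexOf('\n', lineStart);
            if (lineEnd < 0) {
                lineEnd = text.length(); // last line has no trailing newline
            }
            // Search again only when the cached '=' lies behind the current line,
            // so the whole text is scanned for '=' at most once.
            if (eq >= 0 && eq < lineStart) {
                eq = text.indexOf('=', lineStart);
            }

            if (eq >= 0 && eq < lineEnd) {
                out.append(text.substring(lineStart, eq).toUpperCase(Locale.ROOT)); // key
                out.append(text, eq, lineEnd); // '=' and value, unchanged
            } else {
                out.append(text, lineStart, lineEnd); // line without '=': copied as-is
            }

            if (lineEnd < text.length()) {
                out.append('\n');
            }
            lineStart = lineEnd + 1;
        }
        return out.toString();
    }
}
```

Unlike the character loop in the earlier answer, this version leaves lines without '=' untouched, which may or may not be what you want.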

Try this:

((?:^|\n)[^=]*=)
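
A changed pattern alone may not help much if it is still used with the loop from the question: each replaceFirst call compiles the matched text as a new regex, then rescans and copies the whole string from the start. Below is a hedged sketch of applying the pattern ((?:^|\n)[^=]*=) in a single pass instead. The class and method names are hypothetical, and Locale.ROOT is my own assumption.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: one pass over the text with a precompiled pattern.
final class SinglePassRegex {

    // Compiled once and reused across calls.
    // Note: [^=]* can cross line breaks, so a line without '=' is
    // upper-cased together with the key on the following line.
    private static final Pattern KEY = Pattern.compile("((?:^|\\n)[^=]*=)");

    static String upperCaseKeys(String text) {
        Matcher m = KEY.matcher(text);
        StringBuffer out = new StringBuffer(text.length());
        while (m.find()) {
            // quoteReplacement stops '$' or '\' in a key from being read as a group reference.
            m.appendReplacement(out,
                    Matcher.quoteReplacement(m.group(1).toUpperCase(Locale.ROOT)));
        }
        m.appendTail(out); // copy the text after the last match
        return out.toString();
    }
}
```

Whether this is fast enough compared with the non-regex approaches depends on the input, so it is worth measuring on real data.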
