Splitting string algorithm in Java

I'm trying to make the following algorithm work. What I want to do is split the given string into substrings, each consisting of either a run of digits or a single operator.

So for the string "22+2", I would get an array in which [0]="22", [1]="+" and [2]="2".

This is what I have so far, but I get an index out of bounds exception:

public static void main(String[] args) {
    String string = "114+034556-2";
    int k,a,j;
    k=0;a=0;j=0;
    String[] subStrings= new String[string.length()];

    while(k<string.length()){
        a=k;
        while(((int)string.charAt(k))<=57&&((int)string.charAt(k))>=48){
            k++;}
        subStrings[j]=String.valueOf(string.subSequence(a,k-1)); //exception here

        j++;
        subStrings[j]=String.valueOf(string.charAt(k));
        j++;

   }}

I would rather be told what's wrong with my reasoning than be offered an alternative, but of course I will appreciate any kind of help.

I'm deliberately not answering this question directly, because it looks like you're trying to figure out a solution yourself. I'm also assuming that you're purposefully not using the split or the indexOf functions, which would make this pretty trivial.

A few things I've noticed:

  1. If your input string is long, you'd probably be better off working with a char array and a StringBuilder, so you avoid the overhead of creating many intermediate immutable String objects.
  2. Have you tried catching the exception, or printing out the value of k that causes your index out of bounds problem?
  3. Have you thought through what happens when your string terminates? For instance, have you run this through a debugger when the input string is "454" or something similarly trivial?
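
For readers who have worked through the hints and want something to compare against, here is a minimal sketch of the question's loop with its boundary problems addressed. It is not the answerer's code: the class name `SplitFixed` and the method `split` are made up for illustration, and it assumes the input contains only ASCII digits and single-character operators. Three things change relative to the original: the inner loop checks `k < string.length()` before calling `charAt`, the end index passed to `substring` is `k` rather than `k-1` (the end index is exclusive), and `k` is advanced past the operator. In the original, `k` stays on the operator, so the next pass calls `subSequence(a, k-1)` with an end index smaller than the start index, which is what throws.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitFixed {
    // Hypothetical helper: splits a string into runs of digits and single operators.
    static List<String> split(String string) {
        List<String> parts = new ArrayList<>();
        int k = 0;
        while (k < string.length()) {
            int a = k;
            // Check the bound first, so charAt is never called past the end.
            while (k < string.length()
                    && string.charAt(k) >= '0' && string.charAt(k) <= '9') {
                k++;
            }
            if (k > a) {
                parts.add(string.substring(a, k)); // end index is exclusive
            }
            if (k < string.length()) {
                parts.add(String.valueOf(string.charAt(k)));
                k++; // step past the operator so the outer loop makes progress
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("114+034556-2")); // [114, +, 034556, -, 2]
        System.out.println(split("454"));          // [454]
    }
}
```

A list is used instead of a fixed-size array here only so that no unused `null` slots are left over; the index logic is the same either way.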

You could use a regular expression to split the numbers from the operators using lookahead and lookbehind assertions:

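// requires: import java.util.Arrays;
String equation = "22+2";
// split at the zero-width positions just before and just after each operator
String[] tmp = equation.split("(?=[+\\-*/])|(?<=[+\\-*/])");
System.out.println(Arrays.toString(tmp)); // [22, +, 2]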

If your criterion is simply "anything that is not a digit", then you can use some simple regex calls, if you don't mind working with parallel arrays:

String[] operands = string.split("\\D"); // split around anything that is NOT a digit
char[] operators = string.replaceAll("\\d", "").toCharArray(); // remove all digits; what remains are the operators

A character-by-character version that collects the numbers and operators into two lists:

import java.util.ArrayList;
import java.util.List;

public class SplitEquation {
    public static void main(String[] args) {
        String input = "22+2-3*212/21+23";
        StringBuilder number = new StringBuilder();
        List<String> numbers = new ArrayList<String>();
        List<String> operators = new ArrayList<String>();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (Character.isDigit(c)) {
                // still inside a number: keep accumulating digits
                number.append(c);
            } else if (c == '+' || c == '-' || c == '*' || c == '/') {
                // an operator ends the current number
                operators.add(String.valueOf(c));
                numbers.add(number.toString());
                number.setLength(0);
            }
        }
        // the last number has no operator after it, so add it here
        if (number.length() > 0) {
            numbers.add(number.toString());
        }
        for (String x : numbers) {
            System.out.print("number=" + x + ",");
        }
        for (String x : operators) {
            System.out.print("operator=" + x + ",");
        }
    }
}

This will be the output: number=22,number=2,number=3,number=212,number=21,number=23,operator=+,operator=-,operator=*,operator=/,operator=+,

If you're interested in the general problem of parsing, then I'd recommend thinking about it on a character-by-character level, and moving through a finite state machine with each new character. (Often you'll need a terminator character that cannot occur in the input, such as the \0 in C strings, but we can get around that.)

In this case, you might have the following states:

  1. initial state
  2. just parsed a number.
  3. just parsed an operator.

The characters determine the transitions from state to state:

  • You start in state 1.
  • Numbers transition into state 2.
  • Operators transition into state 3.

The current state can be tracked with something like an enum, changing the state after each character is consumed.

With that setup, you just need to loop over the input string and branch on the current state.

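import java.util.ArrayList;
import java.util.List;

public class StateParser {
    enum State { INIT_STATE, JUST_NUM, JUST_OP }

    static List<String> parse(String inputString) {
        State state = State.INIT_STATE;
        String acc = "";
        List<String> subStrs = new ArrayList<String>();
        for (char c : inputString.toCharArray()) {
          State next;
          if (Character.isDigit(c)) {
            next = State.JUST_NUM;
          } else {
            next = State.JUST_OP;
          }

          if (state != next && !acc.isEmpty()) {
            // state change, so save and reset the accumulator:
            subStrs.add(acc);
            acc = "";
          }
          // the current character always joins the (possibly fresh) accumulator:
          acc = acc + c;
          // update the state
          state = next;
        }
        // the input has no terminator character, so flush what is left:
        if (!acc.isEmpty()) {
          subStrs.add(acc);
        }
        return subStrs;
    }
}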

With a structure like that, you can more easily add new features or constructs by adding new states and updating the behavior depending on the current state and incoming character. For example, you could add a check to throw errors if letters appear in the string (and include offset locations, if you wanted to track that).
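
The error check mentioned above could look something like the following sketch. It is not taken from the answer: the class name `CheckedTokenizer`, the method `tokenize`, the use of `IllegalArgumentException`, and the accepted operator set (`+ - * /`) are all assumptions made for illustration. The loop is index-based so that the offset of an offending character is available for the error message.

```java
import java.util.ArrayList;
import java.util.List;

public class CheckedTokenizer {
    enum State { INIT, JUST_NUM, JUST_OP }

    // Hypothetical variant of the state-machine loop that rejects unknown characters.
    static List<String> tokenize(String input) {
        State state = State.INIT;
        StringBuilder acc = new StringBuilder();
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            State next;
            if (c >= '0' && c <= '9') {
                next = State.JUST_NUM;
            } else if ("+-*/".indexOf(c) >= 0) { // assumed operator set
                next = State.JUST_OP;
            } else {
                // Anything else (letters, spaces, ...) is reported with its offset.
                throw new IllegalArgumentException(
                        "Unexpected character '" + c + "' at offset " + i);
            }
            if (next != state && acc.length() > 0) {
                // State change: save the finished token and start a new one.
                tokens.add(acc.toString());
                acc.setLength(0);
            }
            acc.append(c);
            state = next;
        }
        if (acc.length() > 0) {
            tokens.add(acc.toString()); // flush the final token
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("22+2")); // [22, +, 2]
        try {
            tokenize("22+x");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Unexpected character 'x' at offset 3
        }
    }
}
```

As in the loop above, consecutive operators end up in a single token; splitting them apart would be one more transition rule rather than a restructuring.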
