Splitting string algorithm in Java
I'm trying to make the following algorithm work. What I want to do is split the given string into substrings, each consisting of either a run of digits or a single operator.
So for the string "22+2", I would get an array in which [0]="22", [1]="+" and [2]="2".
This is what I have so far, but I get an index out of bounds exception:
public static void main(String[] args) {
    String string = "114+034556-2";
    int k, a, j;
    k = 0; a = 0; j = 0;
    String[] subStrings = new String[string.length()];
    while (k < string.length()) {
        a = k;
        while (((int) string.charAt(k)) <= 57 && ((int) string.charAt(k)) >= 48) {
            k++;
        }
        subStrings[j] = String.valueOf(string.subSequence(a, k - 1)); // exception here
        j++;
        subStrings[j] = String.valueOf(string.charAt(k));
        j++;
    }
}
I would rather be told what's wrong with my reasoning than be offered an alternative, but of course I will appreciate any kind of help.
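Tracing the loop above on "114+034556-2" shows where the reasoning breaks. First, the end index of subSequence is exclusive, so subSequence(a, k-1) drops the last digit ("11" instead of "114"). Second, after the operator is stored, k is never advanced past it: on the next pass the inner loop does not move, k-1 ends up smaller than a, and subSequence(3, 2) throws, which is the exception at the marked line. Third, the inner loop does not check k < string.length(), so a number at the very end of the string would run past the end, as would charAt(k) after it. The following is a minimal sketch that keeps the original structure and fixes those three points; the class name SplitFixed and the switch from a fixed-size array to an ArrayList (which avoids trailing null slots) are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitFixed {
    // Same outer/inner loop idea as the question, with the three problems fixed.
    static List<String> split(String string) {
        List<String> subStrings = new ArrayList<String>(); // grows as needed
        int k = 0;
        while (k < string.length()) {
            int a = k;
            // check the bounds first, so a trailing number cannot run past the end
            while (k < string.length() && string.charAt(k) >= '0' && string.charAt(k) <= '9') {
                k++;
            }
            if (k > a) {
                // the end index is exclusive, so use k rather than k - 1
                subStrings.add(string.substring(a, k));
            }
            if (k < string.length()) {
                // the character at k is an operator: store it and step past it
                subStrings.add(String.valueOf(string.charAt(k)));
                k++;
            }
        }
        return subStrings;
    }

    public static void main(String[] args) {
        System.out.println(split("114+034556-2")); // [114, +, 034556, -, 2]
    }
}
```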
I'm deliberately not answering this question directly, because it looks like you're trying to figure out a solution yourself. I'm also assuming that you're purposely not using the split or indexOf functions, which would make this pretty trivial.
A few things I've noticed:
You could use a regular expression with lookahead and lookbehind assertions to split the numbers from the operators:
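import java.util.Arrays;

String equation = "22+2";
// split at the empty positions just before and just after each operator
String[] tmp = equation.split("(?=[+\\-*/])|(?<=[+\\-*/])");
System.out.println(Arrays.toString(tmp)); // [22, +, 2]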
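A self-contained version of the same lookaround split, applied to the string from the question. Because both assertions are zero-width, the split consumes no characters, so the operators survive as their own tokens. The class name LookaroundSplit is hypothetical, and the operator set (+, -, * and /) is an assumption.

```java
import java.util.Arrays;

public class LookaroundSplit {
    // Match the empty position before an operator (lookahead)
    // or the empty position after an operator (lookbehind).
    private static final String OPERATOR_BOUNDARY = "(?=[-+*/])|(?<=[-+*/])";

    static String[] tokenize(String expression) {
        return expression.split(OPERATOR_BOUNDARY);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(tokenize("114+034556-2")));
        // [114, +, 034556, -, 2]
    }
}
```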
If your criterion is simply "anything that is not a number", you can use some simple regex tricks, provided you don't mind working with parallel arrays:
String string = "114+034556-2";
String[] operands = string.split("\\D"); // split around anything that is NOT a digit
char[] operators = string.replaceAll("\\d", "").toCharArray(); // delete every digit; what remains are the operators
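The two arrays line up so that operators[i] sits between operands[i] and operands[i + 1]. A minimal sketch of turning them back into the single interleaved token list from the question, assuming well-formed input (it starts and ends with a number and has no two operators in a row; otherwise split("\\D") yields empty operands). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class ParallelArrays {
    // Rebuilds [number, operator, number, ...] from the two parallel arrays.
    static List<String> interleave(String expression) {
        String[] operands = expression.split("\\D");
        char[] operators = expression.replaceAll("\\d", "").toCharArray();
        List<String> tokens = new ArrayList<String>();
        for (int i = 0; i < operands.length; i++) {
            tokens.add(operands[i]);
            // there is one operator fewer than there are operands
            if (i < operators.length) {
                tokens.add(String.valueOf(operators[i]));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(interleave("114+034556-2")); // [114, +, 034556, -, 2]
    }
}
```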
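import java.util.ArrayList;
import java.util.List;

public class SplitNumbersAndOperators {
    public static void main(String[] args) {
        String input = "22+2-3*212/21+23";
        StringBuilder number = new StringBuilder(); // digits of the number being read
        List<String> numbers = new ArrayList<String>();
        List<String> operators = new ArrayList<String>();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (Character.isDigit(c)) {
                number.append(c);
            } else if (c == '+' || c == '-' || c == '*' || c == '/') {
                // an operator ends the current number
                operators.add(String.valueOf(c));
                numbers.add(number.toString());
                number.setLength(0);
            }
            // any other character is ignored
        }
        // the last number is not followed by an operator, so add it after the loop
        numbers.add(number.toString());
        for (String x : numbers) {
            System.out.println("number=" + x + ",");
        }
        for (String x : operators) {
            System.out.println("operators=" + x + ",");
        }
    }
}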
This prints one entry per line:
number=22,
number=2,
number=3,
number=212,
number=21,
number=23,
operators=+,
operators=-,
operators=*,
operators=/,
operators=+,
If you're interested in the general problem of parsing, I'd recommend thinking about it at the character-by-character level and moving through a finite state machine with each new character. (Often you'll need a terminator character that cannot occur in the input, such as the \0 in C strings, but we can get around that.)
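To illustrate the terminator idea: appending a character that cannot occur in the input forces one final "end of number" event, so the last token is saved inside the loop instead of needing a separate flush afterwards. A minimal sketch, assuming '\0' never appears in the input and that each operator is a single character; the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class SentinelTokenizer {
    // Assumption: '\0' never occurs in real input, so it can mark the end.
    private static final char TERMINATOR = '\0';

    static List<String> tokenize(String input) {
        String text = input + TERMINATOR;
        List<String> tokens = new ArrayList<String>();
        StringBuilder acc = new StringBuilder(); // digits of the current number
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean isDigit = c >= '0' && c <= '9';
            // any non-digit, including the terminator, ends the number being read
            if (!isDigit && acc.length() > 0) {
                tokens.add(acc.toString());
                acc.setLength(0);
            }
            if (isDigit) {
                acc.append(c);
            } else if (c != TERMINATOR) {
                tokens.add(String.valueOf(c)); // each operator is its own token
            }
        }
        return tokens;
    }
}
```

For "22+2" this returns [22, +, 2]: the terminator is what pushes the final "2" into the list.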
In this case, you might have the following states:
The characters determine the transitions from state to state:
The current state can be tracked with something like an enum, changing the state after each character is consumed.
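One way to encode this is to give the enum itself the transition rule, so that "current state + incoming character gives next state" lives in one place. A sketch using the state names from the parse loop in this answer (INIT_STATE, JUST_NUM, JUST_OP); the next method, the trace helper and the class name are hypothetical.

```java
public class StateTransitions {
    enum State {
        INIT_STATE, JUST_NUM, JUST_OP;

        // Transition rule: a digit always leads to JUST_NUM and any other
        // character to JUST_OP, whatever the current state is.
        State next(char c) {
            return (c >= '0' && c <= '9') ? JUST_NUM : JUST_OP;
        }
    }

    // Feeds each character through the machine and records the visited states.
    static String trace(String input) {
        State state = State.INIT_STATE;
        StringBuilder path = new StringBuilder(state.name());
        for (char c : input.toCharArray()) {
            state = state.next(c);
            path.append(" -> ").append(state.name());
        }
        return path.toString();
    }

    public static void main(String[] args) {
        System.out.println(trace("22+2"));
        // INIT_STATE -> JUST_NUM -> JUST_NUM -> JUST_OP -> JUST_NUM
    }
}
```

A token boundary is exactly the point where the trace switches between JUST_NUM and JUST_OP.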
With that setup, you just need to loop over the input string and switch on the current state.
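import java.util.ArrayList;
import java.util.List;

public class StateMachineParser {
    enum State { INIT_STATE, JUST_NUM, JUST_OP }

    static List<String> parse(String inputString) {
        State state = State.INIT_STATE;
        StringBuilder acc = new StringBuilder(); // accumulates the current token
        List<String> subStrs = new ArrayList<String>();
        for (char c : inputString.toCharArray()) {
            // the incoming character determines the next state
            State next = Character.isDigit(c) ? State.JUST_NUM : State.JUST_OP;
            if (state != next && state != State.INIT_STATE) {
                // state change: the previous token is complete, so save and reset the accumulator
                subStrs.add(acc.toString());
                acc.setLength(0);
            }
            // either way, the current character belongs to the token being built
            // (note: consecutive operators such as "*-" end up in one JUST_OP token)
            acc.append(c);
            // update the state
            state = next;
        }
        // there is no terminator character, so save the last token after the loop
        if (acc.length() > 0) {
            subStrs.add(acc.toString());
        }
        return subStrs;
    }
}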
With a structure like that, you can more easily add new features or constructs by adding new states and updating the behavior depending on the current state and the incoming character. For example, you could add a check that throws an error if letters appear in the string (and include the offset, if you want to track that).
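A minimal sketch of that error check on top of the same state machine: a character that is neither a digit nor an operator is rejected immediately, and the exception message carries its offset in the input. The operator set (+, -, * and /), the isOperator helper, the message wording and the class name are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class CheckedStateMachine {
    enum State { INIT_STATE, JUST_NUM, JUST_OP }

    // Hypothetical helper; the operator set is an assumption.
    static boolean isOperator(char c) {
        return c == '+' || c == '-' || c == '*' || c == '/';
    }

    static List<String> parse(String input) {
        State state = State.INIT_STATE;
        StringBuilder acc = new StringBuilder();
        List<String> tokens = new ArrayList<String>();
        for (int offset = 0; offset < input.length(); offset++) {
            char c = input.charAt(offset);
            State next;
            if (c >= '0' && c <= '9') {
                next = State.JUST_NUM;
            } else if (isOperator(c)) {
                next = State.JUST_OP;
            } else {
                // letters (or anything else unexpected) are rejected with their position
                throw new IllegalArgumentException("unexpected '" + c + "' at offset " + offset);
            }
            if (state != next && state != State.INIT_STATE) {
                tokens.add(acc.toString()); // previous token is complete
                acc.setLength(0);
            }
            acc.append(c);
            state = next;
        }
        if (acc.length() > 0) {
            tokens.add(acc.toString()); // save the final token
        }
        return tokens;
    }
}
```

For example, parse("22+x") throws with the message "unexpected 'x' at offset 3".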