How do I extract tokens from a string?

I'm trying to make a 'Compiler' and I need to get the comparison operators like:

  • '<='
  • '>='
  • '!='

from an input string.

I'm trying to tokenize my input string to get these operators, but instead each one comes out as two separate tokens, e.g.:

'<' and '=' or '>' and '='

static List<String> divideSymbols(string token){
            /** strip semicolons and tokenize, splitting on operators */
            List<String> myTokens = new List<string>();
            // split on operators
            char [] tokens = token.ToCharArray();
            String accum="";
            String accum1 = "";

            for(int i=0;i<tokens.Length; i++){
                try{
                    if((tokens[i]!='>' && tokens[i]!='<' && tokens[i]!='=' && tokens[i]!='+' && tokens[i]!='-' && tokens[i]!='(' && tokens[i]!=')' && tokens[i]!='{' && tokens[i]!='}' && tokens[i]!=(char)34 && tokens[i]!=(char)39 && tokens[i]!='/' && tokens[i]!='*' && tokens[i]!='%' && tokens[i]!='&' && tokens[i]!='|'  && tokens[i]!='!' && tokens[i]!=','  && tokens[i]!='[' && tokens[i]!=']' /*&& tokens[i+1] !='='*/) ){
                        /*if((tokens[i] == '>' || tokens[i] == '<' || tokens[i]== '!') && tokens[i+1]== '=' ){
                            Console.WriteLine("TEST");
                        }*/

                    if(tokens[i] == '<' && tokens[i+1] == '='){

                        
                    }
                    if(tokens[i]!=';' ){ // drop ; (semicolon)
                        removeDuplicates(accum);
                        
                        accum+=tokens[i];
                        
                    }

                }else{
                   
                    if(accum!=""){
                        myTokens.Add(accum);
                        myTokens.Add(tokens[i].ToString());

                    }else{
                        removeDuplicates(accum);
                        

                        myTokens.Add(tokens[i].ToString());
                    }
                    accum="";
                }
                
                    if((tokens[i]== '>' ||tokens[i]== '<' || tokens[i]== '!' || tokens[i]== '=') && tokens[i+1] == '='){
                        accum1 = tokens[i].ToString()+tokens[i+1].ToString();
                        myTokens.Add(accum1);
                        i++;
                        //myTokens.Remove('<');
                    }

                }catch(IndexOutOfRangeException){

                }   
            }
            myTokens.Add(accum);
            myTokens.Add(accum1);
            
            return myTokens;
        }

In the output I get both tokens, the single-character operator and the two-character one, and I need to drop the first one when the character after it is an = sign.

The expected output is:

1,<if_stmt>,if
1,<open_parents>,(
1,<number>,4
1,<morethan_op>,>
1,<eqmorethan_op>,>=
1,<number>,5
1,<close_parents>,)
1,<open_braces>,{
1,<eqmorethan_op>,>=
2,<print>,print
2,<open_parents>,(
2,<string_op>,"
2,<variable>,yes
2,<string_op>,"
2,<close_parents>,)
3,<close_braces>,}
4,<class>,class
4,<variable>,Foo
4,<open_braces>,{
5,<type>,int
5,<variable>,key
6,<close_braces>,}
8,<variable>,Foo
8,<variable>,a

but without the repeated > and >= entries.

If I understand your problem correctly, what we need to do is extract the comparison operators from an input string.

So the idea here is to traverse the string and concentrate only on

  • '<='
  • '>='
  • '!='

As you are trying to simulate a compiler, we know that each line must end with a semicolon (punto y coma).

With that in mind, we pass in the string, remove all whitespace, scan it, and return the comparison operators that were found.

    static string RemoveWhitespace( string input)
    {
        int j = 0, inputlen = input.Length;
        char[] newarr = new char[inputlen];

        for (int i = 0; i < inputlen; ++i)
        {
            char tmp = input[i];

            if (!char.IsWhiteSpace(tmp))
            {
                newarr[j] = tmp;
                ++j;
            }
        }
        return new String(newarr, 0, j);
    }

    static List<String> DivideSymbols(string tokenisedString)
    {
        string token= RemoveWhitespace(tokenisedString);

        List<String> myTokens = new List<string>();
        // Other symbols a full tokenizer would have to handle; the scan below does not use this list.
        List<char> tokensToSkip = new List<char> { '+', '-', '(', ')', '{', '}', '/', '*', '%', '&', '|', '!', ',', '[', ']', '\'', '"' };

        char different = '!';
        char lessThan = '<';
        char greaterThan = '>';
        char equal = '=';
        char endOfLine = ';';

        // Stop one character early so token[i + 1] is always in range.
        for (int i = 0; i < token.Length - 1; i++)
        {
            // A semicolon marks the end of the line; stop scanning there.
            if (token[i] == endOfLine)
            {
                break;
            }

            // "!=", "<=" or ">=": record the two characters as a single token.
            bool startsComparison = token[i] == different || token[i] == lessThan || token[i] == greaterThan;
            if (startsComparison && token[i + 1] == equal)
            {
                myTokens.Add(token[i].ToString() + token[i + 1].ToString());
            }
        }

        return myTokens;
    }

    static void Main(string[] args)
    {
        List<String> found = DivideSymbols("if(x == 1 && x >= 10 || v != 3 'perhaps something like' AND SQL <::=  c<=d) { x++};");
        Console.WriteLine(string.Join(" ", found)); // prints: >= != <=
    }

Now the logic is in place, and you can do whatever you like with each operator at the point where it is matched.
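
If you want the full token list from the question rather than only the comparison operators, the same one-character lookahead can be folded into the tokenizer itself, so that '>' followed by '=' is emitted once as ">=" and never as a separate '>'. The following is only a minimal sketch of that idea, not the asker's code: the class name Lexer, the method name Tokenize and the Symbols constant are made up for illustration, and it assumes that whitespace and ';' merely separate tokens and are dropped.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class Lexer
{
    // Assumed symbol set, copied from the long condition in the question.
    const string Symbols = "<>=!+-(){}\"'/*%&|,[]";

    public static List<string> Tokenize(string input)
    {
        var tokens = new List<string>();
        var word = new StringBuilder();

        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            bool isSymbol = Symbols.IndexOf(c) >= 0;

            // Ordinary character: keep building the current word.
            if (!isSymbol && c != ';' && !char.IsWhiteSpace(c))
            {
                word.Append(c);
                continue;
            }

            // Anything else ends the current word, if there is one.
            if (word.Length > 0)
            {
                tokens.Add(word.ToString());
                word.Clear();
            }

            // Whitespace and ';' are separators only (an assumption).
            if (!isSymbol)
            {
                continue;
            }

            // Look one character ahead: "<=", ">=", "!=" and "==" are
            // added once as a pair, and the '=' is skipped.
            bool canPair = c == '<' || c == '>' || c == '!' || c == '=';
            if (canPair && i + 1 < input.Length && input[i + 1] == '=')
            {
                tokens.Add(c.ToString() + "=");
                i++;
            }
            else
            {
                tokens.Add(c.ToString());
            }
        }

        // Flush a word that runs to the end of the input.
        if (word.Length > 0)
        {
            tokens.Add(word.ToString());
        }
        return tokens;
    }
}
```

For example, string.Join(" ", Lexer.Tokenize("if(4>=5){")) gives "if ( 4 >= 5 ) {". The bounds check on i + 1 replaces the try/catch around IndexOutOfRangeException, and because the pair is added in the same branch that would otherwise add the single character, there is nothing to remove afterwards.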
