
C lexical analyzer: using switch to analyze and count a decimal/not decimal

My lexical analyzer recognizes digits (5, 555, 543667), decimals (44.65, 4.1), and periods (.).

I can count digits, decimals, and periods fine, but when a digit is immediately followed by a period, the pair is counted as a decimal.

Consider a text file that contains:

555 2.3 55.23 44 5.

My output would be:

1 type 1: 555
2 type 3: 2.3
3 type 3: 55.23
4 type 1: 44
5 type 3: 5.

Where type 3 is my identifier for a decimal.

I want the trailing 5. to be counted as two tokens, the 5th and the 6th: a digit and then a period.

Here is how I am handling my switch statement.

  switch(*b) {

    case '0':
    case '1':
    case '2':
    case '3':
    case '4':
    case '5':
    case '6':
    case '7':
    case '8':
    case '9':
    digits:
        t.length++;
        switch(*(b + t.length)) {
            case '0':
            case '1':
            case '2':
            case '3':
            case '4':
            case '5':
            case '6':
            case '7':
            case '8':
            case '9':
                goto digits;
            case '.': 
                goto decimal;                   
                break;
            default:
                break;
        }

         t.type = TOKEN_DIGITS;
        t.string = (char *)calloc(t.length + 1, sizeof(char));
        strncpy(t.string, b, t.length);
        break;



    decimal:
        t.length++;
        switch(*(b + t.length)) {
            case '0':
            case '1':
            case '2':
            case '3':
            case '4':
            case '5':
            case '6':
            case '7':
            case '8':
            case '9':
                goto decimal;
                break;
            }   
            t.type = TOKEN_DECIMAL;
            t.string = (char *)calloc(t.length+1,sizeof(char));
            strncpy(t.string,b,t.length);           
       break;

I have tried multiple things, but I am officially stuck.

You really should be using character classification functions (such as isdigit from <ctype.h>) for this kind of exercise instead of long switch statements. Your code will be a lot simpler and you won't have to use goto at all.

For example, a number could be described with the following regular expression (whitespace added to break up the various blocks):

[-+]? [0-9]* \.? [0-9]+

This already shows the possible state transitions:

  • A number can (optionally) start with + or - (if you support signed numbers)
  • It may have 0..n digits
  • If the following character is not a decimal point symbol, it should be a separator, otherwise it's an invalid symbol. If it's a separator, your number is terminated.
  • After a decimal point there should be 1..n digits
  • The number is terminated when you reach the end of input or you encounter a separator

All this can be done in a handful of lines of code: keep a pointer to your current input character, step forward one character at a time, and decide what to do based on each character's class.
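As a rough illustration of the steps above, here is a minimal sketch in C. It assumes the input is a NUL-terminated string and leaves out signs and the "must be followed by a separator" check. The names tok_kind, TOK_* and next_token are made up for this example and are not the asker's t, b or TOKEN_* identifiers. The key point is that a period is only absorbed into the number when a digit follows it.

```c
#include <ctype.h>
#include <stddef.h>

/* Token kinds; hypothetical names, not the asker's TOKEN_* values. */
enum tok_kind { TOK_END, TOK_DIGITS, TOK_DECIMAL, TOK_PERIOD, TOK_UNKNOWN };

/*
 * Scan one token from s, skipping leading whitespace.
 * Stores where the token starts in *start and its length in *len,
 * and returns its kind.
 */
enum tok_kind next_token(const char *s, const char **start, size_t *len)
{
    const unsigned char *p = (const unsigned char *)s;

    while (isspace(*p))
        p++;
    *start = (const char *)p;
    *len = 0;

    if (*p == '\0')
        return TOK_END;

    const unsigned char *q = p;
    while (isdigit(*q))
        q++;                        /* leading digits, possibly none */

    /* A '.' belongs to the number only if a digit follows it. */
    if (*q == '.' && isdigit(q[1])) {
        q++;
        while (isdigit(*q))
            q++;
        *len = (size_t)(q - p);
        return TOK_DECIMAL;
    }
    if (q > p) {                    /* digits with no fractional part */
        *len = (size_t)(q - p);
        return TOK_DIGITS;
    }
    *len = 1;                       /* single character: '.' or unknown */
    return *p == '.' ? TOK_PERIOD : TOK_UNKNOWN;
}
```

Calling it repeatedly on 555 2.3 55.23 44 5. and continuing from start + len each time should give digits, decimal, decimal, digits, digits, period, which is the split the question asks for.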

Now, this particular approach doesn't handle floating point numbers using scientific notation, etc., but adding those extras is really simple once you have the basics done.

I think this complements xxbbcc's answer.

Very roughly, something like this:

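#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include <assert.h>

/* Not in the original sketch: token codes and I/O helpers assumed here. */
enum { INT = 258, DECIMAL };
#define get()    getchar()
#define unget(c) ungetc((c), stdin)

int yylex(void) {
        int c;
        char *p, buf[1000];

        /* skip leading whitespace */
        for(c = get(); isspace(c); c = get())
                ;

        if(isdigit(c)) {
                p = buf;
                while(isdigit(c)) {
                        *p++ = c;
                        c = get();
                }
                *p = 0;
                if(c != '.') {
                        unget(c);
                        int i = atoi(buf);      /* would be stored in yylval */
                        (void)i;
                        return INT;
                }
                assert(c == '.');
                /* note: "5." is still returned as DECIMAL here */
                *p++ = c;
                c = get();
                while(isdigit(c)) {
                        *p++ = c;
                        c = get();
                }
                *p = 0;
                double f = atof(buf);           /* would be stored in yylval */
                (void)f;
                unget(c);
                return DECIMAL;
        }
        return 0;       /* EOF or a character this sketch does not handle */
}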

There are a lot of details left unsaid: watching for EOF, guarding against buffer overflow, setting yylval to the int or float, and parsing tokens other than simple numbers.

Use a variable like digit_follow_period to keep the state. Every time you encounter a period, set the variable to false, and when you then encounter a digit in the decimal switch block, set it to true. Check the variable's value to determine t.length before the strncpy: if no digit followed the period, the period should not be counted as part of the token. You may also need other variables to work together with it. The best way is to define a state-transition matrix, which is much better than gotos.
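To make the state-transition matrix idea concrete, here is one possible sketch in C. The states, character classes and function names are all invented for this example. It remembers the last accepting state it passed through, so when a period is not followed by a digit the scan falls back to the digits seen before it. It does not skip whitespace or handle signs.

```c
#include <ctype.h>
#include <stddef.h>

/* All names below are hypothetical, chosen for this sketch only. */
enum state { S_START, S_INT, S_DOT, S_FRAC, S_LONE_DOT, S_STOP };
enum cls   { C_DIGIT, C_DOT, C_OTHER };
enum kind  { K_NONE, K_DIGITS, K_DECIMAL, K_PERIOD };

/* transition[state][class]: one row per state, one column per class. */
static const enum state transition[S_STOP][3] = {
    /*                  digit    '.'         other   */
    /* S_START    */ { S_INT,  S_LONE_DOT, S_STOP },
    /* S_INT      */ { S_INT,  S_DOT,      S_STOP },
    /* S_DOT      */ { S_FRAC, S_STOP,     S_STOP },
    /* S_FRAC     */ { S_FRAC, S_STOP,     S_STOP },
    /* S_LONE_DOT */ { S_FRAC, S_STOP,     S_STOP },
};

/* Token kind accepted in each state; K_NONE means "not a complete token". */
static const enum kind accepts[S_STOP] = {
    K_NONE, K_DIGITS, K_NONE, K_DECIMAL, K_PERIOD
};

static enum cls classify(unsigned char c)
{
    if (isdigit(c))
        return C_DIGIT;
    return c == '.' ? C_DOT : C_OTHER;
}

/*
 * Scan the longest token at the start of s. Returns its kind and stores
 * its length in *len; returns K_NONE with length 0 if nothing matches.
 */
enum kind scan(const char *s, size_t *len)
{
    enum state st = S_START;
    enum kind best = K_NONE;
    size_t i = 0;

    *len = 0;
    for (;;) {
        st = transition[st][classify((unsigned char)s[i])];
        if (st == S_STOP)
            break;                  /* the NUL terminator also ends up here */
        i++;
        if (accepts[st] != K_NONE) {
            best = accepts[st];     /* remember the last complete token */
            *len = i;
        }
    }
    return best;
}
```

With this table, scanning 5. stops in S_DOT, which is not accepting, so the result is the digits token of length 1 recorded in S_INT; the period is picked up as its own token on the next call. Adding a new token type becomes a matter of adding rows and columns rather than more goto labels.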
