
Casting a void pointer (that is part of a struct) into another pointer data type

I'm trying to figure out how to parse S-expressions in C on my own, in order to store data and code for my own rudimentary Lisp (written as a learning exercise, not for production).

Before explaining my code and my reasoning, I should explain that all I know about S-expressions comes from the introductory section of the Wikipedia article on them and the occasional glance at Common Lisp code, so the naming of my structs and variables may be a bit off.

My implementation language is C, and before defining any functions I created the following structs:

typedef enum {
    string,
    letter,
    integer,
} atom_type;

typedef struct {
    void* blob;
    atom_type type;
} atom;

typedef struct expr {
    atom* current;
    struct expr* next;
} expr;

Each atom is stored in a struct atom, which contains an enum value (? I'm not sure of the correct jargon for this) and a void pointer pointing to the data to be stored. Each S-expression "node" consists of a pointer to an atom and a pointer to the next S-expression node.

I've written a rudimentary function that accepts a string and parses it into an atom, like the following:

atom* parse_term(char* str) {
    size_t len = strlen(str);
    atom* current = malloc(sizeof(atom));
    
    if(str[0] == '\'') {
        current->blob = (char*) &str[1];
        current->type = letter;
    } else if(str[0] == '\"') {
        char temp[256];
        int pos = 1;

        while(str[pos] != '\"') {
            temp[pos] = str[pos];
            pos++;
        }
        current->blob = malloc(256 * sizeof(char));
        current->blob = (char*) &temp;
        current->type = string;
    } else if(isdigit(str[0])){
        char temp[256];
        int pos = 0;

        while(str[pos] != ' ') {
            temp[pos] = str[pos];
            pos++;
        }
        int tmp = atoi(temp);
        current->blob = (int*) &tmp;
        current->type = integer;
    }
    return current;
}

The function seems to be working correctly; at least, when I print out the data type it shows it correctly. But apart from this I can't figure out how to print out the actual 'blob': I've tried using the %p formatting code, as well as a switch statement:

void print_atom(atom* current) {
    switch(current->type) {
        case string:
            printf("atom%s\ttype:%d", current->blob, current->type);
        case letter:
            printf("atom%c\ttype:%d", current->blob, current->type);
        case integer:
            printf("atom%c\ttype:%d", current->blob, current->type);
    }
}

But this doesn't work. In the case of a string, it prints garbled text, and in every other case it prints nothing where the atom's information is supposed to be.

I imagine this is a product of my use of a void* within a struct; how could I remedy this? I think I did cast properly (though I could very well be wrong, please tell me). The only other option I could conceive of is storing a hardcoded variable for every supported data type in the 'atom' struct, but this seems wasteful of resources.

Don't use void*. Use a union. That's what unions are for.

In this example, I use an "anonymous union", which means that I can just refer to its fields as though they were directly inside the Atom struct. (I changed the spelling of names according to my prejudices, so that types are Capitalised and constants are ALLCAPS. I also separated the typedef and struct declarations for Atom, in case Atom turns out to be self-referential.)

#include <stdio.h>

typedef enum {
    STRING,
    LETTER,
    INTEGER
} AtomType;

typedef struct Atom Atom;
struct Atom {
    union {
      char* str;
      char  let;
      int   num;
    };
    AtomType type;
};

void print_atom(Atom* current) {
    switch(current->type) {
        case STRING:
            printf("atom %s\ttype:%d\n", current->str, current->type);
            break;  /* without break, control falls through to the next case */
        case LETTER:
            printf("atom %c\ttype:%d\n", current->let, current->type);
            break;
        case INTEGER:
            printf("atom %d\ttype:%d\n", current->num, current->type);
            break;
    }
}
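With the union in place, the parser can store a letter or an integer by value instead of pointing the atom at a local variable that disappears when the function returns, which is one likely source of the garbled output in the question. The following is only a minimal sketch of how parse_term might look under that change. It assumes C11 (for the anonymous union); the free_atom helper and the choice to return NULL on malformed input are my own assumptions, not part of the original code.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

typedef enum { STRING, LETTER, INTEGER } AtomType;

typedef struct Atom Atom;
struct Atom {
    union {
        char* str;
        char  let;
        int   num;
    };
    AtomType type;
};

/* Parse one term: 'c is a letter, "text" is a string, leading digits are
   an integer. Returns NULL on malformed input or allocation failure. */
Atom* parse_term(const char* str) {
    assert(str != NULL);
    Atom* atom = malloc(sizeof *atom);
    if (atom == NULL) return NULL;

    if (str[0] == '\'' && str[1] != '\0') {
        atom->let = str[1];              /* stored by value, no pointer */
        atom->type = LETTER;
    } else if (str[0] == '"') {
        const char* end = strchr(str + 1, '"');
        if (end == NULL) { free(atom); return NULL; }
        size_t len = (size_t)(end - (str + 1));
        atom->str = malloc(len + 1);     /* heap copy outlives this call */
        if (atom->str == NULL) { free(atom); return NULL; }
        memcpy(atom->str, str + 1, len);
        atom->str[len] = '\0';
        atom->type = STRING;
    } else if (isdigit((unsigned char)str[0])) {
        atom->num = (int)strtol(str, NULL, 10);  /* stops at first non-digit */
        atom->type = INTEGER;
    } else {
        free(atom);
        return NULL;
    }
    return atom;
}

/* Hypothetical helper: release an atom and, for strings, its heap copy. */
void free_atom(Atom* atom) {
    if (atom != NULL && atom->type == STRING) free(atom->str);
    free(atom);
}
```

An atom returned by this sketch can be passed straight to print_atom, and should be released with free_atom when it is no longer needed.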

As someone says in a comment, that's not actually how Lisp objects look. The usual implementation is to combine cons cells and atoms, something like this (using CellType instead of AtomType). You'll also need to add CELL to your enum.

typedef struct Cell Cell;
struct Cell {
    union {
        char* str;
        char  let;
        int   num;
        struct {
            Cell* hd; // Historic name: car
            Cell* tl; // Historic name: cdr
        };
    };
    CellType type;
};

Here there's an anonymous struct inside an anonymous union. Some people say this is confusing. Others (me, anyway) say it's less syntactic noise. Use your own judgement.

The use of Cell* inside the definition of Cell is the motivation for typedef struct Cell Cell.
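To make the Cell layout concrete, here is a rough sketch of building and printing a small list with it. The CellType enum with the added CELL constant follows the description above. The helper names make_int, cons, list_length and print_cell are hypothetical, as is the convention that a NULL pointer stands for the empty list. C11 is assumed for the anonymous struct and union.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

typedef enum { STRING, LETTER, INTEGER, CELL } CellType;

typedef struct Cell Cell;
struct Cell {
    union {
        char* str;
        char  let;
        int   num;
        struct {
            Cell* hd; /* Historic name: car */
            Cell* tl; /* Historic name: cdr */
        };
    };
    CellType type;
};

/* Hypothetical constructor for an integer atom. */
Cell* make_int(int n) {
    Cell* c = malloc(sizeof *c);
    assert(c != NULL);  /* sketch only: allocation failure is treated as fatal */
    c->num = n;
    c->type = INTEGER;
    return c;
}

/* Hypothetical constructor for a cons cell; tl == NULL ends the list. */
Cell* cons(Cell* hd, Cell* tl) {
    Cell* c = malloc(sizeof *c);
    assert(c != NULL);
    c->hd = hd;
    c->tl = tl;
    c->type = CELL;
    return c;
}

/* Count the cons cells along the tl chain. */
size_t list_length(const Cell* list) {
    size_t n = 0;
    while (list != NULL && list->type == CELL) {
        n++;
        list = list->tl;
    }
    return n;
}

/* Print an atom or a list; a non-list tail is shown in dotted form. */
void print_cell(const Cell* c) {
    if (c == NULL) { printf("()"); return; }
    switch (c->type) {
        case STRING:  printf("\"%s\"", c->str); break;
        case LETTER:  printf("'%c", c->let);    break;
        case INTEGER: printf("%d", c->num);     break;
        case CELL: {
            const Cell* p = c;
            putchar('(');
            while (p != NULL && p->type == CELL) {
                print_cell(p->hd);
                p = p->tl;
                if (p != NULL) putchar(' ');
            }
            if (p != NULL) { printf(". "); print_cell(p); }
            putchar(')');
            break;
        }
    }
}
```

For example, print_cell(cons(make_int(1), cons(make_int(2), NULL))) walks two cons cells and prints their heads in order. The sketch never frees anything; a real interpreter would need either a garbage collector or explicit ownership rules.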

You can play not-entirely-portable-but-usually-ok games to reduce the memory consumption of Cell, and most real implementations do. I didn't, because this is a learning experience.
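The answer does not say which games it has in mind. One widely used trick of this kind, shown here purely as an illustration, is to pack small integers into the low bit of a pointer-sized word so they need no separate allocation or type field. The names Value, make_fixnum, fixnum_value, make_pointer and pointer_value are all hypothetical. The sketch relies on behaviour the C standard leaves implementation-defined (converting an out-of-range unsigned value to a signed type, right-shifting a negative value) and on allocated pointers having a zero low bit, which holds on mainstream platforms but is not guaranteed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* A Value is either a tagged small integer (low bit 1)
   or an ordinary aligned pointer (low bit 0). */
typedef uintptr_t Value;

static int is_fixnum(Value v) {
    return (v & 1u) != 0;
}

/* Shift in unsigned arithmetic, then set the tag bit.
   The top bit of n is lost, so the usable range is halved. */
static Value make_fixnum(intptr_t n) {
    return ((uintptr_t)n << 1) | 1u;
}

/* Implementation-defined for negative values, but an arithmetic
   shift on every common compiler. */
static intptr_t fixnum_value(Value v) {
    assert(is_fixnum(v));
    return (intptr_t)v >> 1;
}

/* Relies on the allocator returning pointers aligned to at least 2 bytes. */
static Value make_pointer(void* p) {
    Value v = (Value)p;
    assert((v & 1u) == 0);
    return v;
}

static void* pointer_value(Value v) {
    assert(!is_fixnum(v));
    return (void*)v;
}
```

With a scheme like this, a cons cell could shrink to two Value words, at the cost of exactly the kind of portability caveats mentioned above.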


Also note that real Lisps (and many toy ones) effectively avoid most parsing tasks; the language includes character macros which do what little parsing is needed. For the most part, they can be implemented in Lisp itself (although you need some way to bootstrap).
