Equal strings produce different hash indices

I have a program that emulates an in-memory filesystem (not finished yet). It reads its commands from a file, and they are fairly self-explanatory:

create /foo
create /foo/bar
create /foo/baz
create /foo/baz/qux
write  /foo/bar "test"
read   /foo/bar
read   /foo/baz/qux
read   /foo/baz/quux
create /foo/bar
create /dir
create /bar
create /dir/bar
find bar
delete /foo/bar
find wat
find foo
read   /foo/bar
create /foo/bar
read   /foo/bar
delete_r /foo
exit

I then have a function that, given a command line, splits it and stores the folder names in an array of strings. command holds the command string, and the fullPath string is produced by another function that uses that array of strings to compose a new one. Here is the command struct:

typedef struct _command {
    unsigned char command[10];
    unsigned char path[255][255];
    unsigned char* fullPath;
    int pathLevels;
} command;

This is the node structure that implements the tree:

typedef struct _node {
    int isRoot;
    int isDir;
    char* message;
    int childNumber;
    struct _node* childNodes[1024];
    unsigned char fullPath[MAX_LEN_PATH];
    unsigned char resName[255];
} node;

And the function that parses the string:

command* createCommandMul(unsigned char* str) {
    unsigned char* c = str;
    command* commandPointer = (command*) malloc(sizeof(command));
    //commandPointer->path[0][0] = '/';
    //commandPointer->path[0][1] = '\0';
    int commandIndex = 0;
    int pathLevel = 0;
    int pathIndex = 0;
    /* Parte Comando */
    while(*c != ' ' && commandIndex < 10) {
        commandPointer->command[commandIndex] = *c++;
        commandIndex++;
    }
    while(commandIndex<10) {
        commandPointer->command[commandIndex] = '\0';
        commandIndex++;
    }
    while(*c == ' ' || *c == '/') c++; 
    /* Parte Path*/
    while(*c != '\0') {
        if (*c == '/') {
            commandPointer->path[pathLevel][pathIndex] = '\0';
            pathLevel++;
            pathIndex = 0;
            c++;
        } else {
            commandPointer->path[pathLevel][pathIndex] = *c++;
            pathIndex++;
        }
    }
    commandPointer->path[pathLevel][pathIndex] = '\0';
    commandPointer->pathLevels = pathLevel;
    return commandPointer;
}

I have a createDir function that checks whether the node* passed to it is either a directory or the root (think of this as a tree); if it is, it creates the node.

int createDir(node* fatherOfChildToCreate, unsigned char* fullPath, command* currentCommand) {
    if ((fatherOfChildToCreate->isRoot == 1 || fatherOfChildToCreate->isDir == 1) && fatherOfChildToCreate->childNumber < 1024) {
        node* dirToCreate = (node*) malloc(sizeof(node));
        command* comando = (command*) currentCommand;
        dirToCreate->isDir = 1;
        dirToCreate->isRoot = 0;
        dirToCreate->message = NULL;
        dirToCreate->childNumber = 0;
        strcmp(dirToCreate->fullPath, fullPath);
        for (int i = 0; i < 1024; i++) dirToCreate->childNodes[i] = NULL;
        int index = (int) hashCalc(comando->path[comando->pathLevels]);
        printf("Hash di %s = %d", comando->path[comando->pathLevels], index);
        fatherOfChildToCreate->childNodes[index] = dirToCreate;
        fatherOfChildToCreate->childNumber += 1;
        return 1;
    } else return 0;
}

Note that createDir is meant to create a direct subdirectory of node* fatherOfChildToCreate, so the first command of the text file creates /foo with this function, because its only parent directory is the root, which is created in main(). The second command searches for the /foo directory using the function below, and since that is the parent directory of /foo/bar, the resulting pointer is passed to createDir, which creates a child node in the /foo directory.

node* linearSearchUpper(node* rootNode, unsigned char* upperPath, command* currentCommand) {
    command* comandoSearch = (command*) currentCommand;
    node* curr = (node*) rootNode;
    int counter = comandoSearch->pathLevels;
    int index;
    unsigned char* upperName = comandoSearch->path[comandoSearch->pathLevels - 1];
    for (int i = 0; i < counter; i++) {
        index = (int) hashCalc(comandoSearch->path[i]);
        printf("Hash di %s = %d", comandoSearch->path[i], index);
        if (curr->childNodes[index] == NULL) return NULL;
        else curr = curr->childNodes[index];
    }
    if (strcmp(upperPath, curr->fullPath) == 1) return curr;
}

Throughout, I use this hash function both to search for the parent directory and to insert a new element into the node->childNodes[] array:

unsigned long hashCalc(unsigned char* str) {
    unsigned long hash = 5381;
    int c;
    while (c = *str++)
        hash = ((hash << 5) + hash) + c; /* hash * 33 + c */
    return hash % 1024;
}

Now I'll paste main(), which is the last function to review.

int main() {
    node* rootNode = (node*) createRoot();
    command* comando = (command*) malloc(sizeof(command));
    unsigned char* upPath = NULL;
    unsigned char* allPath = NULL;
    unsigned char* line = NULL;
    FILE* fp;
    size_t len = 0;
    ssize_t read;

    fp = fopen("/Users/mattiarighetti/Downloads/semplice.txt", "r");
    if (fp == NULL)
        exit(EXIT_FAILURE);
    while ((read = getline(&line, &len, fp)) != -1) {
        if (*line == 'f') {
            //comandoFind = createCommandFind(line);
        } if (*line == 'w') {
            //comandoWrite = createCommandWrite(line);
        } if (*line == 'c') {
            comando = createCommandMul(line);
            upPath = upperPath(comando);
            allPath = fullPath(comando);
            if (comando->pathLevels == 0) {
                if (createDir(rootNode, allPath, comando) == 1) printf("ok\n\n");
                else printf("no\n\n");
            } else {
                node* upperNode = (node*) linearSearchUpper(rootNode, upPath, comando);
                if (upperNode == NULL) {
                    printf("no\n\n");
                }
                else {
                    if (createDir(upperNode, allPath, comando) == 1) printf("ok\n\n");
                    else printf("no\n\n");
                }
            }
        }
    }
    fclose(fp);
    if (line)
        free(line);
    return 0;
}

So, this reads the file line by line, creating and filling the command struct; it then builds upPath, which is the parent path (to be looked up), and the full path. The problem is that the program uses createDir for the first line of the text file, which is fine, but when it reads foo from comando->path[i], for some strange reason the hash function gives me 179, which is not correct. Then, on the second line, it uses linearSearchUpper() to search for the parent folder /foo, so it passes comando->path[i], which is again foo, but this time hashCalc gives me 905, which should be the correct answer. In the end, linearSearchUpper can't find the /foo folder, since nothing is stored at index 905. This happens every time I use a create command or create_dir on folders that are children of the root, so directories like /foo, /dir and /bar get a strange hash index.

Do you have any idea why this could happen?

I haven't tried to understand your whole program, but the strings for which you get different hashes really are different: one of them retains the newline character at the end, probably from fgets (the getline call in your main() keeps the newline in the same way).

The numeric value of the newline character in ASCII is 10, so:

hash("foo") == 905;
hash("foo\n") == (33 * hash("foo") + '\n') % 1024
              == (33 * 905 + 10) % 1024
              == 179

The solution is either to remove trailing whitespace from the string you receive from fgets, or to use better tokenising that guarantees your tokens have no leading or trailing whitespace.
