
open() function in Linux with extended characters (128-255) returns -1 error

When I try to create a file on Linux using the open() function, I get a return value of -1 for a filename that contains an extended character (e.g. Björk.txt). Here the filename contains the special character ö (ASCII 148).

I am using the below code:

char *szUnixPath;  /* points to "/home/user188/Output/Björk.txt" */

open(szUnixPath, locStyle, S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH);

I always get -1, and no file is created.

When the OS encounters ASCII 148, it reports an error.

The same call works perfectly if I use a tilde ~ (ASCII 126, for example Bj~rk.txt) or any other character with an ASCII value below 128.

Can somebody explain why I get -1 only for filenames that contain special characters in the range 128-255?

I recommend checking for yourself which bytes this name actually contains.

Create the file in a directory, then run the following simple C program from that directory:

#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    /* Open the current directory */
    DIR *currdir = opendir(".");
    if (currdir == NULL)
    {
        perror("opendir");
        return EXIT_FAILURE;
    }

    /* Iterate over the directory entries */
    struct dirent *directory_entry = NULL;
    while (NULL != (directory_entry = readdir(currdir)))
    {
        const char *entry_name = directory_entry->d_name;
        size_t len = strlen(entry_name);

        printf("Directory entry: %s\n", entry_name);
        printf("Name bytes (len: %zu):\n", len);
        for (size_t i = 0; i < len; ++i)
        {
            /* Where char is signed, bytes above 127 print as negative values */
            printf("\tname[%zu] = %d\n", i, entry_name[i]);
        }
    }

    closedir(currdir);
    return 0;
}

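The output shows that the name 'Björk' is 6 bytes long, not 5: the single character ö is stored as two bytes. They are printed as -61 and -74 because char is signed here; as unsigned values they are 195 and 182 (0xC3 0xB6), which is the UTF-8 encoding of ö: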

Directory entry: Björk
Name bytes (len: 6):
    name[0] = 66
    name[1] = 106
    name[2] = -61
    name[3] = -74
    name[4] = 114
    name[5] = 107
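
The negative values are easier to recognize when printed as unsigned hex. Here is a minimal sketch of that idea; format_hex_bytes is a hypothetical helper name that is not part of the program above, and the buffer handling is just one way to do it:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the original answer): writes the bytes of
 * `name` into `out` as space-separated two-digit hex values.
 * Returns the number of characters written, or -1 if `out` is too small. */
int format_hex_bytes(const char *name, char *out, size_t out_size)
{
    size_t len = strlen(name);
    size_t pos = 0;

    if (out_size == 0)
    {
        return -1;
    }
    out[0] = '\0';

    for (size_t i = 0; i < len; ++i)
    {
        const char *sep = (i == 0) ? "" : " ";

        /* Cast to unsigned char so bytes above 127 are not sign-extended */
        int written = snprintf(out + pos, out_size - pos, "%s%02x", sep,
                               (unsigned int)(unsigned char)name[i]);
        if (written < 0 || (size_t)written >= out_size - pos)
        {
            return -1;
        }
        pos += (size_t)written;
    }

    return (int)pos;
}
```

Calling it on directory_entry->d_name for the entry above should give 42 6a c3 b6 72 6b, where c3 b6 are the same two bytes that showed up as -61 and -74.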

Filenames in Linux are generally specified in UTF-8, not CP437. The open is failing because the filename you're passing doesn't match the one in the OS.
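
To see why ö, a single byte (148) in CP437, turns into two bytes in UTF-8, it can help to encode the code point by hand. ö is Unicode code point U+00F6. The following is only an illustrative sketch; utf8_encode is a hypothetical helper, not a standard library function:

```c
#include <stddef.h>

/* Hypothetical helper: encodes Unicode code point `cp` as UTF-8 into `out`
 * (room for 4 bytes). Returns the number of bytes written, or 0 if `cp`
 * is a surrogate or lies beyond U+10FFFF. */
size_t utf8_encode(unsigned long cp, unsigned char out[4])
{
    if (cp < 0x80)
    {
        /* Plain ASCII: one byte, unchanged (this is why '~' works) */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800)
    {
        /* Two bytes: 110xxxxx 10xxxxxx (ö, U+00F6, falls in this range) */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000)
    {
        if (cp >= 0xD800 && cp <= 0xDFFF)
        {
            return 0; /* surrogates are not valid code points to encode */
        }
        /* Three bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF)
    {
        /* Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For U+00F6 this yields the two bytes 0xC3 0xB6, while every character below 128 stays a single byte. That matches the observation that only names with characters in the 128-255 range behave differently.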

Try opening this file instead: /home/user188/Output/Bj\xc3\xb6rk.txt . Here the special character ö is written as its two-byte UTF-8 encoding, 0xC3 0xB6.
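
As a rough sketch of that suggestion, the snippet below builds the UTF-8 name inside a directory passed in by the caller and reports why open() failed instead of only returning -1. create_bjork_file is a hypothetical helper, and the flags O_WRONLY | O_CREAT | O_TRUNC are an assumption, since the question does not show what locStyle contains:

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: creates "Björk.txt" (name encoded as UTF-8) in `dir`.
 * Returns the open file descriptor, or -1 after printing the reason. */
int create_bjork_file(const char *dir)
{
    char path[4096];

    /* \xc3\xb6 is the UTF-8 encoding of ö */
    int n = snprintf(path, sizeof path, "%s/Bj\xc3\xb6rk.txt", dir);
    if (n < 0 || (size_t)n >= sizeof path)
    {
        fprintf(stderr, "path too long\n");
        return -1;
    }

    /* Assumed flags; the permission bits are the ones from the question */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC,
                  S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH);
    if (fd == -1)
    {
        /* Save errno first: fprintf itself may change it */
        int saved_errno = errno;
        fprintf(stderr, "open(%s) failed: %s\n", path, strerror(saved_errno));
        errno = saved_errno;
    }

    return fd;
}
```

Printing strerror(errno) matters here because -1 alone only says that open() failed; errno tells you whether the cause was the name, the directory, or the permissions.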


 