How to create filename with characters that are not part of UTF-8 on Windows?

[Edit/Disclaimer]: Comments pointed out that I have to clarify which encoding the user is using. I will update the question accordingly.

I have a customer from China who recently reported an issue with their filenames on Windows. The software works with most Chinese characters, but it seems they have found one file that fails.

Unfortunately, they are not able to send me the filename, as neither zipping the file nor transmitting it through other channels seems to preserve the filename.

What is the easiest way (e.g. through Python) to generate a filename on Windows that is valid in the NTFS file system encoding but not in UTF-8?

Unicode strings are encoded as a series of bytes. An encoding is the set of rules an operating system uses to turn those bytes back into the characters you see on screen.

Windows uses a variation of Unicode (filenames are stored as UTF-16 code units), and UTF-8 can encode every Unicode character. So if you say you have a character that UTF-8 cannot represent, that character is not in Unicode at all, which means there is simply no way to represent it.
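One narrow exception is worth knowing about: NTFS does not check that a name is well-formed UTF-16, so a name can contain an unpaired surrogate, which is a bare 16-bit code unit rather than a character, and strict UTF-8 refuses to encode it. The following is a minimal Python sketch of that case, not a definitive reproduction of the customer's file. The name `ODD_NAME` and both helper functions are made up for illustration, and the file creation step assumes Python 3.6 or later on Windows, where `str` paths are handed to the wide-character API unchanged.

```python
import os
import sys
import tempfile


def is_strict_utf8_encodable(name: str) -> bool:
    """Return True if `name` can be encoded as strict UTF-8."""
    try:
        name.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False


# Hypothetical test name: "\udc80" is a lone low surrogate, i.e. a 16-bit
# code unit without its partner, not a valid character on its own.
ODD_NAME = "odd_\udc80.txt"


def create_test_file(directory: str, name: str = ODD_NAME) -> str:
    """Create an empty file called `name` in `directory` and return its path."""
    path = os.path.join(directory, name)
    with open(path, "w", encoding="utf-8"):
        pass
    return path


if __name__ == "__main__":
    print(is_strict_utf8_encodable("文件.txt"))  # ordinary Chinese name
    print(is_strict_utf8_encodable(ODD_NAME))    # name with a lone surrogate
    if sys.platform == "win32":
        # Assumption: only attempted on Windows; other platforms and file
        # systems may reject or transform such a name.
        with tempfile.TemporaryDirectory() as scratch:
            created = create_test_file(scratch)
            print(ascii(os.path.basename(created)))
```

The first check prints `True` and the second prints `False`. If the software fails on the file created this way, it is probably encoding names as strict UTF-8 somewhere. Whether the customer's file is actually of this kind is only a guess.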

Imagine if Unicode only contained the digits 0-9, and you asked someone how to encode the letter A. There's no answer to this, because only 0-9 are defined.

You could make up a new Unicode code point for your character, but then operating systems won't know how to display it unless you also make your own font files.

That's an option, but I doubt it's what you want to do. Could your customer rename the file before sending it to you?
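If renaming is not practical, another route is to have the customer report the name as plain ASCII, which survives email and chat. The sketch below assumes the customer can run Python in the folder that holds the file. `describe_name` and `describe_directory` are hypothetical helpers, not part of any library.

```python
import os


def describe_name(name: str) -> str:
    """Return an ASCII-only description of a file name."""
    # ascii() escapes every non-ASCII character as \xNN, \uNNNN or \UNNNNNNNN.
    code_points = " ".join(f"U+{ord(ch):04X}" for ch in name)
    return f"{ascii(name)}  {code_points}"


def describe_directory(directory: str = ".") -> list:
    """Describe every entry in `directory` (defaults to the current folder)."""
    return [describe_name(name) for name in os.listdir(directory)]


if __name__ == "__main__":
    for line in describe_directory("."):
        print(line)
```

The escaped form can be pasted back into a Python string literal to recreate the same name on a test machine.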
