
How can I read UTF-8 encoded file names in Java on an EUC-KR encoding system?

I use an EUC-KR encoding system, and my program is written in Java. It reads file names, but it cannot read some of them (some file names are UTF-8 encoded).

So I tried this:

File dir = new File(dirPath);
File[] fileList = dir.listFiles(); // includes files whose names are UTF-8 encoded
String changedEncodingStr = new String(fileList[0].getName().getBytes("euc-kr"), "euc-kr"); // this is still an invalid string
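A note on the attempt above: encoding a String to EUC-KR bytes and decoding those bytes with the same charset can, at best, return the string unchanged; it cannot repair a name that is already wrong. Worse, characters EUC-KR cannot represent are replaced (String.getBytes(Charset) substitutes the charset's default replacement bytes, which for EUC-KR is "?"). The minimal sketch below assumes the runtime provides the EUC-KR charset (full JDKs normally do); the class and method names are illustrative.

```java
import java.nio.charset.Charset;

public class RoundTripDemo {
    // Illustrative helper: encode with a charset, then decode with the same charset.
    static String roundTrip(String s, Charset cs) {
        return new String(s.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        Charset eucKr = Charset.forName("EUC-KR");
        String hangul = "\uD55C\uAE00"; // Hangul "han-geul", representable in EUC-KR
        String thai = "\u0E01";         // Thai letter KO KAI, not in EUC-KR
        System.out.println(roundTrip(hangul, eucKr).equals(hangul)); // true
        System.out.println(roundTrip(thai, eucKr));                  // ?
    }
}
```

So the round trip is either a no-op or lossy; it never recovers information.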


I think this string is already broken during the call to dir.listFiles():

File dir = new File(dirPath);
File[] fileList = dir.listFiles(); // includes files whose names are UTF-8 encoded

fileList[0].getName();     // broken String
fileList[0].isFile();      // false
fileList[0].isDirectory(); // false
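One way to check that hypothesis is to look for U+FFFD, the replacement character a decoder substitutes for bytes it cannot decode. The diagnostic sketch below is an illustration, not a fix: looksBroken is a hypothetical helper, and sun.jnu.encoding is an undocumented, OpenJDK-specific property that, on such JDKs, names the charset used to decode file names. A name without U+FFFD can still be wrong if its bytes happened to decode into other characters, so treat the check as a hint rather than proof.

```java
import java.io.File;

public class ListNamesDemo {
    // Hypothetical helper: a replacement character in the name suggests the
    // raw file-name bytes could not be decoded before our code saw them.
    static boolean looksBroken(String name) {
        return name.indexOf('\uFFFD') >= 0;
    }

    public static void main(String[] args) {
        String dirPath = args.length > 0 ? args[0] : ".";
        // OpenJDK-specific, unsupported property; may be null on other JVMs.
        System.out.println("sun.jnu.encoding = " + System.getProperty("sun.jnu.encoding"));

        File[] files = new File(dirPath).listFiles();
        if (files == null) { // listFiles() returns null if dirPath is not a readable directory
            System.err.println("Not a readable directory: " + dirPath);
            return;
        }
        for (File f : files) {
            String name = f.getName();
            // A name that cannot be mapped back to its original bytes typically
            // reports false for both isFile() and isDirectory().
            System.out.printf("%s broken=%b isFile=%b isDirectory=%b%n",
                    name, looksBroken(name), f.isFile(), f.isDirectory());
        }
    }
}
```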

For a file whose name is UTF-8 encoded, I also expect isFile() to return true.

Thanks in advance for your reply. :-)

File.getName() returns a String. So you don't have to do anything more in Java.
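Put differently, a character encoding only matters at the point where bytes become a String. The minimal sketch below, with illustrative names and assuming the runtime provides the EUC-KR charset, decodes the same UTF-8 bytes with two charsets: only the matching charset yields the original text, and once decoded the String needs no further conversion.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeDemo {
    // Illustrative helper: the charset is consulted only here, at the bytes-to-String boundary.
    static String decode(byte[] raw, Charset cs) {
        return new String(raw, cs);
    }

    public static void main(String[] args) {
        String original = "\uD55C\uAE00"; // Hangul "han-geul"
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);

        String right = decode(utf8Bytes, StandardCharsets.UTF_8);
        String wrong = decode(utf8Bytes, Charset.forName("EUC-KR"));

        System.out.println(right.equals(original)); // true
        System.out.println(wrong.equals(original)); // false
    }
}
```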

A String in Java is a text-only datatype. It contains a counted sequence of UTF-16 code units, that is, a counted sequence of char. Every String method is defined in terms of this. You generally don't need to account for that, because most text processing doesn't depend on it. The few cases where it does matter are when you count characters, or split on indexes that weren't obtained from indexOf or a similar text function.
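To illustrate the UTF-16 point, here is a minimal sketch with illustrative names: a supplementary character such as U+1F600 occupies two char values (a surrogate pair), so length() and codePointCount() disagree, and an index that was not obtained from indexOf can split the pair.

```java
public class Utf16Demo {
    // Illustrative helper: true if cutting at index would separate a surrogate pair.
    static boolean splitsSurrogatePair(String s, int index) {
        return index > 0 && index < s.length()
                && Character.isHighSurrogate(s.charAt(index - 1))
                && Character.isLowSurrogate(s.charAt(index));
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b"; // 'a', U+1F600 as two chars, 'b'
        System.out.println(s.length());                      // 4 (UTF-16 code units)
        System.out.println(s.codePointCount(0, s.length())); // 3 (code points)

        String safe = s.substring(0, s.indexOf('b')); // index from indexOf: pair kept intact
        String unsafe = s.substring(0, 2);            // arbitrary index: lone high surrogate
        System.out.println(safe.codePointCount(0, safe.length()));    // 2
        System.out.println(Character.isHighSurrogate(unsafe.charAt(1))); // true
        System.out.println(splitsSurrogatePair(s, 2));                // true
    }
}
```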

Many languages are like this, such as the .NET languages, VB4/5/6/A/Script, JavaScript, and SQL's NVARCHAR or NCHAR. Others, such as Lua, C, C++, …, have "strings" that are really just byte strings, which might be text in any one of many encodings. Others, such as Python, have separate string types that distinguish between byte strings and character strings. Still others, such as R, have strings that carry an attribute indicating their character encoding.
