File.toURI does not encode plus sign

I just want to check my own sanity with this question. I have a filename which has a + (plus) character in it, which is perfectly valid on some operating systems and filesystems (e.g. macOS and HFS+).

However, I am seeing an issue where I think that java.io.File#toURI() is not operating correctly.

For example:

new File("hello+world.txt").toURI().toString()

On my Mac this returns:

file:/Users/aretter/code/rocksdb/hello+world.txt

However, IMHO that is not correct, because the + (plus) character from the filename has not been encoded in the URI. The URI does not represent the original filename at all: a + in a URI has a very different meaning from a + character in a filename.

So if we decode the URI, the plus will be replaced with a space character, and we have lost information. For example:

URLDecoder.decode(new File("hello+world.txt").toURI().toURL().toString())

Which results in:

file:/Users/aretter/code/rocksdb/hello world.txt

What I would have expected instead is something like:

new File("hello+world.txt").toURI().toString()

resulting in:

file:/Users/aretter/code/rocksdb/hello%2Bworld.txt

so that when it is later used and decoded, the plus sign is preserved.

I am struggling to believe that such an obvious bug could be present in Java SE. Can someone point out where I am mistaken?

Also, if there is a workaround, I would like to hear about it. Keep in mind that I am not actually providing static strings as filenames to File, but rather reading a directory of files from disk, some of which may contain a + (plus) character.

Let me try to clarify:

  • The '+' plus character is used to encode a ' ' space in the context of HTML forms (aka the application/x-www-form-urlencoded MIME format).
  • '%20' is the escape sequence used to encode a ' ' space in the context of the URL/URI format.

The '+' plus character is treated as a normal character in the context of a URL, and it is not encoded in any form (unlike a space, which becomes %20).

So when you call new File("hello+world.txt").toURI().toString(), it does not perform any encoding of the '+' character, simply because none is required.
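The behaviour described above can be checked with a small sketch. The class name PlusInFileUri and the helper toUriString are made up for illustration, and the absolute prefix of the printed URIs depends on the working directory, so only the last segment is of interest here.

```java
import java.io.File;

public class PlusInFileUri {
    // Hypothetical helper: returns the URI string File.toURI() produces
    // for a file name resolved against the current working directory.
    static String toUriString(String fileName) {
        return new File(fileName).toURI().toString();
    }

    public static void main(String[] args) {
        // '+' is a legal character in a URI path, so it is left as-is.
        System.out.println(toUriString("hello+world.txt"));
        // A space is not legal in a URI, so it is percent-encoded as %20.
        System.out.println(toUriString("hello world.txt"));
    }
}
```

Under these assumptions the first URI should end in hello+world.txt and the second in hello%20world.txt.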

Now to URLDecoder: this class is a utility class for HTML form decoding. It treats the '+' plus sign as an encoded character and hence decodes it to a ' ' space character. In your example, this class treats the URI's string value as a normal HTML form field's value (not as a URI value). This class should never be used to decode a full URI/URL value, as it is not designed for that purpose.

From the Javadoc of URLDecoder#decode(String):

Decodes a x-www-form-urlencoded string. The platform's default encoding is used to determine what characters are represented by any consecutive sequences of the form "%xy".
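To make the difference concrete, here is a minimal sketch that decodes the same string both ways. The class name DecodeComparison, the two helper methods, and the /tmp path are assumptions for illustration only; URI#getPath() is used here as the URI-aware way to get a decoded path.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLDecoder;

public class DecodeComparison {
    // Form decoding: '+' becomes a space and %xy sequences are decoded.
    static String formDecode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // URI decoding of the path component: only %xy sequences are decoded.
    static String uriPath(String uri) {
        return URI.create(uri).getPath();
    }

    public static void main(String[] args) {
        String uri = "file:/tmp/hello+world%20draft.txt"; // hypothetical path
        System.out.println(formDecode(uri)); // file:/tmp/hello world draft.txt
        System.out.println(uriPath(uri));    // /tmp/hello+world draft.txt
    }
}
```

Only the form decoder turns the plus sign into a space; the URI-aware accessor keeps it.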

Hope it helps.

Update #1, based on comments:

As per section 2.2 of RFC 3986, if data for a URI component conflicts with a reserved character, then the conflicting data must be percent-encoded before the URI is formed.

It is also important that different parts of a URI have different sets of reserved characters, depending on their context. For example, the / sign is reserved only in the path part of a URI, and the + sign is reserved in the query string part. So there is no need to escape / in the query part, and similarly there is no need to escape + in the path part.

In your example, the URI producer File.toURI does not encode the + sign in the path part of the URI (since + is not considered a reserved character in the path part), and so you see the + sign in the URI's string representation.
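As a rough illustration of component-specific encoding, the sketch below builds a path with java.net.URI's multi-argument constructor and encodes a query value with URLEncoder. The class name PathVsQuery, both helper names, and the sample inputs are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;

public class PathVsQuery {
    // Builds a file: URI from an absolute path. The multi-argument URI
    // constructor quotes characters that are illegal in a path.
    static String filePathUri(String absolutePath) {
        try {
            return new URI("file", null, absolutePath, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Form-encodes a single query value, where a literal '+' must be escaped.
    static String queryValue(String raw) {
        try {
            return URLEncoder.encode(raw, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        System.out.println(filePathUri("/tmp/a b+c.txt")); // file:/tmp/a%20b+c.txt
        System.out.println(queryValue("a+b c"));           // a%2Bb+c
    }
}
```

In the path, only the space is escaped. In the form-encoded query value, the literal plus becomes %2B and the space becomes +.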

You may refer to the URI recommendation for more details.

Related answers:

  1. https://stackoverflow.com/a/1006074/1700467
  2. https://stackoverflow.com/a/2678602/1700467
  3. https://stackoverflow.com/a/4571518/1700467

I'm assuming you want to encode the + sign in your filename as %2B, so that you get it back as a + sign when you decode it.

If that is the case, then you need to use URLEncoder.encode:

System.out.println(URLEncoder.encode(new File("hello+world.txt").toURI().toString()));

It will encode all special characters, including the + sign. The output would be:

file%3A%2Fhome%2FT8hvs7%2Fhello%2Bworld.txt

Now, to decode, use URLDecoder.decode:

System.out.println(URLDecoder.decode("file%3A%2Fhome%2FwQCXni%2Fhello%2Bworld.txt"));

It will display:

file:/home/wQCXni/hello+world.txt
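The same round trip can be written with an explicit charset instead of the deprecated single-argument overloads. This is only a sketch: the class name FormRoundTrip, the helpers, and the /home/user path are made up for illustration. Note that the encoded string is a form-encoded value, not a usable URI.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class FormRoundTrip {
    // Form-encodes the whole string, including ':', '/' and '+'.
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // Reverses encode(): %xy sequences are decoded, '+' would become a space.
    static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String uri = "file:/home/user/hello+world.txt"; // hypothetical path
        String encoded = encode(uri);
        System.out.println(encoded);         // file%3A%2Fhome%2Fuser%2Fhello%2Bworld.txt
        System.out.println(decode(encoded)); // file:/home/user/hello+world.txt
    }
}
```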

Obviously this is not a bug; the documentation clearly says:

The plus sign "+" is converted into a space character " " .

You can do something like this: https://ideone.com/JHDkM4

import java.util.*;
import java.lang.*;
import java.io.*;
import static java.lang.System.out;


class Ideone
{
    public static void main (String[] args) throws java.lang.Exception
    {
        out.println(new File("hello+world.txt").toURI().toString());
        out.println(java.net.URLDecoder.decode(new File("hello+world.txt").toURI().toURL().toString()));
        out.println(new File("hello+world.txt").toURI().toString().replaceAll("\\+", "%2B"));
    }
}

If the URI represents a file, let the File class decode the URI.

Let's say we have a URI for a file, for example to get the file path of a jar file:

URI uri = MyClass.class.getProtectionDomain().getCodeSource().getLocation().toURI();

System.out.println(uri.toString());
=> BAD: will display the plus sign, but %20 for spaces

System.out.println(URLDecoder.decode(uri.toString(), StandardCharsets.UTF_8.toString()));
=> BAD: will display spaces instead of %20, but also instead of the plus sign

System.out.println(new File(uri).getAbsolutePath());
=> GOOD
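A small round-trip sketch of this approach, assuming a file name that contains both a plus sign and spaces. The class name FileUriRoundTrip and its helper methods are hypothetical; Paths.get(URI) is shown as an alternative that should behave the same way for file: URIs.

```java
import java.io.File;
import java.net.URI;
import java.nio.file.Paths;

public class FileUriRoundTrip {
    // File -> URI -> File, returning the restored file name.
    static String viaFile(String fileName) {
        URI uri = new File(fileName).toURI();
        return new File(uri).getName();
    }

    // File -> URI -> Path, returning the restored file name.
    static String viaPath(String fileName) {
        URI uri = new File(fileName).toURI();
        return Paths.get(uri).getFileName().toString();
    }

    public static void main(String[] args) {
        String name = "hello+world and spaces.txt";
        System.out.println(viaFile(name)); // hello+world and spaces.txt
        System.out.println(viaPath(name)); // hello+world and spaces.txt
    }
}
```

Neither route goes through URLDecoder, so the plus sign survives and the %20 escapes are turned back into spaces.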

Try escaping the plus sign with a backslash (\\), like this:

new File("hello\\+world.txt").toURI().toString()
