

Decode Alfresco file name or replace Unicode [_x0020_] characters in String/fileName

I am using the Alfresco download and upload services from Java.

When I upload a file to the Alfresco server, it gives me the following path:

/app:Home/cm:Company_x0020_Home/cm:Abc/cm:TestFile/cm:V4/cm:BC1X_x0020_0400_x0020_0109-_x0028_1-2_x0029__v2.pdf

When I download the file using the same path, I take the file name from the end of the path, i.e.

    ABC1X_x0020_0400_x0020_0109-_x0028_1-2_x0029__v2.pdf
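Getting from that path to the bare (still encoded) file name only needs string operations. A minimal sketch, assuming each path segment has the form prefix:name (for example cm:) and that a literal ':' inside a name would itself be escaped; AlfrescoPathUtil and lastSegment are hypothetical names, not part of the Alfresco API:

```java
public final class AlfrescoPathUtil {
    private AlfrescoPathUtil() {}

    /**
     * Returns the last segment of a prefixed path with its namespace
     * prefix (e.g. "cm:") removed. The result is still _xHHHH_-encoded.
     * Assumption: a raw ':' only appears as the prefix separator.
     */
    public static String lastSegment(String path) {
        // Everything after the final '/'; lastIndexOf returns -1 if there is none,
        // so a path without '/' is kept whole
        String segment = path.substring(path.lastIndexOf('/') + 1);
        // Drop a namespace prefix such as "cm:" if present
        int colon = segment.indexOf(':');
        return colon >= 0 ? segment.substring(colon + 1) : segment;
    }
}
```

For example, lastSegment("/app:Home/cm:Company_x0020_Home") returns Company_x0020_Home, which still has to be decoded.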

How can I remove or decode the Unicode escape sequences in fileName? I tried:

    String decoded = URLDecoder.decode(queryString, "UTF-8");

The above does not work.
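Why URLDecoder cannot help: it only understands form/percent-encoding (%HH byte escapes, plus '+' as a space), while these names use _xHHHH_ escapes, so it finds nothing to decode. A small demonstration; UrlDecoderDemo and urlDecode are illustrative names:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public final class UrlDecoderDemo {
    private UrlDecoderDemo() {}

    // Decodes application/x-www-form-urlencoded text: %HH escapes and '+' as space
    public static String urlDecode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is mandatory on every Java platform
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Percent-escapes are decoded: prints "ABC1X 0400 (1-2).pdf"
        System.out.println(urlDecode("ABC1X%200400%20%281-2%29.pdf"));
        // _xHHHH_ escapes are left untouched: prints "ABC1X_x0020_0400.pdf"
        System.out.println(urlDecode("ABC1X_x0020_0400.pdf"));
    }
}
```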

These are some of the Unicode characters that appeared in my file name: https://en.wikipedia.org/wiki/List_of_Unicode_characters

Please do not mark this question as a duplicate: I have searched the links below, but none of them gave a solution. These are the questions I found about replacing Unicode characters in a String with Java:

Java removing unicode characters

Remove non-ASCII characters from String in Java

How can I replace a unicode character in java string

Java Replace Unicode Characters in a String

The solution given by Jeff Potts (ISO9075.decode, below) works perfectly. However, I needed the file name in a different project that does not use the org.alfresco jars, and I did not want to pull in all of those dependencies just to decode a file name. So I used plain Java with a regex to parse and decode the file name, which gives exactly the same result as:

    ISO9075.decode(test);

Here is the code:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public String decode_FileName(String fileName) {
        System.out.println("fileName : " + fileName);
        String decodedfileName = fileName;
        // regex that matches escape sequences such as _x0020_ and captures the hex digits
        Matcher m = Pattern.compile("_x(.*?)_").matcher(decodedfileName);
        List<String> unicodeChars = new ArrayList<>();
        while (m.find()) {
            unicodeChars.add(m.group(1));
        }
        for (String temp : unicodeChars) {
            if (isInteger(temp)) {
                // convert the hex code (e.g. "0020") to its character (e.g. " ")
                String replace_char = String.valueOf((char) Integer.parseInt(temp, 16));
                decodedfileName = decodedfileName.replace("_x" + temp + "_", replace_char);
            }
        }
        System.out.println("Decoded FileName :" + decodedfileName);
        return decodedfileName;
    }
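A variant sketch of the same regex idea, with two changes: the pattern only accepts exactly four hex digits, so unrelated _x..._ text is never touched, and Matcher.appendReplacement rebuilds the string in one pass instead of collecting matches and calling String.replace. It assumes only four-digit escapes (_xHHHH_, i.e. characters in the Basic Multilingual Plane) occur; Iso9075Decoder is a hypothetical class name, and this is not Alfresco's ISO9075 implementation:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class Iso9075Decoder {
    // Assumption: escapes are exactly "_x" + four hex digits + "_"
    private static final Pattern ESCAPE = Pattern.compile("_x([0-9A-Fa-f]{4})_");

    private Iso9075Decoder() {}

    public static String decode(String encoded) {
        Matcher m = ESCAPE.matcher(encoded);
        // StringBuffer keeps this compatible with Java 8's appendReplacement
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Parse the captured hex digits and cast the code unit to a char
            char decoded = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement stops a decoded '$' or '\' being read as regex syntax
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(decoded)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // prints "ABC1X 0400 0109-(1-2)_v2.pdf"
        System.out.println(decode("ABC1X_x0020_0400_x0020_0109-_x0028_1-2_x0029__v2.pdf"));
    }
}
```

Because the regex already guarantees valid hex, no separate numeric check is needed before parsing.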

And use this small Java utility to check that the captured text is a valid hexadecimal number before converting it:

    public static boolean isInteger(String s) {
        try {
            // base 16, so escapes containing A-F such as _x00E9_ are accepted too
            Integer.parseInt(s, 16);
            return true;
        } catch (NumberFormatException e) {
            // also thrown for null or empty input
            return false;
        }
    }

The core of the conversion is simple. For example, 0028 is the code point of the left parenthesis, U+0028 (see https://en.wikipedia.org/wiki/List_of_Unicode_characters):

    String replace_char = String.valueOf((char) Integer.parseInt("0028", 16));
    System.out.println(replace_char);

This prints ( , which is a left parenthesis.
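The conversion also works in reverse, which is handy for checking what a given character looks like once escaped. A short sketch; HexCharDemo, fromHex and toEscape are illustrative names, and toEscape assumes a zero-padded four-digit form with upper-case hex letters (the escapes in the question contain only digits, so the letter case is a guess):

```java
import java.util.Locale;

public final class HexCharDemo {
    private HexCharDemo() {}

    // "0028" -> "(" : parse as base 16, then cast the code unit to char
    public static String fromHex(String hex) {
        return String.valueOf((char) Integer.parseInt(hex, 16));
    }

    // '(' -> "_x0028_" : format the char's numeric value as 4 upper-case hex digits
    public static String toEscape(char c) {
        return String.format(Locale.ROOT, "_x%04X_", (int) c);
    }
}
```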

This is the logic I used in my Java program, and it gives the same results as ISO9075.decode(test).

Output:

    fileName : ABC1X_x0020_0400_x0020_0109-_x0028_1-2_x0029__v2.pdf
    Decoded FileName :ABC1X 0400 0109-(1-2)_v2.pdf

In the org.alfresco.util package you will find a class called ISO9075. You can use it to encode and decode strings according to that spec (ISO 9075). For example:

    String test = "ABC1X_x0020_0400_x0020_0109-_x0028_1-2_x0029__v2.pdf";
    String out = ISO9075.decode(test);
    System.out.println(out);

Prints:

    ABC1X 0400 0109-(1-2)_v2.pdf

If you want to see what it does behind the scenes, look at the source.
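ISO9075 can also encode. For the encoding direction without the Alfresco jars, here is a deliberately simplified sketch; it is NOT Alfresco's ISO9075.encode. Assumptions: only ASCII letters, digits, '-', '.' and '_' count as valid name characters (real XML names allow many more, so non-ASCII letters are over-escaped here); a name may not start with a digit, '-' or '.'; and an '_' followed by 'x' is escaped as _x005F_ so it cannot be misread as the start of an escape. SimpleIso9075Encoder is a hypothetical name:

```java
import java.util.Locale;

public final class SimpleIso9075Encoder {
    private SimpleIso9075Encoder() {}

    // Simplified, ASCII-only stand-in for XML name-start characters (assumption)
    private static boolean isNameStart(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_';
    }

    // Simplified, ASCII-only stand-in for XML name characters (assumption)
    private static boolean isNameChar(char c) {
        return isNameStart(c) || (c >= '0' && c <= '9') || c == '-' || c == '.';
    }

    public static String encode(String name) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            boolean valid = (i == 0) ? isNameStart(c) : isNameChar(c);
            // A literal "_x" would look like an escape, so its '_' gets escaped too
            boolean looksLikeEscape = c == '_' && i + 1 < name.length() && name.charAt(i + 1) == 'x';
            if (valid && !looksLikeEscape) {
                out.append(c);
            } else {
                out.append(String.format(Locale.ROOT, "_x%04X_", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints "ABC1X_x0020_0400_x0020_0109-_x0028_1-2_x0029__v2.pdf"
        System.out.println(encode("ABC1X 0400 0109-(1-2)_v2.pdf"));
    }
}
```

With these rules the decoded name from the question encodes back to the same string the server returned, while a leading digit such as "2024" becomes _x0032_024.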

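Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish this page, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.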
