Convert set of ASCII characters back to string

I currently have a situation where I am converting a string to ASCII character codes:

        String str = "are";  // or anything else

        StringBuilder sb = new StringBuilder();
        for (char c : str.toCharArray())
            sb.append((int)c);

        BigInteger mInt = new BigInteger(sb.toString());
        System.out.println(mInt);

where the output (in this case) is 97114101. I am struggling to find a way to reverse this, i.e. to convert the string of ASCII codes back to a string, e.g. "are".

You cannot do it with plain decimal numbers, because the number of digits in each code's representation varies. Because of this, you wouldn't be able to distinguish the sequence 112 5 from 11 25 or 1 125, since all three produce 1125.
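A minimal sketch of the collision described above, assuming Java 9+ for `List.of`. The class `AmbiguityDemo` and its `concat` helper are hypothetical names; `concat` simply mirrors the question's `sb.append((int)c)` loop:

```java
import java.util.List;

class AmbiguityDemo {
    // Concatenate decimal character codes with no separator, as in the question
    static String concat(List<Integer> codes) {
        StringBuilder sb = new StringBuilder();
        for (int code : codes) {
            sb.append(code);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Three different code sequences collapse to the same digit string
        System.out.println(concat(List.of(112, 5)));  // 1125
        System.out.println(concat(List.of(11, 25)));  // 1125
        System.out.println(concat(List.of(1, 125)));  // 1125
    }
}
```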

You could, however, force each character to occupy exactly three digits. You would then be able to restore the characters by repeatedly dividing the number by 1000 and taking the remainder. On the encoding side, each code is padded with leading zeros:

for (char c : str.toCharArray()) {
    // zero-pad each code to exactly three digits, e.g. 'a' (97) -> "097"
    String numStr = String.valueOf((int)c);
    while (numStr.length() < 3) numStr = "0"+numStr;
    sb.append(numStr);
}
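A minimal sketch of the full three-digit round trip, assuming every character code fits in three digits (plain ASCII without NUL, for example). `ThreeDigitCodec`, `encode` and `decode` are hypothetical names, not part of the original answer:

```java
import java.math.BigInteger;
import java.util.Locale;

class ThreeDigitCodec {
    private static final BigInteger THOUSAND = BigInteger.valueOf(1000);

    // Encode: zero-pad every character code to exactly three digits
    // (assumes all codes are in 1..999, e.g. ASCII without NUL)
    static BigInteger encode(String str) {
        StringBuilder sb = new StringBuilder();
        for (char c : str.toCharArray()) {
            sb.append(String.format(Locale.ROOT, "%03d", (int) c));
        }
        return sb.length() == 0 ? BigInteger.ZERO : new BigInteger(sb.toString());
    }

    // Decode: each remainder mod 1000 is one character code, produced from the
    // last character to the first, so the builder is reversed at the end
    static String decode(BigInteger n) {
        StringBuilder sb = new StringBuilder();
        while (n.signum() > 0) {
            BigInteger[] qr = n.divideAndRemainder(THOUSAND);
            sb.append((char) qr[1].intValue());
            n = qr[0];
        }
        return sb.reverse().toString();
    }

    public static void main(String[] args) {
        BigInteger encoded = encode("are");
        System.out.println(encoded);          // 97114101
        System.out.println(decode(encoded));  // are
    }
}
```

Losing the leading zero of the first code (097 becomes 97 inside the BigInteger) does no harm, because decoding works from the least significant end.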

If you use only the ASCII section of the Unicode code points, this is somewhat wasteful, because most of the values you need have only two digits. If you switch to hex, every ASCII code point (0-127, i.e. 0x00-0x7F) fits in exactly two digits:

for (char c : str.toCharArray()) {
    // two hex digits per character, e.g. 'a' (97) -> "61", '\n' (10) -> "0a"
    String numStr = Integer.toString(c, 16);
    if (numStr.length() == 1) numStr = "0"+numStr;
    sb.append(numStr);
}
BigInteger mInt = new BigInteger(sb.toString(), 16);

Now you can use division by 256 instead of 1000.
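A corresponding sketch for the hexadecimal variant, assuming all character codes are in the range 1-255. `HexCodec` and its methods are hypothetical names:

```java
import java.math.BigInteger;

class HexCodec {
    private static final BigInteger BASE = BigInteger.valueOf(256);

    // Encode: exactly two hex digits per character (assumes codes 1..255)
    static BigInteger encode(String str) {
        StringBuilder sb = new StringBuilder();
        for (char c : str.toCharArray()) {
            String hex = Integer.toString(c, 16);
            if (hex.length() == 1) hex = "0" + hex;
            sb.append(hex);
        }
        return sb.length() == 0 ? BigInteger.ZERO : new BigInteger(sb.toString(), 16);
    }

    // Decode: each remainder mod 256 is one character code, last character first
    static String decode(BigInteger n) {
        StringBuilder sb = new StringBuilder();
        while (n.signum() > 0) {
            BigInteger[] qr = n.divideAndRemainder(BASE);
            sb.append((char) qr[1].intValue());
            n = qr[0];
        }
        return sb.reverse().toString();
    }

    public static void main(String[] args) {
        BigInteger encoded = encode("are");
        System.out.println(encoded);          // 6386277 (0x617265)
        System.out.println(decode(encoded));  // are
    }
}
```

Because every pair of hex digits is exactly one byte, for pure ASCII input `new BigInteger(1, str.getBytes(StandardCharsets.US_ASCII))` produces the same number; the explicit loop just mirrors the answer's code.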

The simple answer is that you can't, because you have lost data: you have no way of knowing how many digits each character had.

You need some sort of separator between the numbers.
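A minimal sketch of the separator idea. The comma is an arbitrary choice (any non-digit character works), and `SeparatedCodes` is a hypothetical name. Note that the result is a String rather than a BigInteger, since a separator cannot be stored among a number's digits:

```java
import java.util.StringJoiner;

class SeparatedCodes {
    // Join the decimal character codes with a separator that is never a digit
    static String encode(String str) {
        StringJoiner joiner = new StringJoiner(",");
        for (char c : str.toCharArray()) {
            joiner.add(String.valueOf((int) c));
        }
        return joiner.toString();
    }

    // Split on the separator and turn each code back into a character
    static String decode(String codes) {
        if (codes.isEmpty()) {
            return "";
        }
        StringBuilder sb = new StringBuilder();
        for (String part : codes.split(",")) {
            sb.append((char) Integer.parseInt(part));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String encoded = encode("are");
        System.out.println(encoded);          // 97,114,101
        System.out.println(decode(encoded));  // are
    }
}
```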

The answer is a big no: you cannot get it back with your existing approach.

Instead, you could use an integer array (if that is possible in your situation). You may get the best solution if you explain why you are actually doing this.
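A minimal sketch of the integer-array alternative, assuming the codes can be kept as an `int[]` instead of a single number. `IntArrayRoundTrip`, `toCodes` and `fromCodes` are hypothetical names, while `String.codePoints()` and the `String(int[], int, int)` constructor are standard JDK APIs:

```java
import java.util.Arrays;

class IntArrayRoundTrip {
    // One array element per code point, so no digit boundaries can be lost
    static int[] toCodes(String str) {
        return str.codePoints().toArray();
    }

    // The String(int[], int, int) constructor rebuilds a string from code points
    static String fromCodes(int[] codes) {
        return new String(codes, 0, codes.length);
    }

    public static void main(String[] args) {
        int[] codes = toCodes("are");
        System.out.println(Arrays.toString(codes));  // [97, 114, 101]
        System.out.println(fromCodes(codes));         // are
    }
}
```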

This could be doable if all the characters you use in the string have two-digit ASCII codes. For example, "ARE" would give 658269, and you would know to read it two digits at a time to reverse it. The problem is that, in general, you don't know whether you are dealing with two-digit or three-digit ASCII codes.

However, if the string consists purely of the letters [a-zA-Z], you could check whether the next two digits lie in the range 65-90 or 97-99; otherwise, take three digits, which should then lie in the range 100-122.
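A minimal sketch of that rule, assuming the original string contained only [a-zA-Z]; `LetterDecoder` is a hypothetical name. The rule is unambiguous for letters because every valid two-digit code starts with 6-9, while every three-digit letter code starts with 1:

```java
import java.math.BigInteger;

class LetterDecoder {
    // ASCII letters: 'A'-'Z' are 65-90 and 'a'-'z' are 97-122
    static boolean isAsciiLetter(int code) {
        return (code >= 65 && code <= 90) || (code >= 97 && code <= 122);
    }

    // Assumes the number was built from a string containing only [a-zA-Z]
    static String decode(BigInteger n) {
        String digits = n.toString();
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < digits.length()) {
            // Two digits suffice only for 65-90 ('A'-'Z') and 97-99 ('a'-'c')
            int len = (i + 2 <= digits.length()
                    && isAsciiLetter(Integer.parseInt(digits.substring(i, i + 2)))) ? 2 : 3;
            if (i + len > digits.length()) {
                throw new IllegalArgumentException("Trailing digits in " + digits);
            }
            int code = Integer.parseInt(digits.substring(i, i + len));
            if (!isAsciiLetter(code)) {
                throw new IllegalArgumentException("Not a letter code: " + code);
            }
            sb.append((char) code);
            i += len;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode(new BigInteger("97114101")));  // are
        System.out.println(decode(new BigInteger("658269")));    // ARE
    }
}
```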

But it goes without saying that there are better ways of doing this.

As others have pointed out, this is not doable in general. However, as others have also argued, it is doable if you make certain limiting assumptions. In addition to the ones already presented, another assumption could be that the strings you're converting are all English words.

Then you would know that each character takes up either two or three digits in the integer. The following code uses a helper function to check whether two digits give a legal character or whether three digits have to be considered:

public String convertBack(BigInteger bigInteger) {
    StringBuilder buffer = new StringBuilder();

    String digitString = bigInteger.toString();

    for (int to, from = 0; from + 2 <= digitString.length(); from = to) {
        // minimally extract two digits at a time
        to = from + 2;
        char c = (char) Integer.parseInt(digitString.substring(from, to));

        // if two digits are not enough, try 3
        if (!isLegalCharacter(c) && to + 1 <= digitString.length()) {
            to++;
            c = (char) Integer.parseInt(digitString.substring(from, to));
        }

        if (isLegalCharacter(c)) {
            buffer.append(c);
        } else {
            // error, can't convert
            break;
        }
    }

    return buffer.toString();
}

private boolean isLegalCharacter(char c) {
    return c == '\'' || Character.isLetter(c);
}

This particular isLegalCharacter method is not very robust, but you can adapt it to your needs. For instance, it accepts only letters and apostrophes, so a hyphenated word such as "well-known" will not convert correctly, and letters whose codes have four digits (such as Cyrillic letters) are never tried, because convertBack looks at no more than three digits.

But if you know that you will never run into such cases, the above approach might work for you.
