
Generate Malformed Strings for Testing

I'm using and contributing to a library ( https://github.com/neitanod/forceutf8 ) to fix encoding issues in our system, and to guarantee that any encoding issues will be transparently fixed before they are displayed to the user.

I need some test cases, and what I would like is a function that takes a UTF-8 string and converts it into a malformed string. Then I can run it through my library to make sure it fixes it properly:

// pseudocode
strings = [ '공', '人', '🔴', 'passé' ];

foreach ( string in strings )
    malformed = garble( string )
    print( string + " => " + malformed + "\n" )

Here are some examples of malformed strings:

  • "äºº" --> 人
  • "ð´ " --> 🔴 (a red ball—works in Safari)

Here's the raw hex data:

<?php
$strings = array( "äºº", "人", "ê³µ", "공", "ð´", "🔴" );
foreach ( $strings as $string )
    echo " '$string' \t => '" . unpack( "H*", $string )[1] ."'\n";
?>

Output:

  • 'äºº' => 'c3a4c2bac2ba'
  • '人' => 'e4baba'
  • 'ê³µ' => 'c3aac2b3c2b5'
  • '공' => 'eab3b5'
  • 'ð´' => 'c3b0c29fc294c2b4'
  • '🔴' => 'f09f94b4'

You can see that 人 is e4 ba ba, and that its bytes reappear in the malformed string with a few extra c2's mixed in, like so:

  • c3 a4 c2 ba c2 ba (e4 becomes c3 a4; each ba becomes c2 ba)
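
This pattern is consistent with the UTF-8 bytes being read as ISO-8859-1 and then encoded as UTF-8 a second time. Below is a minimal Java sketch of a garble function built on that assumption. The names GarbleSketch, garble and hex are hypothetical and not part of forceutf8. The test strings (공, 人, 🔴, passé) are written as Unicode escapes so the file compiles regardless of source encoding.

```java
import java.nio.charset.StandardCharsets;

class GarbleSketch {
    // Hypothetical helper: misread the UTF-8 bytes of s as ISO-8859-1,
    // which yields the double-encoded form once the result is stored as UTF-8.
    static String garble(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Hex dump of the UTF-8 encoding of s, similar to unpack("H*", ...) in PHP.
    static String hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+ACF5, U+4EBA, U+1F534 (surrogate pair), and "passe" with e-acute
        String[] strings = { "\uacf5", "\u4eba", "\ud83d\udd34", "pass\u00e9" };
        for (String s : strings) {
            // First three lines printed:
            //   eab3b5 => c3aac2b3c2b5
            //   e4baba => c3a4c2bac2ba
            //   f09f94b4 => c3b0c29fc294c2b4
            System.out.println(hex(s) + " => " + hex(garble(s)));
        }
    }
}
```

The hex values match the ones listed above, which suggests (but does not prove) that this is how the malformed samples were produced.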

I hope this is clear enough.

One way such strings get generated is by inserting them into MySQL and then reading them back out, presumably when the character sets used along the way do not agree.

You can deliberately decode your strings with the wrong charset, like this:

import org.apache.commons.codec.binary.StringUtils;

[..]

private static void malformedStringTest() {
    byte[] utf8bytes = StringUtils.getBytesUtf8(
              "This is a test\n" 
            + "äºº 人 ê³µ 공 ð´ 🔴\n"
            + "The above won't work.");
    String asciistring = StringUtils.newStringUsAscii(utf8bytes);

    System.out.println(asciistring);
}

It will output:

This is a test
������ ��� ������ ��� ���� ����
The above won't work.

You can adapt this code to test each string one by one.
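
As a rough sketch of that adaptation, the following uses only java.nio.charset from the standard library instead of Commons Codec and handles one string at a time. It also contrasts two wrong decodings. US-ASCII replaces every non-ASCII byte with U+FFFD, so the original cannot be recovered. ISO-8859-1 maps every byte to exactly one character, so the result can be reversed. The class and method names are hypothetical, and repair is only a stand-in for whatever routine is actually under test, not forceutf8's algorithm.

```java
import java.nio.charset.StandardCharsets;

class PerStringCheck {
    // Lossy: each non-ASCII byte is replaced by U+FFFD.
    static String viaAscii(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.US_ASCII);
    }

    // Reversible: each byte becomes one ISO-8859-1 character.
    static String viaLatin1(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Undo viaLatin1; a stand-in for the repair routine being tested.
    static String repair(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // U+ACF5, U+4EBA, U+1F534 (surrogate pair), and "passe" with e-acute
        String[] strings = { "\uacf5", "\u4eba", "\ud83d\udd34", "pass\u00e9" };
        for (String s : strings) {
            String garbled = viaLatin1(s);
            System.out.println(s + " => " + garbled
                    + " | repaired ok: " + repair(garbled).equals(s)
                    + " | ASCII decode: " + viaAscii(s));
        }
    }
}
```

If the goal is to produce input that a repair library can actually fix, the ISO-8859-1 variant is probably the more useful one, since the US-ASCII variant throws the original bytes away.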

Also look at the other static methods of org.apache.commons.codec.binary.StringUtils.
