How to replace extended ASCII characters in C#?

I am trying to replace non-printable characters, i.e. extended ASCII characters, in a HUGE string.

foreach (string line in File.ReadLines(txtfileName.Text))
{
    MessageBox.Show(Regex.Replace(line,
        @"\p{Cc}",
        a => string.Format("[{0:X2}]", " ")));
}

This doesn't seem to be working.

Example: AAÂAA should be converted to AA AA

Assuming the encoding is UTF-8, try this:

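// Requires: using System.Text;
string strReplacedVal = Encoding.ASCII.GetString(
        Encoding.Convert(
            Encoding.UTF8,
            // ASCII encoder that emits a space for every character it cannot encode
            Encoding.GetEncoding(
                Encoding.ASCII.WebName, // IANA name ("us-ascii"), the form GetEncoding documents
                new EncoderReplacementFallback(" "),
                new DecoderExceptionFallback()
                ),
            Encoding.UTF8.GetBytes(line)
        )
);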
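As a rough sketch of the same idea: a .NET string is already decoded, so the replacement fallback can be applied by encoding the string to ASCII directly, without the UTF-8 round trip. The class and method names below (AsciiReplaceDemo, ReplaceNonAscii) are hypothetical and not part of the original answer. Note that this only replaces characters above U+007F; control characters below the space character are left alone.

```csharp
using System;
using System.Text;

public static class AsciiReplaceDemo
{
    // Hypothetical helper: replaces every character that ASCII cannot
    // represent with a single space.
    public static string ReplaceNonAscii(string input)
    {
        // ASCII encoding whose encoder substitutes " " for unencodable characters.
        Encoding ascii = Encoding.GetEncoding(
            "us-ascii",
            new EncoderReplacementFallback(" "),
            new DecoderExceptionFallback());

        // Encode to ASCII bytes (applying the fallback), then decode back.
        return ascii.GetString(ascii.GetBytes(input));
    }

    public static void Main()
    {
        // \u00C2 is the character Â from the question.
        Console.WriteLine(ReplaceNonAscii("AA\u00C2AA")); // prints "AA AA"
    }
}
```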

Since you are opening the file with File.ReadLines, which reads as UTF-8 by default, the file must be UTF-8. Its code units are one byte each, and UTF-8 has the very nice property that characters above ␡ (U+007F) are encoded using only bytes above 0x7f, while characters at or below ␡ are encoded as single bytes at or below 0x7f.
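To make that property concrete, here is a small sketch that prints the UTF-8 bytes of an ASCII character, of Â (U+00C2) and of € (U+20AC). The class and helper names (Utf8ByteDemo, AllBytesHigh) are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Text;

public static class Utf8ByteDemo
{
    // Hypothetical helper: true if every UTF-8 byte of s is above 0x7F.
    public static bool AllBytesHigh(string s) =>
        Encoding.UTF8.GetBytes(s).All(b => b > 0x7F);

    public static void Main()
    {
        foreach (var s in new[] { "A", "\u00C2", "\u20AC" })
        {
            byte[] bytes = Encoding.UTF8.GetBytes(s);
            // Code point, followed by its UTF-8 bytes in hex.
            Console.WriteLine($"U+{(int)s[0]:X4}: {BitConverter.ToString(bytes)}");
        }
        // prints:
        // U+0041: 41
        // U+00C2: C3-82
        // U+20AC: E2-82-AC
    }
}
```

Every byte belonging to a non-ASCII character is above 0x7f, which is why a byte-by-byte scan cannot mistake part of a multi-byte sequence for an ASCII character.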

For efficiency, you can rewrite the file in place a few KB at a time.

Note that some characters might be replaced by more than one space, though, because every byte of a multi-byte UTF-8 sequence becomes its own space.

// Operates on a UTF-8 encoded text file (requires: using System.IO;)
using (var stream = File.Open(path, FileMode.Open, FileAccess.ReadWrite))
{
    const int size = 4096;
    var buffer = new byte[size];
    int count;
    while ((count = stream.Read(buffer, 0, size)) > 0)
    {
        var changed = false;
        for (int i = 0; i < count; i++)
        {
            // Obliterate every byte outside the range ␠ (0x20) to ␡ (0x7f).
            // This also blanks control characters such as tab, CR and LF.
            if (buffer[i] < ' ' || buffer[i] > '\x7f')
            {
                buffer[i] = (byte)' ';
                changed = true;
            }
        }
        if (changed)
        {
            // Step back over the chunk just read and overwrite it in place.
            stream.Seek(-count, SeekOrigin.Current);
            stream.Write(buffer, 0, count);
        }
    }
}
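The "more than one space" caveat can be seen with a small in-memory sketch that applies the same byte test as the loop above to a string instead of a file. The names ByteWiseDemo and BlankNonPrintableBytes are hypothetical, chosen only for this illustration.

```csharp
using System;
using System.Text;

public static class ByteWiseDemo
{
    // Hypothetical helper mirroring the file loop on an in-memory buffer.
    public static string BlankNonPrintableBytes(string input)
    {
        byte[] buffer = Encoding.UTF8.GetBytes(input);
        for (int i = 0; i < buffer.Length; i++)
        {
            // Same test as the file version: keep only bytes 0x20..0x7F.
            if (buffer[i] < 0x20 || buffer[i] > 0x7F)
            {
                buffer[i] = (byte)' ';
            }
        }
        // Every remaining byte is at or below 0x7F, so ASCII decoding is safe.
        return Encoding.ASCII.GetString(buffer);
    }

    public static void Main()
    {
        // Â (U+00C2) is two bytes in UTF-8, so it becomes two spaces.
        Console.WriteLine("[" + BlankNonPrintableBytes("AA\u00C2AA") + "]"); // prints "[AA  AA]"
        // € (U+20AC) is three bytes in UTF-8, so it becomes three spaces.
        Console.WriteLine("[" + BlankNonPrintableBytes("1\u20AC") + "]");    // prints "[1   ]"
    }
}
```

If exactly one space per character is required, as in the AA AA example from the question, the replacement has to happen on decoded characters rather than on raw bytes.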


 