
C# convert byte[] to string[]

I am coding in C# and willing to use unsafe/fixed.

I would like to be able to convert from a byte[] to a string[]. I started with a file of strings (terminated by \n). I replaced all of the \n with \0 in the byte array that I read from the file. I thought I might now just reinterpret the byte[] as a string[] since the newlines are now \0s. I think that makes sense, but I could be wrong. If I recall from C++ (decades ago, unfortunately), a string[] is just a char[][] where each inner char[] is null-terminated. So I think the code below could work if I could do the (fancycast).

// File contains strings on each line

byte[] bytes = ReadFile();
Replace(bytes, '\n', '\0');
string[] strings = (fancycast)bytes;

I don't know how to do the (fancycast). Thank you very much.

I know about all of the Streams and Readers in C# and I have specific reasons why I am not using them. Please don't suggest a different design. I would just like to reinterpret cast the array. Thank you for your help.

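C# strings are length-prefixed (Pascal-style) sequences of UTF-16 characters, not null-terminated C strings, so a byte[] cannot simply be reinterpreted as a string[]. Your best bet is probably to leave the \n characters alone and do a Split().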

byte[] bytes = ReadFile();
string oneBigString = Encoding.ASCII.GetString(bytes);
string[] lines = oneBigString.Split('\n');
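If the one big intermediate string is the concern, a rough sketch along these lines scans the byte array for the delimiter and decodes each slice on its own. The names ByteLines and SplitLines are made up for illustration, and the sketch assumes ASCII data with \n line endings. Each slice is still copied into a new string; nothing is reinterpreted in place.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class ByteLines
{
    // Hypothetical helper: splits a buffer at each '\n' byte and decodes
    // every slice into its own string. Assumes ASCII-encoded input.
    public static string[] SplitLines(byte[] bytes)
    {
        var lines = new List<string>();
        int start = 0;
        for (int i = 0; i < bytes.Length; i++)
        {
            if (bytes[i] == (byte)'\n')
            {
                // Decode the bytes between the previous delimiter and this one.
                lines.Add(Encoding.ASCII.GetString(bytes, start, i - start));
                start = i + 1;
            }
        }
        // Keep a final line that has no trailing newline.
        if (start < bytes.Length)
        {
            lines.Add(Encoding.ASCII.GetString(bytes, start, bytes.Length - start));
        }
        return lines.ToArray();
    }
}
```

Unlike Split('\n'), this version does not produce an empty last entry when the data ends with a newline. Empty lines in the middle of the data are kept.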

If you just want to read a file in C# you could simply use:

string text = System.IO.File.ReadAllText("PathToFile");

Or

string[] lines = System.IO.File.ReadAllLines("PathToFile");

Otherwise simply create a string from bytes and split the string:

byte[] bytes = ReadFile();
// Replace <Encoding> with the encoding the file was written in.
string allData = System.Text.Encoding.<Encoding>.GetString(bytes);
string[] lines = allData.Split('\n');
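Which encoding is used matters as soon as the file contains anything outside plain ASCII. A minimal sketch of that, where EncodingDemo and Decode are illustrative names rather than anything from the framework:

```csharp
using System;
using System.Text;

public static class EncodingDemo
{
    // Decodes the bytes with the caller's chosen encoding, then splits on '\n'.
    public static string[] Decode(byte[] bytes, Encoding encoding)
    {
        return encoding.GetString(bytes).Split('\n');
    }
}
```

Given the UTF-8 bytes for "café\nok", Decode(bytes, Encoding.UTF8) returns "café" and "ok", while Decode(bytes, Encoding.ASCII) returns "caf??" and "ok", because the ASCII decoder replaces each byte above 0x7F with a question mark.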

Try:

string text = System.Text.Encoding.Default.GetString(bytes);

However, you don't have to read the file as a byte array and then convert it to a string array in C#. You can read it directly as a string or a string array using File.ReadAllText(path) or File.ReadAllLines(path), respectively.

string allText = File.ReadAllText("file path");
string[] allLines = File.ReadAllLines("file path");

There is an important (REALLY important) thing to know about C# strings: They are immutable sequences of Unicode characters, and that's the only truly certain thing that you can say about them. As such you cannot make assumptions about how big any one character might be, and you cannot make assumptions about the byte offset of any character in the string.

Well, you can make assumptions, and most of the time it'll probably work, but when it doesn't work it will be a massive pain to debug.

How many bytes a Unicode character needs depends on the encoding; in UTF-8 it is anywhere from one to four. C# uses UTF-16 for strings, which means every char in a string is a 16-bit code unit, and characters outside the Basic Multilingual Plane take two of them (a surrogate pair, 32 bits in total). Emojis tend to live in that space, like this one at U+1F44C: 👌. On top of that, C# makes no promises about how the resulting string might be laid out in memory.
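A small sketch makes the gap between "characters" and storage units visible. CodeUnitDemo and its method names are made up for illustration; the calls inside them (string.Length, char.ConvertToUtf32, Encoding.UTF8.GetByteCount) are standard .NET.

```csharp
using System;
using System.Text;

public static class CodeUnitDemo
{
    // Number of 16-bit UTF-16 code units, which is what string.Length reports.
    public static int Utf16Units(string s) => s.Length;

    // Unicode code point starting at index 0 (combines a surrogate pair if present).
    public static int CodePoint(string s) => char.ConvertToUtf32(s, 0);

    // Number of bytes the same text occupies when encoded as UTF-8.
    public static int Utf8Bytes(string s) => Encoding.UTF8.GetByteCount(s);
}
```

For the string "\U0001F44C" (the emoji above), Utf16Units returns 2, CodePoint returns 0x1F44C, and Utf8Bytes returns 4. For "A", both counts are 1. So neither the byte count nor the char count can be assumed to equal the number of characters a reader sees.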


 