
How can I read any file into a string?

I want to be able to read any file into a string, for instance the way Notepad might open a Word file. Using the following code:

StreamReader sr = new StreamReader(filePath);
text += sr.ReadToEnd();
sr.Close();

works fine on a basic text file, but when I use it on, say, a Word file, I get just a few odd characters, whereas opening the same file in Notepad shows the entire file: text, special characters, and so on. I'm using this as part of a file drop into a textbox. Basically, I'm looking to get the same output you would get when you open any file in Notepad. What should I be using instead?

Your code from the original question does read the entire stream when it opens such a file (you can see this by inspecting the string in the debugger). The problem is that most of these binary files contain null characters (the \0 char), which cause most viewers to stop displaying the contents of the stream at the first one.

If you remove or replace the '\0' characters, you'll see the entire stream, just like in Notepad.

For example:

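string filePath = @"c:\windows\system32\calc.exe";
string text;
// Dispose the reader even if ReadToEnd throws.
using (StreamReader sr = new StreamReader(filePath))
{
    text = sr.ReadToEnd();
}

// Replace null characters so the TextBox does not stop at the first one.
textBox1.Text = text.Replace('\0', ' ');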

Add a TextBox named textBox1 to a form and see for yourself: you'll see the entire stream.

This should give you the functionality that you want. First read the file in as a byte[] using

byte[] data = File.ReadAllBytes(fileName);

then just decode it with ASCII, or whichever encoding suits the data. Note that ASCII decoding turns every byte above 0x7F into '?'.

string s = Encoding.ASCII.GetString(data);
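The two ideas above (read the raw bytes, then replace the null characters before display) can be combined. The following is only a minimal sketch: the class and method names are hypothetical, and the choice of Latin-1 is an assumption made here because it maps every byte to exactly one character, so nothing is turned into '?'. Notepad detects encodings on its own, so the output will resemble what Notepad shows but need not match it exactly.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not taken from the answers above.
public static class BinaryText
{
    // Assumption: Latin-1 (ISO-8859-1) is used because each byte 0-255
    // maps to exactly one char, unlike ASCII, which yields '?' above 0x7F.
    private static readonly Encoding Latin1 = Encoding.GetEncoding("ISO-8859-1");

    // Decodes raw bytes and replaces null characters with spaces so that
    // a TextBox does not stop displaying at the first '\0'.
    public static string ToDisplayText(byte[] data)
    {
        string raw = Latin1.GetString(data);
        return raw.Replace('\0', ' ');
    }

    // Convenience wrapper: reads the whole file into memory first.
    public static string ReadAsDisplayText(string path)
    {
        return ToDisplayText(File.ReadAllBytes(path));
    }
}
```

In the drop handler this would be used as textBox1.Text = BinaryText.ReadAsDisplayText(filePath);. Other control characters are left untouched, so the result can still look odd for binary files.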

I'm assuming you're referring to WordPad, which is also included with Windows, rather than Notepad. WordPad, in addition to showing basic text files, also knows how to parse and edit Word files (.DOCX, but oddly enough not the older .DOC files), Rich Text Format files (.RTF), and OpenOffice documents (.ODT). This doesn't come for free just by opening the Word file and displaying its content: there is a lot of code inside WordPad to parse this binary data and display it properly, not to mention the code to edit and save it again.

If you need to retrieve the data from Word files, there are several programmatic options, starting with automating the Word application itself using the Word APIs. However, this solution is problematic when running on a server, or when you need to open the files on a machine where Word is not installed.

In this case you also have several options. For post-2007 documents with the .DOCX extension, you can use the System.IO.Packaging namespace to open the DOCX and extract its relevant parts, but it's up to you to understand the syntax of the XML files within. Alternatively, you can purchase a third-party library that does it for you, such as Aspose, which I've worked with and found to be fine. There are others out there too.
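As a rough illustration of the System.IO.Packaging route, the sketch below opens a .DOCX, follows the package relationship to the main document part, and concatenates the text runs paragraph by paragraph. The class name DocxText is hypothetical, and the sketch assumes a transitional OOXML file (the relationship type and namespace URIs below). It ignores headers, footnotes, and table layout, and text inside nested paragraphs such as text boxes may appear twice. On .NET Framework the namespace lives in WindowsBase; on newer .NET it comes from the System.IO.Packaging NuGet package.

```csharp
using System;
using System.IO;
using System.IO.Packaging;
using System.Linq;
using System.Text;
using System.Xml.Linq;

// Hypothetical helper, a sketch rather than a complete DOCX reader.
public static class DocxText
{
    // Assumption: transitional OOXML relationship type for the main document part.
    private const string OfficeDocumentRel =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument";

    // Assumption: transitional WordprocessingML namespace.
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    public static string ExtractText(string docxPath)
    {
        using (Package package = Package.Open(docxPath, FileMode.Open, FileAccess.Read))
        {
            // Locate the main document part through the package-level relationship.
            PackageRelationship rel =
                package.GetRelationshipsByType(OfficeDocumentRel).FirstOrDefault();
            if (rel == null)
                throw new InvalidDataException("No main document part found.");

            Uri partUri = PackUriHelper.ResolvePartUri(
                new Uri("/", UriKind.Relative), rel.TargetUri);
            PackagePart part = package.GetPart(partUri);

            using (Stream stream = part.GetStream(FileMode.Open, FileAccess.Read))
            {
                XDocument xml = XDocument.Load(stream);
                var sb = new StringBuilder();
                // One output line per w:p paragraph; w:t elements hold the text runs.
                foreach (XElement paragraph in xml.Descendants(W + "p"))
                {
                    foreach (XElement t in paragraph.Descendants(W + "t"))
                        sb.Append(t.Value);
                    sb.AppendLine();
                }
                return sb.ToString();
            }
        }
    }
}
```

Calling DocxText.ExtractText(filePath) returns plain text only; formatting is discarded, which is where a dedicated library earns its keep.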


 