I tried string[] file = File.ReadAllLines(file_name);
to read a Word file.
In debug mode I found that the first few elements of the string array file have values like
" ࡱ 0\\0\\0\\0>\\0\\0 \\t\\0\\0\\0\\0\\0"
In some files the first three elements of file[] are filled with these unreadable characters, while in others only the first element is.
What is the problem, and how can I get rid of these characters? My Word file does not even have a blank line at the beginning.
The problem is you're not opening the file with the correct encoding. Here is a guide to opening and creating Word documents from C#.
File.ReadAllLines is intended for text files. Word files are not text files. To read Word files you might need a library.
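To see why the array starts with garbage, it can help to look at the first bytes of the file instead of decoding it as text. The sketch below is only an illustration: the class name WordFileSniffer and its return values are made up for this example. It relies on two well-known container signatures: legacy .doc files are OLE2 compound files beginning with D0 CF 11 E0 A1 B1 1A E1, and .docx files are ZIP archives beginning with "PK". The ࡱ character in the question is what the bytes E0 A1 B1 look like when decoded as UTF-8, which suggests the file is a binary .doc.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of any library.
public static class WordFileSniffer
{
    // Signature of the OLE2 compound file container used by legacy .doc files.
    private static readonly byte[] Ole2Magic = { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    // Local file header signature of a ZIP archive, the container used by .docx files.
    private static readonly byte[] ZipMagic = { 0x50, 0x4B, 0x03, 0x04 };

    // Classifies a file by the bytes at its start.
    public static string Detect(byte[] header)
    {
        if (StartsWith(header, Ole2Magic)) return "doc";
        if (StartsWith(header, ZipMagic)) return "docx-or-zip";
        return "unknown";
    }

    // Reads up to the first 8 bytes of the file and classifies them.
    public static string DetectFile(string path)
    {
        byte[] header = new byte[8];
        int total = 0;
        using (FileStream stream = File.OpenRead(path))
        {
            int read;
            while (total < header.Length &&
                   (read = stream.Read(header, total, header.Length - total)) > 0)
            {
                total += read;
            }
        }
        Array.Resize(ref header, total); // the file may be shorter than 8 bytes
        return Detect(header);
    }

    private static bool StartsWith(byte[] data, byte[] prefix)
    {
        if (data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

Calling WordFileSniffer.DetectFile(file_name) before File.ReadAllLines would let you reject Word documents up front. A ZIP signature alone does not prove the file is a .docx, only that it is some ZIP-based format.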
If you are using .NET 3.5 or later, I'd suggest using a LINQ Where clause (this needs using System.Linq;) to return only the lines that you're interested in.
string[] file = File.ReadAllLines(file_name).Where(line => !line.StartsWith("��")).ToArray();
You could also use some form of regular expression instead of the line.StartsWith() method.
Note: If you are reading Microsoft Office Word files, I'd recommend using COM Interop or a third-party library to read the document (you'll find it much easier than trying to parse the file yourself).
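If the document is a .docx rather than a legacy .doc, one possible alternative to a Word library is to read the XML inside the ZIP container directly. This is a rough sketch under several assumptions: it needs .NET 4.5 or later for System.IO.Compression.ZipArchive (so it will not work on .NET 3.5), the class name DocxText is made up for this example, and it only collects the text of w:t elements per w:p paragraph, ignoring tabs, line breaks, headers, and footnotes. Paragraphs nested inside text boxes may show up twice. It does not work for binary .doc files.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of any library.
public static class DocxText
{
    // Main WordprocessingML namespace used in word/document.xml.
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns one string per paragraph (w:p) found in the main document part.
    public static List<string> ReadParagraphs(Stream docx)
    {
        using (var archive = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true))
        {
            ZipArchiveEntry entry = archive.GetEntry("word/document.xml");
            if (entry == null)
            {
                throw new InvalidDataException("Not a .docx: word/document.xml is missing.");
            }

            using (Stream xml = entry.Open())
            {
                XDocument document = XDocument.Load(xml);
                return document.Descendants(W + "p")
                    // Join the text runs (w:t) that belong to each paragraph.
                    .Select(p => string.Concat(p.Descendants(W + "t").Select(t => t.Value)))
                    .ToList();
            }
        }
    }
}
```

With this in place, something like DocxText.ReadParagraphs(File.OpenRead(file_name)).ToArray() would give an array comparable to what File.ReadAllLines returns for a text file. For anything beyond plain paragraph text, COM Interop or a dedicated library is still the safer route.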