
How to identify whether a file is a text file or something else using C# .NET

I need to open a file as a text file and process it later. Before I read it, how can I check that the file I am opening really is a text file? If the file is in another format, my whole code will interpret it wrongly. I want to access and process only text files.

Currently I am using:

StreamReader objReader = new StreamReader(filePath);

How can I do so in C# .NET?

Well, there are heuristics you could apply:

  • Use the file extension. If it's ".txt" then it's probably a text file, if it's ".jpg" it probably isn't, etc.
  • If you know what encoding the file should be in, check whether it's valid in that encoding
  • Check for common "magic numbers" at the start of the file to identify various well-known binary file types
  • If it's meant to be a Western document, read the file as text and check that most of its characters have relatively low Unicode values (typically less than U+0100, but you might want to look at the various Unicode code charts to decide for yourself)
  • Text files tend not to have many characters below U+0020 other than carriage return, line feed and tab

But it's all heuristic, basically. At the end of the day, a file is a name and some bytes, along with some metadata about access permissions. In some file systems there can be more metadata available, but it's typically hard to get at and often not preserved when copying files around - so shouldn't be relied on for this.
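A couple of these heuristics can be combined into a rough check. What follows is only a sketch: `TextFileHeuristics`, `LooksLikeText`, `FileLooksLikeText`, the 4096-byte sample, the 5% threshold and the short signature list are all assumptions made for illustration, not part of .NET. The control-character test also assumes an ASCII-compatible encoding such as UTF-8; UTF-16 text would be rejected because of its zero bytes.

```csharp
using System;
using System.IO;

static class TextFileHeuristics
{
    // Signatures ("magic numbers") of a few well-known binary formats.
    private static readonly byte[][] BinarySignatures =
    {
        new byte[] { 0x89, 0x50, 0x4E, 0x47 }, // PNG
        new byte[] { 0xFF, 0xD8, 0xFF },       // JPEG
        new byte[] { 0x47, 0x49, 0x46, 0x38 }, // GIF ("GIF8")
        new byte[] { 0x25, 0x50, 0x44, 0x46 }, // PDF ("%PDF")
        new byte[] { 0x50, 0x4B, 0x03, 0x04 }, // ZIP ("PK" 03 04)
    };

    // Hypothetical helper: guesses whether the first `count` bytes look like text.
    public static bool LooksLikeText(byte[] data, int count, double maxControlRatio = 0.05)
    {
        if (count == 0) return true; // assumption: treat an empty file as text

        foreach (byte[] signature in BinarySignatures)
        {
            if (StartsWith(data, count, signature)) return false;
        }

        int controlCount = 0;
        for (int i = 0; i < count; i++)
        {
            byte b = data[i];
            if (b == 0) return false; // a NUL byte is a strong hint of binary data
            bool isAllowedWhitespace = b == 0x09 || b == 0x0A || b == 0x0D; // tab, LF, CR
            if (b < 0x20 && !isAllowedWhitespace) controlCount++;
        }
        return (double)controlCount / count <= maxControlRatio;
    }

    // Reads up to sampleSize bytes from the start of the file and applies the check.
    public static bool FileLooksLikeText(string filePath, int sampleSize = 4096)
    {
        byte[] buffer = new byte[sampleSize];
        using (FileStream stream = File.OpenRead(filePath))
        {
            int read = stream.Read(buffer, 0, buffer.Length);
            return LooksLikeText(buffer, read);
        }
    }

    private static bool StartsWith(byte[] data, int count, byte[] prefix)
    {
        if (count < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

A caller could then guard the existing `new StreamReader(filePath)` with `if (TextFileHeuristics.FileLooksLikeText(filePath))`, bearing in mind that a `true` result is still only a guess.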

If you want to get the extension of the file, you can use the Path.GetExtension method.
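As a small illustration of the extension check, here is a sketch built on `Path.GetExtension`, which returns the extension including the leading period (or an empty string when there is none). The `ExtensionCheck` class and `HasTextExtension` helper are hypothetical names, and treating only ".txt" as text is an assumption you would adapt to your own files.

```csharp
using System;
using System.IO;

static class ExtensionCheck
{
    // Hypothetical helper: true when the path ends in ".txt", ignoring case.
    public static bool HasTextExtension(string filePath)
    {
        // GetExtension returns e.g. ".txt", or "" if the name has no extension.
        string extension = Path.GetExtension(filePath);
        return string.Equals(extension, ".txt", StringComparison.OrdinalIgnoreCase);
    }
}
```

Note that this only inspects the name; a renamed binary file would still pass.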

"If the file is in another format, my whole code will interpret it wrongly."

Sure, if you expect a text file and end up getting a binary file, your code will interpret it wrongly. But the same is true of any invalid text file: what if it's not comma-separated when you expect that? Or not JSON, when that's what you want? Or it's in an encoding you can't handle?

The point is, unless you're just copying the text or doing something very low-level with it, you'll need more checking than text vs binary anyway. You should (probably) check that the entire file conforms to your needs. And that will catch any non-text files that are passed in to your program too!
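To make that concrete, here is a minimal sketch of validating the whole input against what the program actually needs. It assumes, purely for illustration, that the program expects UTF-8 text with a fixed number of comma-separated fields per line. `StrictTextLoader` and `TryParseCsv` are hypothetical names, and the naive `Split(',')` ignores quoted fields and a leading byte order mark.

```csharp
using System;
using System.Text;

static class StrictTextLoader
{
    // Hypothetical helper: decodes the bytes as strict UTF-8 and checks that every
    // non-empty line has exactly expectedFields comma-separated fields.
    public static bool TryParseCsv(byte[] bytes, int expectedFields, out string[][] rows)
    {
        rows = null;
        string text;
        try
        {
            // throwOnInvalidBytes: true makes invalid UTF-8 raise an exception
            // instead of being replaced with U+FFFD.
            var strictUtf8 = new UTF8Encoding(
                encoderShouldEmitUTF8Identifier: false,
                throwOnInvalidBytes: true);
            text = strictUtf8.GetString(bytes);
        }
        catch (DecoderFallbackException)
        {
            return false; // not valid UTF-8, so most likely not the text we expect
        }

        string[] lines = text.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);
        var parsed = new string[lines.Length][];
        for (int i = 0; i < lines.Length; i++)
        {
            string[] fields = lines[i].Split(',');
            if (fields.Length != expectedFields) return false; // wrong shape
            parsed[i] = fields;
        }
        rows = parsed;
        return true;
    }
}
```

It could be called as `StrictTextLoader.TryParseCsv(File.ReadAllBytes(filePath), 3, out var rows)`. A binary file will usually fail either the decoding step or the field-count check, which is the effect described above.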
