C# - Check if File is Text Based

How can I test whether a file that I'm opening in C# using FileStream is a "text type" file? I would like my program to open any file that is text based, for example, .txt, .html, etc.

But it should not open files such as .doc, .pdf, or .exe.

In general: there is no way to tell.

A text file stored in UTF-16 will likely look like binary if you open it with an 8-bit encoding. Equally someone could save a text file as a .doc (it is a document).
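
To make that concrete, here is a minimal sketch (the class and method names are made up for illustration) that dumps the UTF-16 little-endian bytes of a short ASCII string. Every second byte is zero, which is what makes such a file look binary to any check that treats NUL bytes as a sign of binary data.

```csharp
using System;
using System.Text;

public static class Utf16Demo
{
    // Hypothetical helper: returns the UTF-16 LE bytes of a string as hex.
    public static string ToUtf16Hex(string text)
    {
        // Encoding.Unicode is UTF-16 little-endian; GetBytes does not emit a BOM.
        byte[] bytes = Encoding.Unicode.GetBytes(text);
        return BitConverter.ToString(bytes);
    }
}
```

For example, Utf16Demo.ToUtf16Hex("Hi") returns "48-00-69-00".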

While you could open the file and look at some of the content, all such heuristics will sometimes fail (e.g. Notepad tries to do this; with a careful selection of a few characters, Notepad will guess wrong and display completely different content).

If you have a specific scenario, rather than being able to open and process anything, you should be able to do much better.

I guess you could just check through the first 1000 (arbitrary number) characters and see if there are unprintable characters, or if they are all ASCII in a certain range. If the latter, assume that it is text.
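
A rough sketch of that idea might look like the following. The names TextHeuristic, LooksLikeAsciiText and FileLooksLikeAsciiText are hypothetical, the 1000-byte limit is as arbitrary as above, and "printable" is assumed to mean bytes 32 to 126 plus tab, line feed and carriage return. It works on raw bytes rather than decoded characters, so it will reject UTF-8 or UTF-16 text that contains anything beyond plain ASCII.

```csharp
using System;
using System.IO;

public static class TextHeuristic
{
    // Returns true if the first maxBytes bytes are all printable ASCII
    // or common whitespace (tab, line feed, carriage return).
    public static bool LooksLikeAsciiText(byte[] data, int maxBytes = 1000)
    {
        int limit = Math.Min(data.Length, maxBytes);
        for (int i = 0; i < limit; i++)
        {
            byte b = data[i];
            bool isWhitespace = b == 9 || b == 10 || b == 13;
            bool isPrintable = b >= 32 && b <= 126;
            if (!isWhitespace && !isPrintable)
                return false;
        }
        return true; // Note: an empty file also counts as text here.
    }

    // Reads at most maxBytes from the file and applies the check above.
    public static bool FileLooksLikeAsciiText(string filePath, int maxBytes = 1000)
    {
        var buffer = new byte[maxBytes];
        int total = 0;
        using (var stream = File.OpenRead(filePath))
        {
            int read;
            // Stream.Read may return fewer bytes than requested, so loop.
            while (total < maxBytes && (read = stream.Read(buffer, total, maxBytes - total)) > 0)
                total += read;
        }
        Array.Resize(ref buffer, total);
        return LooksLikeAsciiText(buffer, maxBytes);
    }
}
```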

Whatever you do is going to be a guess.

As others have pointed out, there is no absolute way to be sure. However, to determine whether a file is binary (which is arguably easier than determining whether it is text), some implementations check for consecutive NUL characters. Git apparently just checks the first 8000 bytes for a NUL and, if it finds one, treats the file as binary. See here for more details.

Here is a similar C# solution I wrote that looks for a given number of required consecutive NUL characters. If IsBinary returns false, it is very likely that your file is text based.

public bool IsBinary(string filePath, int requiredConsecutiveNul = 1)
{
    const int charsToCheck = 8000;
    const char nulChar = '\0';

    int nulCount = 0;

    using (var streamReader = new StreamReader(filePath))
    {
        for (var i = 0; i < charsToCheck; i++)
        {
            if (streamReader.EndOfStream)
                return false;

            if ((char) streamReader.Read() == nulChar)
            {
                nulCount++;

                if (nulCount >= requiredConsecutiveNul)
                    return true;
            }
            else
            {
                nulCount = 0;
            }
        }
    }

    return false;
}

To get the real type of a file, you must check its header, which does not change even if the extension is modified. You can get the header list here, and use something like this in your code:

using (var stream = new FileStream(fileName, FileMode.Open, FileAccess.Read))
using (var reader = new BinaryReader(stream))
{
    // Read the first two bytes of the file.
    // In this example I want to check if the file is a BMP,
    // whose header is 42 4D in hex (the ASCII characters "BM").
    // ReadBytes returns fewer bytes if the file is shorter than requested.
    byte[] header = reader.ReadBytes(2);
    if (header.Length == 2 && header[0] == 0x42 && header[1] == 0x4D)
    {
        // It's a BMP file.
    }
}
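
The same idea can be extended to rule out the formats mentioned in the question. The sketch below is only illustrative: the MagicNumbers class, its method names and the short signature list are my own choices, not a complete header list. Plain text files have no header of their own, so a null result means only "no known binary signature", and a text file that happens to start with "MZ" or "PK" would be misidentified.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class MagicNumbers
{
    // A few well-known signatures; illustrative, not complete.
    private static readonly Dictionary<string, byte[]> Signatures = new Dictionary<string, byte[]>
    {
        { "PDF", new byte[] { 0x25, 0x50, 0x44, 0x46 } },                  // "%PDF"
        { "EXE/DLL", new byte[] { 0x4D, 0x5A } },                          // "MZ"
        { "ZIP (also .docx)", new byte[] { 0x50, 0x4B, 0x03, 0x04 } },     // "PK" 03 04
        { "Legacy Office (.doc)", new byte[] { 0xD0, 0xCF, 0x11, 0xE0 } }, // OLE compound file
        { "BMP", new byte[] { 0x42, 0x4D } },                              // "BM"
    };

    // Returns the name of the matching signature, or null if none matches.
    public static string Identify(byte[] header)
    {
        foreach (var entry in Signatures)
        {
            byte[] magic = entry.Value;
            if (header.Length >= magic.Length && header.Take(magic.Length).SequenceEqual(magic))
                return entry.Key;
        }
        return null;
    }

    // Reads the first few bytes of a file and looks them up.
    public static string IdentifyFile(string fileName)
    {
        using (var stream = new FileStream(fileName, FileMode.Open, FileAccess.Read))
        using (var reader = new BinaryReader(stream))
        {
            // ReadBytes returns fewer bytes for files shorter than 8 bytes.
            return Identify(reader.ReadBytes(8));
        }
    }
}
```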

The solution below works for me. It is a general solution that checks for all types of binary file.

     /// <summary>
     /// Checks whether the selected file is a binary file, i.e. whether it
     /// contains any byte outside the 7-bit ASCII range (0 to 127).
     /// </summary>
     public bool CheckForBinary()
     {
             // The using block disposes the stream even if an exception is thrown.
             using (Stream objStream = new FileStream("your file path", FileMode.Open, FileAccess.Read))
             {
                 int a;

                 // Iterate through the stream and check the value of each byte.
                 // ReadByte returns -1 at the end of the stream.
                 while ((a = objStream.ReadByte()) != -1)
                 {
                     if (a > 127)
                     {
                         return true;  // Binary file
                     }
                 }
             }

             return false;             // Text file (an empty file also counts as text)
     }

public bool IsTextFile(string FilePath)
{
  using (StreamReader reader = new StreamReader(FilePath))
  {
       int Character;
       while ((Character = reader.Read()) != -1)
       {
           // Control characters in the ranges 1-7 and 14-25 should not appear in text.
           if ((Character > 0 && Character < 8) || (Character > 13 && Character < 26))
           {
                    return false;
           }
       }
  }
  return true;
}
