
fileinfo and mime types I've never heard of

Original post: 2010-02-27 01:35:45 · tags: php, fileinfo

I'm no stranger to MIME types, but this is strange. Normally a text file would be considered text/plain, but now, after implementing fileinfo, this type of file is reported as "text/x-pascal". I'm a little concerned because I need to be sure I get the correct MIME types before allowing users to upload files.

Is there a cheat sheet that lists all of the "common" MIME types as fileinfo interprets them?

Sinan provided a link that lists the more common MIME types. If you look at this list, you will see that a .txt file is text/plain, but in my case a plain-Jane text file is interpreted as text/x-pascal.

4 Answers

fileinfo is a "best guess". It analyzes only a portion of the file to try to figure out what type the file is, and as such it can be fooled easily enough. Perhaps your file starts with a Pascal comment or a keyword such as Project or Unit.

Fileinfo does not use the file's extension to determine its MIME type, but (quoting):

The functions in this module try to guess the content type and encoding of a file by looking for certain magic byte sequences at specific positions within the file.

The idea is that the name of the file and its extension are provided by the users (especially in a case such as yours, where users upload the files) and, as such, are less "sure" than the content of the file itself.
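To make the quoted "magic byte sequences at specific positions" idea concrete, here is a minimal sketch in Python. It is an illustration only: the signature table is a tiny hand-written sample, not the real magic database, and the name sniff_mime is hypothetical.

```python
# Illustrative only: a tiny hand-written signature table,
# NOT the real magic database that fileinfo/file consult.
SIGNATURES = [
    # (offset, magic bytes, MIME type)
    (0, b"%PDF-", "application/pdf"),
    (0, b"\x89PNG\r\n\x1a\n", "image/png"),
    (0, b"GIF87a", "image/gif"),
    (0, b"GIF89a", "image/gif"),
    (0, b"\xff\xd8\xff", "image/jpeg"),
]


def sniff_mime(data: bytes, default: str = "application/octet-stream") -> str:
    """Return the MIME type of the first signature found at its expected offset."""
    for offset, magic, mime in SIGNATURES:
        if data[offset:offset + len(magic)] == magic:
            return mime
    return default  # no signature matched


# The file name never enters the decision; only the content does.
print(sniff_mime(b"%PDF-1.4 rest of file"))
print(sniff_mime(b"just some plain text"))
```

Plain text has no such signature, so a detector has to fall back on weaker content heuristics for it, which is where guesses like text/x-pascal can come from.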

Maybe a solution could be not to check the whole MIME type returned by fileinfo, but to use only its first part, at least in some cases?

For instance, maybe you could accept all MIME types in the text/* and image/* families, and refuse everything that looks like application/*, except for application/pdf?
(Just an example, but you see the point.)
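The family-based check suggested above could look roughly like this. It is a sketch in Python that works on the MIME string alone; the names ALLOWED_FAMILIES, ALLOWED_EXACT and is_allowed_mime are hypothetical, and the allowed sets simply mirror the example in this answer rather than a recommended policy.

```python
ALLOWED_FAMILIES = {"text", "image"}   # accept text/* and image/*
ALLOWED_EXACT = {"application/pdf"}    # the single application/* exception


def is_allowed_mime(mime: str) -> bool:
    """Decide on the family (the part before the slash) instead of the full type."""
    # Drop parameters such as "; charset=us-ascii" and normalize case.
    essence = mime.split(";", 1)[0].strip().lower()
    if essence in ALLOWED_EXACT:
        return True
    family, sep, subtype = essence.partition("/")
    # Require a well-formed "family/subtype" string from an allowed family.
    return bool(sep) and bool(subtype) and family in ALLOWED_FAMILIES


print(is_allowed_mime("text/x-pascal"))          # accepted via the text/* family
print(is_allowed_mime("application/x-dosexec"))  # refused
```

With this approach a text file misdetected as text/x-pascal is still accepted, because only the text/ prefix matters.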

I have found that, as of at least version 5.03, the 'file' command can in some circumstances misidentify a plain text file as a Pascal source file, simply because it contains the word 'program' or 'record'. At least that's how it looks, having examined the source (src/names.h). I believe PHP's fileinfo extension uses the same 'magic' engine, so I suspect this is the cause of the problem. If/when I am accepted on the file mailing list, I will notify the maintainers of this issue.

[UPDATE] I asked the question, but got little in the way of a response. Having investigated this issue a bit more thoroughly, it turns out that identifying text formats is, in general, really difficult. If you get a 'text/*' MIME type back from file, you might want to consider ignoring the result and assuming the resource is just 'text/plain', unless the false negatives (text/html maybe) will cause you difficulties.
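The "treat any text/* result as text/plain" idea from the update can be sketched as below. This is Python operating on the detected MIME string; the function name normalize_text_mime and the KEEP_AS_IS set are hypothetical, and which subtypes to exempt is an assumption you would adjust for your own application.

```python
# Hypothetical exemptions: text subtypes you still need to tell apart.
KEEP_AS_IS = {"text/html"}


def normalize_text_mime(mime: str) -> str:
    """Collapse every text/* guess to text/plain, except the exempted subtypes."""
    # Drop parameters such as "; charset=us-ascii" and normalize case.
    essence = mime.split(";", 1)[0].strip().lower()
    if essence.startswith("text/") and essence not in KEEP_AS_IS:
        return "text/plain"
    return essence


print(normalize_text_mime("text/x-pascal"))
print(normalize_text_mime("text/html"))
print(normalize_text_mime("image/png"))
```

Non-text types pass through unchanged, so this only discards the unreliable distinction between text subtypes.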