File upload fails when posting with Indy and the filename contains Greek characters

I am trying to implement a POST to a web service. I need to send a file whose type varies (.docx, .pdf, .txt) along with a JSON-formatted string.

I have managed to post files successfully with code similar to the following:

procedure DoRequest;
var
  Http: TIdHTTP;
  Params: TIdMultipartFormDataStream;
  RequestStream, ResponseStream: TStringStream;
  JRequest, JResponse: TJSONObject;
  url: string;
begin
  url := 'some_custom_service';

  JRequest := TJSONObject.Create;
  JResponse := nil; // assigned from the parsed response below; creating an object here would leak it
  try
    JRequest.AddPair('Pair1', 'Value1');
    JRequest.AddPair('Pair2', 'Value2');
    JRequest.AddPair('Pair3', 'Value3');

    Http := TIdHTTP.Create(nil);           
    ResponseStream := TStringStream.Create;
    RequestStream := TStringStream.Create(UTF8Encode(JRequest.ToString));
    try   
      Params := TIdMultipartFormDataStream.Create;
      Params.AddFile('File', ceFileName.Text, '').ContentTransfer := '';
      Params.AddFormField('Json', 'application/json', '', RequestStream);

      Http.Post(url, Params, ResponseStream);
      JResponse := TJSONObject.ParseJSONValue(ResponseStream.DataString) as TJSONObject;
    finally    
      RequestStream.Free;
      ResponseStream.Free;
      Params.Free;
      Http.Free;
    end;
  finally
    JRequest.Free;
    JResponse.Free;
  end;
end;

The problem appears when I try to send a file whose filename contains Greek characters and spaces. Sometimes it fails and sometimes it succeeds.

After a lot of research, I noticed that the POST header is encoded by Indy's TIdFormDataField class using the EncodeHeader() function. When the post fails, the encoded filename in the header is split; when the post succeeds, it is not split.

For example:

  • Επιστολή εκπαιδευτικο.docx is encoded as =?UTF-8?B?zpXPgM65z4PPhM6/zrvOriDOtc66z4DOsc65zrTOtc+Fz4TOuc66zr8uZG9j?='#$D#$A' =?UTF-8?B?eA==?=, which fails.
  • Επιστολή εκπαιδευτικ.docx is encoded as =?UTF-8?B?zpXPgM65z4PPhM6/zrvOriDOtc66z4DOsc65zrTOtc+Fz4TOuc66LmRvY3g=?=, which succeeds.
  • Επιστολή εκπαιδευτικ .docx is encoded as =?UTF-8?B?zpXPgM65z4PPhM6/zrvOriDOtc66z4DOsc65zrTOtc+Fz4TOuc66?= .docx, which fails.

I have tried changing the encoding of the filename, the AContentType parameter of AddFile(), and the ContentTransfer property, but none of those changes the behavior, and I still get errors whenever the encoded filename is split.

Is this some kind of bug, or am I missing something?

My code works for every case except those I described above.

I am using Delphi XE3 with Indy 10.

EncodeHeader() does have some known issues with Unicode strings:

EncodeHeader() needs to take codeunits into account when splitting data between adjacent encoded-words

Basically, a MIME encoded-word cannot be more than 75 characters in length, so long text gets split up. But when encoding a long Unicode string, any given Unicode character may be charset-encoded using 1 or more bytes, and EncodeHeader() does not yet avoid splitting the bytes of a multi-byte character across two separate encoded-words (which is illegal and explicitly forbidden by RFC 2047 of the MIME spec).

However, that is not what is happening in your examples.

In your first example, 'Επιστολή εκπαιδευτικο.docx' is too long to be encoded as a single MIME word, so it gets split into 'Επιστολή εκπαιδευτικο.doc' 'x' substrings, which are then encoded separately. This is legal in MIME for long text (though you might have expected Indy to split the text into 'Επιστολή' ' εκπαιδευτικο.doc' instead, or even 'Επιστολή' ' εκπαιδευτικο' '.doc'. That might be a possibility in a future release). Adjacent MIME words that are separated only by whitespace are meant to be concatenated without the separating whitespace when decoded, thus producing 'Επιστολή εκπαιδευτικο.docx' again. If the server is not doing that, it has a flaw in its decoder (maybe it is decoding as 'Επιστολή εκπαιδευτικο.doc x' instead?).

In your second example, 'Επιστολή εκπαιδευτικ.docx' is short enough to be encoded as a single MIME word.

In your third example, 'Επιστολή εκπαιδευτικ .docx' gets split on the second whitespace (not the first) into 'Επιστολή εκπαιδευτικ' ' .docx' substrings, and only the first substring needs to be encoded. This is legal in MIME. When decoded, the decoded text is meant to be concatenated with the following unencoded text, preserving the whitespace between them, thus producing 'Επιστολή εκπαιδευτικ .docx' again. If the server is not doing that, it has a flaw in its decoder (maybe it is decoding as 'Επιστολή εκπαιδευτικ.docx' instead?).
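To see why the first filename gets split and the second does not, the length of a single base64 encoded-word can be estimated by hand. The sketch below is not Indy's actual splitting logic; it only applies the 75-character limit from RFC 2047. The program and function names (EncodedWordLength, EncodedWordLen) are made up for illustration, and it assumes the source file is saved as UTF-8 so that the Greek literals survive compilation.

```delphi
program EncodedWordLength;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical helper: length of ONE '=?UTF-8?B?...?=' encoded-word
// that would hold the whole text without any splitting.
function EncodedWordLen(const Text: string): Integer;
const
  Overhead = 12; // '=?UTF-8?B?' is 10 characters, '?=' is 2
var
  ByteCount: Integer;
begin
  ByteCount := TEncoding.UTF8.GetByteCount(Text);
  // base64 emits 4 characters for every 3 input bytes, rounded up
  Result := Overhead + 4 * ((ByteCount + 2) div 3);
end;

begin
  // 46 UTF-8 bytes -> 64 base64 characters + 12 = 76, over the 75 limit
  Writeln(EncodedWordLen('Επιστολή εκπαιδευτικο.docx'));
  // 44 UTF-8 bytes -> 60 base64 characters + 12 = 72, fits in one word
  Writeln(EncodedWordLen('Επιστολή εκπαιδευτικ.docx'));
end.
```

The first name would need 76 characters as a single encoded-word, one more than allowed, which is consistent with the trailing 'x' being pushed into a second encoded-word.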

If you run these example filenames through Indy's MIME header encoder/decoder, they do decode properly:

var
  s: String;
begin
  s := EncodeHeader('Επιστολή εκπαιδευτικο.docx', '', 'B', 'UTF-8');
  ShowMessage(s); // '=?UTF-8?B?zpXPgM65z4PPhM6/zrvOriDOtc66z4DOsc65zrTOtc+Fz4TOuc66zr8uZG9j?='#13#10' =?UTF-8?B?eA==?='
  s := DecodeHeader(s);
  ShowMessage(s); // 'Επιστολή εκπαιδευτικο.docx'

  s := EncodeHeader('Επιστολή εκπαιδευτικ.docx', '', 'B', 'UTF-8');
  ShowMessage(s); // '=?UTF-8?B?zpXPgM65z4PPhM6/zrvOriDOtc66z4DOsc65zrTOtc+Fz4TOuc66LmRvY3g=?='
  s := DecodeHeader(s);
  ShowMessage(s); // 'Επιστολή εκπαιδευτικ.docx' 

  s := EncodeHeader('Επιστολή εκπαιδευτικ .docx', '', 'B', 'UTF-8');
  ShowMessage(s); // '=?UTF-8?B?zpXPgM65z4PPhM6/zrvOriDOtc66z4DOsc65zrTOtc+Fz4TOuc66?= .docx' 
  s := DecodeHeader(s);
  ShowMessage(s); // 'Επιστολή εκπαιδευτικ .docx'
end;

So the problem seems to be in the server-side decoding, not in Indy's client-side encoding.

That being said, if you are using a fairly recent version of Indy 10 (Nov 2011 or later), TIdFormDataField has a HeaderEncoding property, which defaults to 'B' (base64) in Unicode environments. You can try setting it to 'Q' (quoted-printable) instead, but the splitting logic affects 'Q' as well, so that may or may not work for you:

with Params.AddFile('File', ceFileName.Text, '') do
begin
  ContentTransfer := '';
  HeaderEncoding := 'Q'; // <--- here
  HeaderCharSet := 'utf-8';
end;

Otherwise, a workaround might be to change the value to '8' (8-bit) instead, which effectively disables MIME encoding (but not charset encoding):

with Params.AddFile('File', ceFileName.Text, '') do
begin
  ContentTransfer := '';
  HeaderEncoding := '8'; // <--- here
  HeaderCharSet := 'utf-8';
end;

Just note that if the server is not expecting raw UTF-8 bytes for the filename, you might still run into problems (for instance, 'Επιστολή εκπαιδευτικο.docx' being misread as mojibake such as 'Î•Ï€Î¹ÏƒÏ„Î¿Î»Î® ...' if the server assumes a single-byte charset like Windows-1252).
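A rough way to picture that failure mode is to take the UTF-8 bytes of a filename and decode them with the wrong charset. This is only a sketch: Windows-1252 is an assumed example of what a misconfigured server might use, not something known about your service, and the program name MojibakeDemo is made up. The source file is again assumed to be saved as UTF-8.

```delphi
program MojibakeDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

var
  Utf8Bytes: TBytes;
  Cp1252: TEncoding;
  Misread: string;
begin
  // The raw bytes that would be sent for this word with HeaderEncoding = '8'
  Utf8Bytes := TEncoding.UTF8.GetBytes('Επιστολή');

  // Assumed server-side mistake: treating those bytes as Windows-1252
  Cp1252 := TEncoding.GetEncoding(1252);
  try
    Misread := Cp1252.GetString(Utf8Bytes);
  finally
    Cp1252.Free; // an encoding obtained via GetEncoding is owned by the caller
  end;

  Writeln(Length(Utf8Bytes)); // 16: each Greek letter takes two bytes in UTF-8
  Writeln(Length(Misread));   // 16: every byte became a separate character
end.
```

Eight Greek letters turn into sixteen unrelated characters, so a filename that arrives looking like that points to a charset mismatch on the receiving side and not to a split encoded-word.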
