简体   繁体   中英

File upload fails, when posting with Indy and filename contains Greek characters

I am trying to implement a POST to a web service. I need to send a file whose type is variable ( .docx , .pdf , .txt ) along with a JSON formatted string.

I have manage to post files successfully with code similar to the following:

procedure DoRequest;
var
  Http: TIdHTTP;
  Params: TIdMultipartFormDataStream;
  RequestStream, ResponseStream: TStringStream;
  JRequest, JResponse: TJSONObject;
  url: string;
begin
  url := 'some_custom_service'

  JRequest := TJSONObject.Create;
  JResponse := TJSONObject.Create;
  try
    JRequest.AddPair('Pair1', 'Value1');
    JRequest.AddPair('Pair2', 'Value2');
    JRequest.AddPair('Pair3', 'Value3');

    Http := TIdHTTP.Create(nil);           
    ResponseStream := TStringStream.Create;
    RequestStream := TStringStream.Create(UTF8Encode(JRequest.ToString));
    try   
      Params := TIdMultipartFormDataStream.Create;
      Params.AddFile('File', ceFileName.Text, '').ContentTransfer := '';
      Params.AddFormField('Json', 'application/json', '', RequestStream);

      Http.Post(url, Params, ResponseStream);
      JResponse := TJSONObject.ParseJSONValue(ResponseStream.DataString) as TJSONObject;
    finally    
      RequestStream.Free;
      ResponseStream.Free;
      Params.Free;
      Http.Free;
    end;
  finally
    JRequest.Free;
    JResponse.Free;
  end;
end;

The problem appears when I try to send a file that contains Greek characters and spaces in the filename. Sometimes it fails and sometimes it succeeds.

After a lot of research, I notice that the POST header is encoded by Indy's TIdFormDataField class using the EncodeHeader() function. When the post fails, the encoded filename in the header is split, compared to the successful post where is not split.

For example :

  • Επιστολή εκπαιδευτικο.docx is encoded as =?UTF-8?B?zpXPgM65z4PPhM6/zrvOriDOtc66z4DOsc65zrTOtc+Fz4TOuc66zr8uZG9j?='#$D#$A' =?UTF-8?B?eA==?= , which fails.
  • Επιστολή εκπαιδευτικ.docx is encoded as =?UTF-8?B?zpXPgM65z4PPhM6/zrvOriDOtc66z4DOsc65zrTOtc+Fz4TOuc66LmRvY3g=?= , which succeeds.
  • Επιστολή εκπαιδευτικ .docx is encoded as =?UTF-8?B?zpXPgM65z4PPhM6/zrvOriDOtc66z4DOsc65zrTOtc+Fz4TOuc66?= .docx , which fails.

I have tried to change the encoding of the filename, the AContentType of the AddFile() procedure, and the ContentTransfer , but none of those change the behavior, and I still get errors when the encoded filename is split.

Is this some kind of bug, or am I missing something?

My code works for every case except those I described above.

I am using Delphi XE3 with Indy10.

EncodeHeader() does have some known issues with Unicode strings:

EncodeHeader() needs to take codeunits into account when splitting data between adjacent encoded-words

Basically, an MIME-encoded word cannot be more than 75 characters in length, so long text gets split up. But when encoding a long Unicode string, any given Unicode character may be charset-encoded using 1 or more bytes, and EncodeHeader() does not yet avoid erroneously splitting a multi-byte character between two individual bytes into separate encoded words (which is illegal and explicitly forbidden by RFC 2047 of the MIME spec).

However, that is not what is happening in your examples.

In your first example, 'Επιστολή εκπαιδευτικο.docx' is too long to be encoded as a single MIME word, so it gets split into 'Επιστολή εκπαιδευτικο.doc' 'x' substrings, which are then encoded separately. This is legal in MIME for long text (though you might have expected Indy to split the text into 'Επιστολή' ' εκπαιδευτικο.doc' instead, or even 'Επιστολή' ' εκπαιδευτικο' '.doc' . That might be a possibility in a future release). Adjacent MIME words that are separated by only whitespace are meant to be concatenated together without separating whitespace when decoded, thus producing 'Επιστολή εκπαιδευτικο.docx' again. If the server is not doing that, it has a flaw in its decoder (maybe it is decoding as 'Επιστολή εκπαιδευτικο.doc x' instead?).

In your second example, 'Επιστολή εκπαιδευτικ.docx' is short enough to be encoded as a single MIME word.

In your third example, 'Επιστολή εκπαιδευτικ .docx' gets split on the second whitespace (not the first) into 'Επιστολή εκπαιδευτικ' ' .docx' substrings, and only the first substring needs to be encoded. This is legal in MIME . When decoded, the decoded text is meant to be concatenated with the following unencoded text, preserving whitespace between them, thus producing 'Επιστολή εκπαιδευτικ .docx' again. If the server is not doing that, it has a flaw in its decoder (maybe it is decoding as 'Επιστολή εκπαιδευτικ.docx' instead?).

If you run these example filenames through Indy's MIME header encoder/decoder, they do decode properly:

var
  s: String;
begin
  s := EncodeHeader('Επιστολή εκπαιδευτικο.docx', '', 'B', 'UTF-8');
  ShowMessage(s); // '=?UTF-8?B?zpXPgM65z4PPhM6/zrvOriDOtc66z4DOsc65zrTOtc+Fz4TOuc66zr8uZG9j?='#13#10' =?UTF-8?B?eA==?='
  s := DecodeHeader(s);
  ShowMessage(s); // 'Επιστολή εκπαιδευτικο.docx'

  s := EncodeHeader('Επιστολή εκπαιδευτικ.docx', '', 'B', 'UTF-8');
  ShowMessage(s); // '=?UTF-8?B?zpXPgM65z4PPhM6/zrvOriDOtc66z4DOsc65zrTOtc+Fz4TOuc66LmRvY3g=?='
  s := DecodeHeader(s);
  ShowMessage(s); // 'Επιστολή εκπαιδευτικ.docx' 

  s := EncodeHeader('Επιστολή εκπαιδευτικ .docx', '', 'B', 'UTF-8');
  ShowMessage(s); // '=?UTF-8?B?zpXPgM65z4PPhM6/zrvOriDOtc66z4DOsc65zrTOtc+Fz4TOuc66?= .docx' 
  s := DecodeHeader(s);
  ShowMessage(s); // 'Επιστολή εκπαιδευτικ .docx'
end;

So the problem seems to be on the server side decoding, not on Indy's client side encoding.

That being said, if you are using a fairly recent version of Indy 10 (Nov 2011 or later), TIdFormDataField has a HeaderEncoding property, which defaults to 'B' (base64) in Unicode environments. However, the splitting logic also affects 'Q' (quoted-printable) as well, so that may or may not work for you, either (but you can try it):

with Params.AddFile('File', ceFileName.Text, '') do
begin
  ContentTransfer := '';
  HeaderEncoding := 'Q'; // <--- here
  HeaderCharSet := 'utf-8';
end;

Otherwise, a workaround might be to change the value to '8' (8-bit) instead, which effectively disables MIME encoding (but not charset encoding):

with Params.AddFile('File', ceFileName.Text, '') do
begin
  ContentTransfer := '';
  HeaderEncoding := '8'; // <--- here
  HeaderCharSet := 'utf-8';
end;

Just note that if the server is not expecting raw UTF-8 bytes for the filename, you might still run into problems (ie, 'Επιστολή εκπαιδευτικο.docx' being interpreted as 'Επιστολή εκπαιδευτικο.docx' , for instance).

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM