
UTF-8 GET using Indy 10.5.8.0 and Delphi XE2

I'm writing my first Unicode application with Delphi XE2, and I've stumbled upon an issue with GET requests to a Unicode URL.

In short, it's a routine in an MP3 tagging application that takes a track title and an artist, and queries Last.FM for the corresponding album, track number, and genre.

I have the following code:

function GetMP3Info(artist, track: string) : TMP3Data; //<---(This is a record)
var
  TrackTitle,
  ArtistTitle : WideString;
  webquery    : WideString;

[....]

WebQuery := UTF8Encode('http://ws.audioscrobbler.com/2.0/?method=track.getcorrection&api_key=' + apikey + '&artist=' + artist + '&track=' + track);

//[processing the result in the web query, getting the correction for the artist and title]

// e.g.: for artist := Bucovina and track := Mestecanis, the corrected values are:
//ArtistTitle := Bucovina;
// TrackTitle := Mestecăniș;

//Now here is the tricky part:

webquery := UTF8Encode('http://ws.audioscrobbler.com/2.0/?method=track.getInfo&api_key=' + apikey + '&artist=' + unescape(ArtistTitle) + '&track=' + unescape(TrackTitle)); 
//the unescape function replaces spaces (' ') with '+' to comply with the last.fm requests

[some more processing]

end;

The webquery displayed in a TMemo looks just right:

http://ws.audioscrobbler.com/2.0/?method=track.getInfo&api_key=e5565002840xxxxxxxxxxxxxx23b98ad&artist=Bucovina&track=Mestecăniș

Yet, when I send a GET request for webquery using TIdHTTP (with the ContentEncoding property set to 'UTF-8'), Wireshark shows that TIdHTTP is requesting the data with an ANSI request URL:

/2.0/?method=track.getInfo&api_key=e5565002840xxxxxxxxxxxxxx23b98ad&artist=Bucovina&track=Mestec?ni?

Here are the full headers for the GET request and response:

GET /2.0/?method=track.getInfo&api_key=e5565002840xxxxxxxxxxxxxx23b98ad&artist=Bucovina&track=Mestec?ni? HTTP/1.1
Content-Encoding: UTF-8
Host: ws.audioscrobbler.com
Accept: text/html, */*
Accept-Encoding: identity
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.23) Gecko/20110920 Firefox/3.6.23 SearchToolbar/1.22011-10-16 20:20:07

HTTP/1.0 400 Bad Request
Date: Tue, 09 Oct 2012 20:46:31 GMT
Server: Apache/2.2.22 (Unix)
X-Web-Node: www204
Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: POST, GET, OPTIONS
Access-Control-Max-Age: 86400
Cache-Control: max-age=10
Expires: Tue, 09 Oct 2012 20:46:42 GMT
Content-Length: 114
Connection: close
Content-Type: text/xml; charset=utf-8;

<?xml version="1.0" encoding="utf-8"?>
<lfm status="failed">
<error code="6">
    Track not found
</error>
</lfm>

The question that puzzles me is: am I overlooking anything related to setting the properties of the TIdHTTP component? How can I stop the well-formatted URL I'm composing in the application from being sent to the server in the wrong format?

To get the XML response from the track.getCorrection function you can use something like this:

uses
  IdHTTP, IdURI;

function GetMusicDataXML(const AArtist, ATrack: string): string;
var
  URL: string;
  IdHTTP: TIdHTTP;
const
  APIKey = '1a3d8080e427f4dxxxxxxxxxxxxxxxxx';
begin
  Result := '';
  IdHTTP := TIdHTTP.Create;
  try
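    // URLEncode turns non-ASCII characters into percent-encoded UTF-8 bytes,
    // so the request line that TIdHTTP sends contains only ASCII
    URL := TIdURI.URLEncode('http://ws.audioscrobbler.com/2.0/?method=track.getcorrection&api_key=' + APIKey + '&artist=' + AArtist + '&track=' + ATrack);
    // Get returns the response body (the XML document) as a string
    Result := IdHTTP.Get(URL);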
  finally
    IdHTTP.Free;
  end;
end;
var
  ...
  webquery : WideString;
  ...
WebQuery := UTF8Encode('http://ws.audioscrobbler.com/2.0/?method=track.getcorrection&api_key=' + apikey + '&artist=' + artist + '&track=' + track);

This does not do what you think it does. In XE2, UTF8Encode() returns a UTF-8 encoded RawByteString, which you are then assigning to a WideString. The RTL will decode the UTF-8 data back to a UTF-16 string. When you pass that string to TIdHTTP.Get(), it will convert it to ASCII when the actual HTTP request is formatted, losing any non-ASCII characters.
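A minimal console sketch of that round trip, assuming Delphi 2009 or later (where string is UTF-16) and XE2-style unit scope names; the program and variable names are illustrative only. It shows that the WideString ends up holding exactly the original UTF-16 text, so the UTF8Encode call changes nothing by the time the string reaches TIdHTTP.Get().

```pascal
program Utf8RoundTrip;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

var
  Title: string;            // UnicodeString (UTF-16) in Delphi 2009 and later
  Utf8Bytes: RawByteString;
  Wide: WideString;
begin
  // 'Mestecăniș', built from code points so the source file encoding does not matter
  Title := 'Mestec' + #$0103 + 'ni' + #$0219;

  // UTF8Encode returns the UTF-8 bytes, tagged with code page CP_UTF8 (65001)
  Utf8Bytes := UTF8Encode(Title);

  // Implicit cast (as in the question): the RTL decodes the UTF-8 back to UTF-16
  Wide := Utf8Bytes;

  Writeln('UTF-16 chars: ', Length(Title));                         // 10
  Writeln('UTF-8 bytes:  ', Length(Utf8Bytes));                     // 12
  Writeln('Code page:    ', StringCodePage(Utf8Bytes));             // 65001
  Writeln('Unchanged:    ', BoolToStr(string(Wide) = Title, True)); // True
end.
```

The compiler may emit an implicit string cast warning on the `Wide := Utf8Bytes` line; that silent conversion is exactly what undoes the encoding.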

As @TLama said, you have to encode the URL using TIdURI before passing it to TIdHTTP. TIdURI will encode Unicode characters as UTF-8 (by default; you can specify the encoding if needed) and then encode the resulting data in an ASCII-compatible format that TIdHTTP will not lose.
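TIdURI already does this for you. Purely to illustrate what "UTF-8 first, then an ASCII-compatible format" means, here is a minimal sketch that percent-encodes a single query value. The helper PercentEncodeUtf8 is hypothetical and not part of Indy; it keeps only the RFC 3986 unreserved characters as-is, which may differ in detail from the set TIdURI leaves unencoded.

```pascal
program PercentEncodeSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical helper (not Indy's implementation): percent-encodes one query
// value as UTF-8, leaving only RFC 3986 unreserved ASCII characters untouched.
function PercentEncodeUtf8(const AValue: string): string;
var
  Bytes: TBytes;
  B: Byte;
begin
  Result := '';
  Bytes := TEncoding.UTF8.GetBytes(AValue);   // UTF-16 text -> UTF-8 bytes
  for B in Bytes do
    if AnsiChar(B) in ['A'..'Z', 'a'..'z', '0'..'9', '-', '.', '_', '~'] then
      Result := Result + Char(B)               // unreserved ASCII passes through
    else
      Result := Result + '%' + IntToHex(B, 2); // every other byte becomes %XX
end;

var
  Track: string;
begin
  Track := 'Mestec' + #$0103 + 'ni' + #$0219;  // 'Mestecăniș'
  Writeln(PercentEncodeUtf8(Track));           // Mestec%C4%83ni%C8%99
  Writeln('/2.0/?method=track.getInfo&artist=' + PercentEncodeUtf8('Bucovina') +
          '&track=' + PercentEncodeUtf8(Track));
end.
```

Because the helper sees one value at a time, it also escapes '&' and '=' that appear inside a name ('Simon & Garfunkel' becomes 'Simon%20%26%20Garfunkel'), which an encoder given the whole URL cannot tell apart from the query separators. Spaces come out as %20 rather than '+'.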

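Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.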

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM