Passing a string containing unicode to a RESTful API

I am firing off three separate queries against a RESTful API (using Python, urllib2 and pandas), each query containing one of the following string variants:

(1) 'Caveolin-1 suppresses Human Immunodeficiency Virus-1 replication by inhibiting acetylation of NF-\xce\xbaB'

(2) 'Caveolin-1 suppresses Human Immunodeficiency virus-1 replication by inhibiting acetylation of NF-κB'

(3) 'Caveolin-1 suppresses Human Immunodeficiency virus-1 replication by inhibiting acetylation of NF

Outcomes are:

(1) doesn't return any results (when done programmatically from Python)

(2) works and returns the expected result, the matching record (query fired manually using a Chrome plugin for making RESTful API calls, with the string pasted as is into the appropriate part of the API call)

(3) works the same as (2)

Since I have the source data and am doing (1) programmatically from Python by reading the string from a dataframe, is there anything I can do with the unicode characters in my source data (I am guessing that's what '\xbaB' etc. are) to make them passable to the API? Based on the above, '\xce\xbaB' seems to be the encoding for 'κB'.

Or is this something I am going to have to look at the API documentation for (which, for this bit, I don't think exists)?

If this is hard, or if it's easier: what's the best way to just get rid of any unicode characters from the string before passing the query (i.e. fall back to (3))?

Thanks in advance!

REF: from Python I am executing the following to call the API:

import urllib2

api_call = 'http://some_api/index:ABCDE?query=title(' + str(title_string) + ')'
headers = {'APIKey': API_key, 'accept': 'text/xml, application/atom+xml'}
request = urllib2.Request(api_call, headers=headers)
response = urllib2.urlopen(request, '', 30)

return response.read()

\xce and \xba are the bytes with the hex values ce and ba respectively; together they are the UTF-8 encoding of 'κ'. Without knowing more about how you're talking to the API or what it's expecting, I would think you could do something like this to make the string passable:

>>> import urllib
>>> urllib.quote('an Immunodeficiency Virus-1 replication by inhibiting acetylation of NF-\xce\xbaB')
'an%20Immunodeficiency%20Virus-1%20replication%20by%20inhibiting%20acetylation%20of%20NF-%CE%BAB'
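
To check the question's guess that these two bytes encode 'κ', and to see the same percent-encoding on Python 3, here is a minimal sketch. It assumes Python 3 (where `urllib.quote` lives at `urllib.parse.quote`) and assumes the source data is UTF-8; the variable names are illustrative only.

```python
from urllib.parse import quote

# Tail of the byte string from variant (1) of the question.
raw = b'NF-\xce\xbaB'

# Decoding as UTF-8 (an assumption about the source data) recovers the kappa.
text = raw.decode('utf-8')
print(text)         # NF-κB

# quote() encodes str as UTF-8 by default and escapes bytes one by one,
# so both forms give the same percent-encoded result.
print(quote(text))  # NF-%CE%BAB
print(quote(raw))   # NF-%CE%BAB
```

Whether the API accepts the percent-encoded form still depends on how the server decodes the query, which the question does not say.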

EDIT:

Normally in Python this is how I would add parameters to the URL:

params = {'query' : 'title(' + title_string + ')'}
api_call = 'http://some_api/index:ABCDE?' + urllib.urlencode(params)

So I would lean towards that instead of my earlier urllib.quote suggestion (which I think would be applicable if title_string were part of the path), but I'm not sure whether it's enough given the hex values in title_string. I think it will depend on how it's being handled on the server side.
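
A sketch of the same approach on Python 3, where `urlencode` is in `urllib.parse`. It assumes the title may arrive as UTF-8 bytes (as variant (1) suggests) and reuses the placeholder URL from the question. `build_api_call` and `ascii_only` are hypothetical helper names; `ascii_only` illustrates the fallback the question asks about, dropping every non-ASCII character.

```python
from urllib.parse import urlencode

BASE_URL = 'http://some_api/index:ABCDE?'  # placeholder URL from the question


def build_api_call(title):
    """Return the query URL for a title given as str or UTF-8 bytes."""
    if isinstance(title, bytes):
        title = title.decode('utf-8')  # assumption: source data is UTF-8
    params = {'query': 'title(' + title + ')'}
    # urlencode percent-encodes the value as UTF-8 and turns spaces into '+'.
    return BASE_URL + urlencode(params)


def ascii_only(title):
    """Fallback: drop every character outside the ASCII range."""
    return title.encode('ascii', 'ignore').decode('ascii')


print(build_api_call(b'acetylation of NF-\xce\xbaB'))
print(ascii_only('NF-\u03baB'))  # NF-B
```

Note that `urlencode` also escapes the parentheses around the title; whether the server expects them escaped is not something the question establishes.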

Try converting '\' to '\\', because '\x' marks the start of a hexadecimal character code.
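
To see what doubling the backslash actually changes, here is a small Python 3 sketch (variable names are illustrative). With a single backslash the escapes are interpreted and the string holds two bytes; with doubled backslashes it holds the literal text `\xce\xba` instead. Whether the API would match on that literal text is not something the source establishes.

```python
raw = b'NF-\xce\xbaB'        # escapes interpreted: the two bytes 0xCE 0xBA
escaped = b'NF-\\xce\\xbaB'  # doubled backslashes: literal backslash text

print(len(raw), len(escaped))   # 6 12
print(raw.decode('utf-8'))      # NF-κB
print(escaped.decode('utf-8'))  # NF-\xce\xbaB
```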
