
Encoding in Python - non-English characters into a URL

I'm trying, bit by bit, to write a geocoding script. There is a Danish (official and free) web service where I enter an address in the URL and get back a JSON file with all the needed info.

I can't find the right way to encode my Danish characters (æ, ø, å) when they go into a URL. In the example I have included two different URLs (containing the address info). The one where the street is 'Byvej' works fine, and I get the expected result printed out in IDLE. (And I can get the lat/long too.)

The other, where the street is 'Bispegårdsvej', prints nothing in IDLE. The returned list is empty. The URL works fine in a browser, and I know I need to add something to the script; I just can't find anything that works.

I'm using ActivePython 2.7.2.5.

Thanks, Tommy

# -*- coding: cp1252 -*-
import urllib2
import json


#url='http://geo.oiorest.dk/adresser.json?postnr=4682&vejnavn=Byvej&husnr=31'
url='http://geo.oiorest.dk/adresser.json?postnr=4320&vejnavn=Bispegårdsvej&husnr=2'

try:
    data = urllib2.urlopen(url).read()
    adresser = json.loads(data)

    for adresse in adresser:
        print "%s %s, %s %s" % \
            (adresse['vejnavn']['navn'],
             adresse['husnr'],
             adresse['postnummer']['nr'],
             adresse['postnummer']['navn'])

except urllib2.HTTPError, e:
    print "HTTP error: %d" % e.code
except urllib2.URLError, e:
    print "Network error: %s" % e.reason.args[1]    

You need to encode the special characters with percent-encoding, also known as URL encoding. After percent-encoding, the URL should look like this:

http://geo.oiorest.dk/adresser.json?postnr=4320&vejnavn=Bispeg%C3%A5rdsvej&husnr=2

Web services that comply with the IRI-to-URI mapping defined in RFC 3987 use UTF-8 for encoding after character normalization, but you should check the service's documentation to be sure which encoding to use.

Python's standard library has urllib.quote() to percent-encode a string, and urllib.urlencode() to percent-encode a dictionary or an iterable of two-element tuples into a string for the query parameters.
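As a minimal sketch of why the choice of encoding matters, the snippet below percent-encodes the street name from the question twice: once from UTF-8 bytes and once from cp1252 bytes (the encoding declared in the question's script). The try/except import is only there so the same code runs on both Python 2 and Python 3; the variable names are illustrative.

```python
try:
    from urllib import quote          # Python 2
except ImportError:
    from urllib.parse import quote    # Python 3

# The street name as a unicode string; \u00e5 is the letter "å".
street = u"Bispeg\u00e5rdsvej"

# Encode to bytes first, then percent-encode the bytes.
utf8_quoted = quote(street.encode("utf-8"))
cp1252_quoted = quote(street.encode("cp1252"))

print(utf8_quoted)    # å becomes two bytes in UTF-8
print(cp1252_quoted)  # å becomes a single byte in cp1252
```

The UTF-8 form matches the working URL shown above. A source file saved as cp1252 would presumably put the single-byte form on the wire instead, which may be why the service returns an empty list.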

You'll have to encode special characters properly, as urlencode does, for example:

In[16]: urllib.urlencode([('postnr',4320),('vejnavn', 'Bispegårdsvej'), ('husnr',2)])
Out[16]: 'postnr=4320&vejnavn=Bispeg%C3%A5rdsvej&husnr=2'

If you then prepend the base URL to this string, it should work (I at least tried it in the browser).
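Putting the pieces together, a sketch of that approach could look like the following. The helper name build_query_url is hypothetical, and UTF-8 is assumed as the encoding the service expects; unicode values are encoded to bytes before being handed to urlencode, which avoids a UnicodeEncodeError on Python 2.

```python
try:
    from urllib import urlencode          # Python 2
except ImportError:
    from urllib.parse import urlencode    # Python 3


def build_query_url(base_url, params, encoding="utf-8"):
    """Return base_url plus a percent-encoded query string.

    params is a list of (key, value) tuples; unicode values are
    encoded to bytes with the given encoding (assumed UTF-8 here).
    """
    encoded = []
    for key, value in params:
        if isinstance(value, type(u"")):
            value = value.encode(encoding)
        encoded.append((key, value))
    return base_url + "?" + urlencode(encoded)


url = build_query_url(
    "http://geo.oiorest.dk/adresser.json",
    [("postnr", 4320), ("vejnavn", u"Bispeg\u00e5rdsvej"), ("husnr", 2)],
)
print(url)
```

The resulting string can then be passed to urllib2.urlopen() in place of the hand-written URL in the question's script.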

If you're open to using a third-party package, requests is a popular choice. It would simplify things to:

import requests
response = requests.get('http://geo.oiorest.dk/adresser.json',
                        params = dict(postnr=4320,
                                      vejnavn='Bispegårdsvej',
                                      husnr=2))
