
Add params to given URL in Python

Suppose I was given a URL. It might already have GET parameters (e.g. http://example.com/search?q=question) or it might not (e.g. http://example.com/).

And now I need to add some parameters to it, like {'lang':'en','tag':'python'}. In the first case I'm going to have http://example.com/search?q=question&lang=en&tag=python and in the second, http://example.com/?lang=en&tag=python.

Is there any standard way to do this?

There are a couple of quirks with the urllib and urlparse modules. Here's a working example:

try:
    import urlparse
    from urllib import urlencode
except ImportError:  # Python 3
    import urllib.parse as urlparse
    from urllib.parse import urlencode

url = "http://stackoverflow.com/search?q=question"
params = {'lang':'en','tag':'python'}

url_parts = list(urlparse.urlparse(url))
query = dict(urlparse.parse_qsl(url_parts[4]))
query.update(params)

url_parts[4] = urlencode(query)

print(urlparse.urlunparse(url_parts))

ParseResult, the result of urlparse(), is read-only, so we need to convert it to a list before we can modify its data.
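
As a small illustration of the read-only point, here is a Python 3 sketch. It shows that assigning to a field fails, and that `_replace()` (the standard namedtuple method) is an alternative to the list conversion. The URL and parameters are just the sample values from the question.

```python
from urllib.parse import urlparse, urlencode, parse_qsl

parts = urlparse("http://stackoverflow.com/search?q=question")

# ParseResult is a namedtuple, so assigning to a field raises AttributeError.
try:
    parts.query = "lang=en"
    assignment_failed = False
except AttributeError:
    assignment_failed = True

# _replace() returns a new ParseResult with the given field changed.
query = dict(parse_qsl(parts.query))
query.update({"lang": "en", "tag": "python"})
new_url = parts._replace(query=urlencode(query)).geturl()
print(new_url)
```

Using `_replace()` avoids having to remember that the query sits at index 4 of the list.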

Why

I wasn't satisfied with any of the solutions on this page (come on, where is our favorite copy-paste thing?), so I wrote my own based on the answers here. It tries to be complete and more Pythonic. I've added a handler for dict and bool values in arguments to be more consumer-side (JS) friendly, but it is optional and you can drop it.

How it works

Test 1: Adding new arguments, handling arrays and bool values:

url = 'http://stackoverflow.com/test'
new_params = {'answers': False, 'data': ['some','values']}

add_url_params(url, new_params) == \
    'http://stackoverflow.com/test?answers=false&data=some&data=values'

Test 2: Rewriting existing args, handling dict values:

url = 'http://stackoverflow.com/test/?question=false'
new_params = {'question': {'__X__':'__Y__'}}

add_url_params(url, new_params) == \
    'http://stackoverflow.com/test/?question=%7B%22__X__%22%3A+%22__Y__%22%7D'

Talk is cheap. Show me the code.

The code itself. I've tried to describe it in detail:

from json import dumps

try:
    from urllib import urlencode, unquote
    from urlparse import urlparse, parse_qsl, ParseResult
except ImportError:
    # Python 3 fallback
    from urllib.parse import (
        urlencode, unquote, urlparse, parse_qsl, ParseResult
    )


def add_url_params(url, params):
    """ Add GET params to provided URL being aware of existing.

    :param url: string of target URL
    :param params: dict containing requested params to be added
    :return: string with updated URL

    >>> url = 'http://stackoverflow.com/test?answers=true'
    >>> new_params = {'answers': False, 'data': ['some','values']}
    >>> add_url_params(url, new_params)
    'http://stackoverflow.com/test?answers=false&data=some&data=values'
    """
    # Unquoting URL first so we don't lose existing args
    url = unquote(url)
    # Extracting url info
    parsed_url = urlparse(url)
    # Extracting URL arguments from parsed URL
    get_args = parsed_url.query
    # Converting URL arguments to dict
    parsed_get_args = dict(parse_qsl(get_args))
    # Merging URL arguments dict with new params
    parsed_get_args.update(params)

    # Bool and Dict values should be converted to json-friendly values
    # you may throw this part away if you don't like it :)
    parsed_get_args.update(
        {k: dumps(v) for k, v in parsed_get_args.items()
         if isinstance(v, (bool, dict))}
    )

    # Converting URL argument to proper query string
    encoded_get_args = urlencode(parsed_get_args, doseq=True)
    # Creating new parsed result object based on provided with new
    # URL arguments. Same thing happens inside of urlparse.
    new_url = ParseResult(
        parsed_url.scheme, parsed_url.netloc, parsed_url.path,
        parsed_url.params, encoded_get_args, parsed_url.fragment
    ).geturl()

    return new_url

Please be aware that there may be some issues. If you find one, please let me know and we will make this thing better.
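
One possible issue worth checking is the step that unquotes the whole URL before parsing. The sketch below (Python 3, with a made-up example URL) suggests that a percent-encoded `&` inside a value is turned back into a separator, so part of the value is lost. Parsing the query without unquoting first keeps it intact, because parse_qsl already decodes each value.

```python
from urllib.parse import unquote, urlparse, parse_qsl

# Hypothetical URL whose q value is the literal string "a&b".
url = "http://example.com/search?q=a%26b"

# Unquoting the whole URL first: "%26" becomes "&" and splits the value.
query_after_unquote = parse_qsl(urlparse(unquote(url)).query)

# Parsing directly: parse_qsl decodes each value on its own.
query_direct = parse_qsl(urlparse(url).query)

print(query_after_unquote)
print(query_direct)
```

If this matters for your URLs, dropping the unquote() call may be enough, though that is an untested suggestion for the function above.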

You want to use URL encoding if the strings can have arbitrary data (for example, characters such as ampersands, slashes, etc. will need to be encoded).

Check out urllib.urlencode:

>>> import urllib
>>> urllib.urlencode({'lang':'en','tag':'python'})
'lang=en&tag=python'

In Python 3:

from urllib import parse
parse.urlencode({'lang':'en','tag':'python'})
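
To make the point about arbitrary data concrete, here is a short Python 3 sketch with made-up values containing an ampersand, a slash, a space and plus signs. It encodes them with urlencode and checks that parse_qsl recovers the original values.

```python
from urllib.parse import urlencode, parse_qsl

# Illustrative values containing characters that are special in a query string.
params = {"q": "a&b/c d", "tag": "c++"}

encoded = urlencode(params)
print(encoded)  # q=a%26b%2Fc+d&tag=c%2B%2B

# Decoding gives back the original values unchanged.
decoded = dict(parse_qsl(encoded))
print(decoded == params)
```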

Outsource it to the battle-tested requests library.

This is how I would do it:

from requests.models import PreparedRequest
url = 'http://example.com/search?q=question'
params = {'lang':'en','tag':'python'}
req = PreparedRequest()
req.prepare_url(url, params)
print(req.url)

You can also use the furl module: https://github.com/gruns/furl

>>> from furl import furl
>>> print(furl('http://example.com/search?q=question').add({'lang':'en','tag':'python'}).url)
http://example.com/search?q=question&lang=en&tag=python

If you are using the requests lib:

import requests
...
params = {'tag': 'python'}
requests.get(url, params=params)

Based on this answer, a one-liner for simple cases (Python 3 code):

from urllib.parse import urlparse, urlencode


url = "https://stackoverflow.com/search?q=question"
params = {'lang':'en','tag':'python'}

url += ('&' if urlparse(url).query else '?') + urlencode(params)

or:

url += ('&', '?')[urlparse(url).query == ''] + urlencode(params)
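
The "simple cases" caveat matters when the URL carries a fragment: plain concatenation puts the new parameters after the `#`. The sketch below shows the problem and one possible workaround that splices the query in before the fragment. `append_params` is a hypothetical helper name and the URLs are made up.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def append_params(url, params):
    """Append params to the query string, keeping any fragment at the end."""
    parts = urlsplit(url)
    extra = urlencode(params)
    query = parts.query + "&" + extra if parts.query else extra
    return urlunsplit(parts._replace(query=query))


url = "http://example.com/page#section"

# Naive concatenation: the parameters end up inside the fragment.
naive = url + "?" + urlencode({"lang": "en"})
print(naive)

# Splitting first keeps the fragment last.
fixed = append_params(url, {"lang": "en"})
print(fixed)
```

Like the one-liner, this sketch appends without deduplicating keys that already exist in the query.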

I find this more elegant than the two top answers:

from urllib.parse import urlencode, urlparse, parse_qs

def merge_url_query_params(url: str, additional_params: dict) -> str:
    url_components = urlparse(url)
    original_params = parse_qs(url_components.query)
    # Dict unpacking (Python 3.5+) builds a new dict, so neither
    # original_params nor additional_params is mutated.
    merged_params = {**original_params, **additional_params}
    updated_query = urlencode(merged_params, doseq=True)
    # _replace() is how you can create a new NamedTuple with a changed field
    return url_components._replace(query=updated_query).geturl()

assert merge_url_query_params(
    'http://example.com/search?q=question',
    {'lang':'en','tag':'python'},
) == 'http://example.com/search?q=question&lang=en&tag=python'

The most important things I dislike in the top answers (they are nevertheless good):

  • Łukasz: having to remember the index at which the query is in the URL components
  • Sapphire64: the very verbose way of creating the updated ParseResult

What's bad about my response is the magical-looking dict merge using unpacking, but I prefer that to updating an already existing dictionary because of my prejudice against mutability.
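
One practical difference from the dict(parse_qsl(...)) approach used in other answers is how repeated keys are handled. The sketch below uses a made-up query string to show that parse_qs keeps every value of a repeated key as a list, while converting parse_qsl output to a dict keeps only the last one.

```python
from urllib.parse import parse_qs, parse_qsl, urlencode

# Illustrative query string with a repeated "tag" key.
query = "tag=python&tag=url&q=question"

# dict() over the pairs keeps only the last value for each key.
as_dict = dict(parse_qsl(query))
print(as_dict)

# parse_qs collects all values of each key into a list.
as_lists = parse_qs(query)
print(as_lists)

# doseq=True expands the lists back into repeated keys.
merged = urlencode({**as_lists, "lang": "en"}, doseq=True)
print(merged)
```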

Yes: use urllib.

From the examples in the documentation:

>>> import urllib
>>> params = urllib.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
>>> f = urllib.urlopen("http://www.musi-cal.com/cgi-bin/query?%s" % params)
>>> print f.geturl() # Prints the final URL with parameters.
>>> print f.read() # Prints the contents

I liked Łukasz's version, but since the urllib and urlparse functions are somewhat awkward to use in this case, I think it's more straightforward to do something like this:

params = urllib.urlencode(params)

if urlparse.urlparse(url)[4]:
    print url + '&' + params
else:
    print url + '?' + params

Use the various urlparse functions to tear apart the existing URL, call urllib.urlencode() on the combined dictionary, then use urlparse.urlunparse() to put it all back together again.

Or just take the result of urllib.urlencode() and concatenate it to the URL appropriately.
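
The first approach could be sketched like this in Python 3, where the same functions live in urllib.parse. `add_params` is a hypothetical name, and the URLs are the examples from the question.

```python
from urllib.parse import urlparse, urlunparse, urlencode, parse_qsl


def add_params(url, new_params):
    """Tear the URL apart, merge the query parameters, and reassemble it."""
    scheme, netloc, path, params, query, fragment = urlparse(url)
    combined = dict(parse_qsl(query))
    combined.update(new_params)
    return urlunparse(
        (scheme, netloc, path, params, urlencode(combined), fragment)
    )


with_query = add_params("http://example.com/search?q=question",
                        {"lang": "en", "tag": "python"})
without_query = add_params("http://example.com/",
                           {"lang": "en", "tag": "python"})
print(with_query)
print(without_query)
```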

Yet another answer:

import urllib    # Python 2 modules
import urlparse

def addGetParameters(url, newParams):
    (scheme, netloc, path, params, query, fragment) = urlparse.urlparse(url)
    queryList = urlparse.parse_qsl(query, keep_blank_values=True)
    for key in newParams:
        queryList.append((key, newParams[key]))
    return urlparse.urlunparse((scheme, netloc, path, params, urllib.urlencode(queryList), fragment))
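
Two details set this version apart: keep_blank_values=True preserves parameters with empty values, and appending to a list of pairs adds a key again rather than overwriting it. A Python 3 sketch of both effects, using a made-up query string:

```python
from urllib.parse import parse_qsl, urlencode

# Illustrative query with an empty "debug" value.
query = "debug=&q=question"

# By default, parameters with blank values are dropped.
default_pairs = parse_qsl(query)
print(default_pairs)

# keep_blank_values=True preserves them.
pairs = parse_qsl(query, keep_blank_values=True)
print(pairs)

# Appending to the list repeats an existing key instead of replacing it.
pairs.append(("q", "other"))
result = urlencode(pairs)
print(result)
```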

Python 3, self-explanatory I guess:

from urllib.parse import urlparse, urlencode, parse_qsl

url = 'https://www.linkedin.com/jobs/search?keywords=engineer'

parsed = urlparse(url)
current_params = dict(parse_qsl(parsed.query))
new_params = {'location': 'United States'}
merged_params = urlencode({**current_params, **new_params})
parsed = parsed._replace(query=merged_params)

print(parsed.geturl())
# https://www.linkedin.com/jobs/search?keywords=engineer&location=United+States

Here is how I implemented it.

import urllib

params = urllib.urlencode({'lang':'en','tag':'python'})
url = ''
if request.GET:
   url = request.url + '&' + params
else:
   url = request.url + '?' + params    

Worked like a charm. However, I would have liked a cleaner way to implement this.

Another way of implementing the above is to put it in a method.

import urllib

def add_url_param(request, **params):
   new_url = ''
   _params = dict(**params)
   _params = urllib.urlencode(_params)

   if _params:
      if request.GET:
         new_url = request.url + '&' + _params
      else:
         new_url = request.url + '?' + _params
   else:
      new_url = request.url

   return new_url

In Python 2.5:

import cgi
import urllib
import urlparse

def add_url_param(url, **params):
    n=3
    parts = list(urlparse.urlsplit(url))
    d = dict(cgi.parse_qsl(parts[n])) # use cgi.parse_qs for list values
    d.update(params)
    parts[n]=urllib.urlencode(d)
    return urlparse.urlunsplit(parts)

url = "http://stackoverflow.com/search?q=question"
add_url_param(url, lang='en') == "http://stackoverflow.com/search?q=question&lang=en"
