
Python ValueError: too many values to unpack for crawler

I am trying to run a scraper I found online, but I get ValueError: too many values to unpack on this line of code:

 k, v = piece.split("=")

This line is part of this function

def format_url(url):
    # make sure URLs aren't relative, and strip unnecessary query args
    u = urlparse(url)

    scheme = u.scheme or "https"
    host = u.netloc or "www.amazon.com"
    path = u.path

    if not u.query:
        query = ""
    else:
        query = "?"
        for piece in u.query.split("&"):
            k, v = piece.split("=")
            if k in settings.allowed_params:
                query += "{k}={v}&".format(**locals())
        query = query[:-1]

    return "{scheme}://{host}{path}{query}".format(**locals())

If you have any input it would be appreciated, thank you.

Instead of parsing the URLs yourself, you can use the urlparse.parse_qs function (urllib.parse.parse_qs in Python 3):

>>> from urlparse import urlparse, parse_qs
>>> URL = 'https://someurl.com/with/query_string?i=main&mode=front&sid=12ab&enc=+Hello'
>>> parsed_url = urlparse(URL)
>>> parse_qs(parsed_url.query)
{'i': ['main'], 'enc': [' Hello'], 'mode': ['front'], 'sid': ['12ab']}


This happens because one of the pieces contains two or more '=' characters. In that case split returns a list of three or more elements, which cannot be unpacked into two variables.
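The failure is easy to reproduce in isolation. The sketch below uses a made-up query piece (the string is hypothetical, not taken from the crawler) whose value itself contains '=' characters:

```python
# Hypothetical query piece: the value ends in '=' padding, so the
# string contains three '=' characters in total.
piece = "token=abc=="

parts = piece.split("=")
print(parts)  # ['token', 'abc', '', '']

try:
    k, v = parts  # four elements cannot be unpacked into two names
except ValueError as exc:
    print("ValueError:", exc)
```

Printing piece just before the failing line in the scraper should show which real parameter triggers it.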

You can solve that problem by splitting on at most one '=', which means passing an additional maxsplit argument to the .split(..) call:

k, v = piece.split("=", 1)

But that still does not guarantee that there is an '=' in the piece string at all. If there is none, the unpacking fails again, this time with too few values.
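One way to cover both cases by hand is str.partition, which always returns a three-tuple and never raises. This is only a minimal sketch; the helper name split_pair is made up for illustration, and treating a missing '=' as an empty value is an assumption about what the scraper wants:

```python
def split_pair(piece):
    """Split 'key=value' on the first '=' only.

    Hypothetical helper: a piece without any '=' yields an empty value.
    """
    k, _sep, v = piece.partition("=")
    return k, v


print(split_pair("a=1"))    # ('a', '1')
print(split_pair("a=b=c"))  # ('a', 'b=c')
print(split_pair("flag"))   # ('flag', '')
```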

We can, however, use the urllib.parse module in Python 3 (urlparse in Python 2):

from urllib.parse import urlparse, parse_qsl

purl = urlparse(url)
quer = parse_qsl(purl.query)

for k, v in quer:
    # ...
    pass

Now that the query string is decoded into a list of key-value tuples, we can process each pair separately. I would advise building the new URL with urllib as well.
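Putting that together, format_url could be rewritten roughly as follows for Python 3. This is a sketch under assumptions, not the scraper author's code: ALLOWED_PARAMS stands in for settings.allowed_params from the question, and its values are invented for the example.

```python
from urllib.parse import urlparse, urlunparse, parse_qsl, urlencode

# Stand-in for settings.allowed_params from the question; values are assumed.
ALLOWED_PARAMS = {"node", "page"}


def format_url(url, allowed_params=ALLOWED_PARAMS):
    """Make the URL absolute and keep only the allowed query parameters."""
    u = urlparse(url)
    scheme = u.scheme or "https"
    host = u.netloc or "www.amazon.com"
    # keep_blank_values=True keeps parameters such as 'a=' or a bare 'a'.
    pairs = parse_qsl(u.query, keep_blank_values=True)
    query = urlencode([(k, v) for k, v in pairs if k in allowed_params])
    # Path parameters and the fragment are dropped, as in the original function.
    return urlunparse((scheme, host, u.path, "", query, ""))


print(format_url("/s?node=123&ref=a=b&page=2"))
# https://www.amazon.com/s?node=123&page=2
```

Note that parse_qsl decodes percent-escapes and urlencode re-encodes them, so the kept values may not be byte-for-byte identical to the input, unlike the string concatenation in the question.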

You haven't shown any basic debugging: what is piece at the problem point? If it has more than a single = in the string, the split operation will return more than 2 values -- hence your error message.

If you want to split on only the first =, then use index to get its location, and grab the slices you need:

pos = piece.index('=')
k = piece[:pos]
v = piece[pos+1:]

