I am trying to run a scraper I found online, but I get "ValueError: too many values to unpack" on this line of code:
k, v = piece.split("=")
This line is part of this function:
def format_url(url):
    # make sure URLs aren't relative, and strip unnecessary query args
    u = urlparse(url)
    scheme = u.scheme or "https"
    host = u.netloc or "www.amazon.com"
    path = u.path
    if not u.query:
        query = ""
    else:
        query = "?"
        for piece in u.query.split("&"):
            k, v = piece.split("=")
            if k in settings.allowed_params:
                query += "{k}={v}&".format(**locals())
        query = query[:-1]
    return "{scheme}://{host}{path}{query}".format(**locals())
Any input would be appreciated. Thank you.
Instead of parsing the URLs yourself, you can use the urlparse.parse_qs function:
>>> from urlparse import urlparse, parse_qs
>>> URL = 'https://someurl.com/with/query_string?i=main&mode=front&sid=12ab&enc=+Hello'
>>> parsed_url = urlparse(URL)
>>> parse_qs(parsed_url.query)
{'i': ['main'], 'enc': [' Hello'], 'mode': ['front'], 'sid': ['12ab']}
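In Python 3 the same functions live in urllib.parse. A minimal sketch using the example URL above; parse_qs turns the '+' in enc=+Hello into a space and wraps every value in a list, because a key may appear more than once in a query string:

```python
from urllib.parse import urlparse, parse_qs

URL = 'https://someurl.com/with/query_string?i=main&mode=front&sid=12ab&enc=+Hello'

parsed_url = urlparse(URL)
# parse_qs returns a dict mapping each key to a list of decoded values
params = parse_qs(parsed_url.query)
print(params)
# {'i': ['main'], 'mode': ['front'], 'sid': ['12ab'], 'enc': [' Hello']}
```

If each key is known to appear only once, params['sid'][0] gives the plain string.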
This happens because one of the pieces contains two or more '=' characters. In that case split returns a list of three or more elements, and you cannot unpack that into two variables.
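A small reproduction of that failure. The piece "token=abc==" is a made-up example of a value with base64-style '=' padding, which is one common way a query value ends up containing '=':

```python
# Hypothetical piece: the value itself ends in '=' padding
piece = "token=abc=="

parts = piece.split("=")
print(parts)  # ['token', 'abc', '', '']

try:
    k, v = piece.split("=")  # four elements, but only two targets
except ValueError as exc:
    unpack_error = exc  # keep a reference; 'exc' is cleared after the block
    print("ValueError:", exc)
```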
You can solve that problem by splitting on at most one '=', passing the additional maxsplit argument to the .split(..) call:
k, v = piece.split("=", 1)
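A quick check of what maxsplit=1 does, using the same made-up piece as before. It fixes the "too many values" case, but a piece with no '=' at all (the hypothetical "flag") still breaks the unpacking:

```python
piece = "token=abc=="

# maxsplit=1: split at the first '=' only, so at most two elements come back
k, v = piece.split("=", 1)
print(k, v)  # token abc==

# A piece with no '=' yields a single element, so unpacking still fails
try:
    k, v = "flag".split("=", 1)
except ValueError as exc:
    missing_error = exc
    print("ValueError:", exc)
```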
But we still have no guarantee that the piece string contains an '=' at all.
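One option not mentioned in this answer (swapped in here as an alternative) is str.partition. It always returns a 3-tuple, so the unpacking cannot fail whether the piece contains zero, one, or several '=' characters. split_pair is a hypothetical helper name, and note that it does not percent-decode anything:

```python
def split_pair(piece):
    # partition splits at the first '=' and always returns (head, sep, tail);
    # sep (and tail) are '' when the piece contains no '='
    k, sep, v = piece.partition("=")
    return k, v

print(split_pair("token=abc=="))  # ('token', 'abc==')
print(split_pair("flag"))         # ('flag', '')
```

If you need to tell "flag" apart from "flag=", return sep as well and check whether it is empty.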
We can, however, use the urllib.parse module in Python 3.x (urlparse in Python 2.x):
from urllib.parse import urlparse, parse_qsl
purl = urlparse(url)
quer = parse_qsl(purl.query)  # list of (key, value) tuples
for k, v in quer:
    # ...
    pass
Now the query string is decoded into a list of key-value tuples that we can process one by one. I would advise building up the URL with urllib as well.
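Following that advice, a minimal sketch of the question's format_url rebuilt on urllib.parse (Python 3). Assumptions: allowed_params is passed in as an argument standing in for settings.allowed_params, and urlsplit/urlunsplit are used because the function only needs scheme, host, path and query. Like the original, it drops any fragment:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def format_url(url, allowed_params):
    """Make a URL absolute and keep only whitelisted query parameters.

    allowed_params is a hypothetical stand-in for settings.allowed_params.
    """
    u = urlsplit(url)
    scheme = u.scheme or "https"
    host = u.netloc or "www.amazon.com"
    # parse_qsl splits on '&', then on the first '=' of each pair, and decodes
    # escapes, so a value containing '=' no longer raises ValueError
    pairs = parse_qsl(u.query, keep_blank_values=True)
    kept = [(k, v) for k, v in pairs if k in allowed_params]
    # urlencode re-escapes the kept pairs; an empty list gives ''
    query = urlencode(kept)
    # urlunsplit only adds '?' when query is non-empty
    return urlunsplit((scheme, host, u.path, query, ""))


print(format_url("/dp/B000?tag=abc==&ref=x", {"tag"}))
# https://www.amazon.com/dp/B000?tag=abc%3D%3D
```

One difference from the hand-built version: kept values are re-encoded, so 'abc==' comes out as 'abc%3D%3D', which a server decodes back to the same value.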
You haven't shown any basic debugging: what is piece at the problem point? If the string contains more than a single '=', the split operation returns more than two values, hence your error message.
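One way to do that debugging is to list the pieces that would make k, v = piece.split("=") raise ValueError, that is, every piece without exactly one '='. bad_pieces is a hypothetical helper and the query string is made up for illustration:

```python
def bad_pieces(query):
    # Pieces with zero '=' give too few values, with two or more too many
    return [p for p in query.split("&") if p.count("=") != 1]


print(bad_pieces("i=main&token=abc==&flag"))  # ['token=abc==', 'flag']
```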
If you want to split on only the first '=', use index to find its position and take the slices you need:
pos = piece.index('=')
k = piece[:pos]
v = piece[pos+1:]
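Note that piece.index('=') itself raises ValueError ("substring not found") when the piece has no '='. A sketch of the same slicing approach applied to a whole (made-up) query string with a guard for that case; storing a bare key with an empty value is an assumption, not something the question specifies:

```python
query = "i=main&token=abc==&flag"
params = {}
for piece in query.split("&"):
    if "=" not in piece:
        # piece.index('=') would raise ValueError here; assume an empty value
        params[piece] = ""
        continue
    pos = piece.index("=")
    params[piece[:pos]] = piece[pos + 1:]

print(params)  # {'i': 'main', 'token': 'abc==', 'flag': ''}
```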