Parsing the file name from a list of URL links

Question

OK, so I am using a script that downloads files from the URLs listed in urls.txt.

import urllib.request

with open("urls.txt", "r") as file:
    linkList = file.readlines()
for link in linkList:
    urllib.request.urlretrieve(link)

Unfortunately, they are saved as temporary files because my urllib.request.urlretrieve call has no second argument. As there are thousands of links in my text file, naming them separately is not an option. The thing is that the name of the file is contained in those links, e.g. /DocumentXML2XLSDownload.vm?firsttime=true&repengback=true&documentId=XXXXXX&xslFileName=rher2xml.xsl&outputFileName=XXXX_2017_06_25_4.xls, where the name of the file comes after outputFileName=

Is there an easy way to parse the file names and then pass them to urllib.request.urlretrieve as the second argument? I was thinking of extracting those names in Excel and placing them in another text file that would be read in the same way as urls.txt, but I'm not sure how to implement that in Python. Or is there a way to do it entirely in Python, without using Excel?

Answer 1

You could parse the link on the go.

Example using a regular expression:

import re
import urllib.request

# Matches whatever follows "?outputFileName=" or "&outputFileName=" up to the next "&".
regexp = r'((?<=\?outputFileName=)|(?<=&outputFileName=))[^&]+'

with open("urls.txt", "r") as file:
    linkList = file.readlines()
for link in linkList:
    link = link.strip()  # drop the trailing newline left by readlines()
    if not link:
        continue  # skip blank lines
    match = re.search(regexp, link)

    if match is None:
        # Make the user aware that something went wrong, e.g. raise exception
        # and/or just print something
        print("WARNING: Couldn't find file name in link [" + link + "]. Skipping...")
    else:
        file_name = match.group(0)
        urllib.request.urlretrieve(link, file_name)
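
To see what the lookbehind actually captures without downloading anything, here is a minimal sketch run against the sample link from the question. The helper name extract_file_name is made up for illustration, and the pattern uses a single [?&] lookbehind, which should match the same links as the two-branch version above.

```python
import re

# Lookbehind: the match must be preceded by "?outputFileName=" or
# "&outputFileName=", but that prefix is not part of the match itself.
PATTERN = re.compile(r"(?<=[?&]outputFileName=)[^&]+")


def extract_file_name(link):
    """Return the outputFileName value from link, or None if it is absent."""
    match = PATTERN.search(link.strip())
    return match.group(0) if match else None


# Sample link from the question, with the newline that readlines() leaves behind.
sample = ("/DocumentXML2XLSDownload.vm?firsttime=true&repengback=true"
          "&documentId=XXXXXX&xslFileName=rher2xml.xsl"
          "&outputFileName=XXXX_2017_06_25_4.xls\n")

print(extract_file_name(sample))             # XXXX_2017_06_25_4.xls
print(extract_file_name("/page.vm?id=42"))   # None
```

Note that the regular expression returns the value exactly as it appears in the link, so percent-encoded characters (such as %20) are not decoded.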

Answer 2

You can use urlparse and parse_qs to read values from the query string:

# Python 3; in Python 2 these functions lived in the urlparse module.
from urllib.parse import urlparse, parse_qs

parse = urlparse('http://www.cwi.nl:80/%7Eguido/Python.html?name=Python&version=2')
print(parse_qs(parse.query)['name'][0])  # prints Python
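
Applied to the question, the same idea might look like the sketch below. It assumes every link carries an outputFileName parameter, as in the example link; the helper names file_name_from_link and download_all are hypothetical, and the basename step is an extra precaution that was not part of the original answer.

```python
import os
import urllib.request
from urllib.parse import urlparse, parse_qs


def file_name_from_link(link, param="outputFileName"):
    """Return the value of `param` from the link's query string, or None."""
    values = parse_qs(urlparse(link).query).get(param)
    if not values:
        return None
    # Keep only the last path component so the value cannot point outside
    # the current directory; an empty result is treated as "not found".
    return os.path.basename(values[0]) or None


def download_all(list_path="urls.txt"):
    """Download every link in list_path, naming each file after its outputFileName."""
    with open(list_path, "r") as file:
        for line in file:
            link = line.strip()  # drop the trailing newline
            if not link:
                continue  # skip blank lines
            file_name = file_name_from_link(link)
            if file_name is None:
                print("WARNING: no outputFileName in [" + link + "]. Skipping...")
                continue
            urllib.request.urlretrieve(link, file_name)
```

Calling download_all() with no arguments would read urls.txt from the current directory and save each file next to it. Unlike the regular expression, parse_qs also decodes percent-encoded characters in the value.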

2 answers

solution1
2 ACCEPTED 2017-10-18 11:08:24

solution2
1 2017-10-18 10:53:41
