Hi, I want to download delimited text that is served from an HTML link. (The link is only accessible on a private network, so I can't share it here.)
In R, the following code does the job (all other functions I tried gave an "Unauthorized access" or "401" error):
url = 'https://dw-results.ansms.com/dw-platform/servlet/results? job_id=13802737&encoding=UTF8&mimeType=plain'
download.file(url, "~/insights_dashboard/testing_file.tsv")
a = read.csv("~/insights_dashboard/testing_file.tsv",header = T,stringsAsFactors = F,sep='\t')
I want to do the same thing in Python, for which I tried:
(A) urllib and requests.get()
import urllib.request
url_get = requests.get(url, verify=False)
urllib.request.urlretrieve(url_get, 'C:\\Users\\cssaxena\\Desktop\\24.tsv')
(B) requests.get() and pd.read_html
url='https://dw-results.ansms.com/dw-platform/servlet/results? job_id=13802737&encoding=UTF8&mimeType=plain'
s = requests.get(url, verify=False)
a = pd.read_html(io.StringIO(s.decode('utf-8')))
(C) Using wget:
import wget
url = 'https://dw-results.ansms.com/dw-platform/servlet/results? job_id=13802737&encoding=UTF8&mimeType=plain'
wget.download(url,--auth-no-challenge, 'C:\\Users\\cssaxena\\Desktop\\24.tsv')
Or, from the command line:
wget --server-response -owget.log "https://dw-results.ansms.com/dw-platform/servlet/results? job_id=13802737&encoding=UTF8&mimeType=plain"
NOTE: The URL doesn't ask for any credentials. It is accessible from a browser, and R can download it with download.file. I am looking for a solution in Python.
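For comparison with the R snippet, here is a minimal sketch of the same two steps (fetch the response body, then parse it as tab-delimited text) using only the Python standard library. The helper names fetch_text and parse_tsv are hypothetical, and the verify flag mirrors the verify=False used in the attempts above. Whether this avoids the 401 on the private network is an assumption that can't be checked from here, since the error may depend on proxy or session settings.

```python
import csv
import io
import ssl
import urllib.request


def fetch_text(url, encoding="utf-8", verify=True):
    """Download url and return the response body decoded as text (hypothetical helper)."""
    context = ssl.create_default_context()
    if not verify:
        # Disables certificate checks; only reasonable for a trusted internal host.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(url, context=context) as response:
        return response.read().decode(encoding)


def parse_tsv(text):
    """Parse tab-delimited text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))
```

Assuming url holds the address from the question, usage would look like rows = parse_tsv(fetch_text(url, verify=False)). Writing the text returned by fetch_text to a file first gives the same on-disk .tsv that download.file produces in R.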
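def geturls(path):
    # Read the saved HTML page as text; undecodable bytes are ignored.
    with open(path, encoding='utf-8', errors='ignore') as f:
        yy = f.read()
    # Every chunk that follows a '<a' may carry an href attribute.
    out = []
    for d in yy.split('<a'):
        z = d.find('href="')
        if z > -1:
            x = d[z+6:]
            r = x.find('"')
            x = x[:r]
            x = x.strip(' ./')
            # Skip very short targets and ones containing ';'.
            if (len(x) > 2) and (x.find(";") == -1):
                out.append(x.strip(" /"))
    return set(out)

pg = "./test.html"  # your html
url = geturls(pg)
print(url)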
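The string splitting above only matches href values written with double quotes directly after href=. As an alternative sketch (a swapped-in technique, not the answer's own method), the standard library's html.parser can collect the same links while also handling single quotes and upper-case tags. LinkCollector and extract_links are hypothetical names, and unlike geturls this version returns the href values unmodified, without stripping leading "./" or filtering on ";".

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names before calling this.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.add(value)


def extract_links(html_text):
    """Return the set of href targets found in html_text."""
    parser = LinkCollector()
    parser.feed(html_text)
    return parser.links
```

It takes the page contents as a string, so a saved file would be read first, for example with open("./test.html", encoding="utf-8").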