
Download a direct-download link to a specific folder

I have this code where I use BeautifulSoup to get the URL of a direct download link to a PDF, and I want to save that PDF into a specific directory.

Getting the URL for the link works. What causes problems is getting the file to download and save into a directory. I have searched through several sources and finally tried urllib.urlretrieve, but this is not working.
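One likely cause: urllib.urlretrieve is the Python 2 location. In Python 3 the function is urllib.request.urlretrieve, and its second argument is the exact file path to write, so passing only a folder does not work. The following is a minimal sketch, assuming the target folder is a plain path and that the file name can be taken from the last segment of the URL path. download_to_folder is a hypothetical helper, not part of any library.

```python
import os
import posixpath
import urllib.request
from urllib.parse import unquote, urlparse


def download_to_folder(url, folder):
    """Download url into folder and return the saved file's path.

    Hypothetical helper: the file name is assumed to be the last segment
    of the URL path (query string ignored, %XX escapes decoded).
    """
    os.makedirs(folder, exist_ok=True)  # create the folder if it is missing
    file_name = unquote(posixpath.basename(urlparse(url).path))
    if not file_name:
        raise ValueError("cannot derive a file name from " + url)
    target = os.path.join(folder, file_name)
    # urlretrieve writes to exactly the path it is given, so pass a full
    # file path rather than the bare folder.
    saved_path, _headers = urllib.request.urlretrieve(url, target)
    return saved_path
```

In the question's code this would be called as download_to_folder(pdf_link, download_path), assuming download_path is the folder returned by ParseXML.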

Could I get some help, please? Edit: The main thing I am trying to do is get the code to download the direct link, so that I don't have to click it manually.

If the URL is needed to help, here it is: https://www.odfl.com/us/en/resources/tariffs/tariff-odfl-100-0.html. The URL comes from an XML file I am parsing, because there are several other URLs that require different code.

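import os
import urllib.request
import xml.etree.ElementTree as ET  # presumably used by ParseXML (not shown)
from urllib.parse import unquote, urljoin

import requests
from bs4 import BeautifulSoup


def ODFL100(tariff_id):

    try:
        # ParseXML is the asker's own helper (not shown). It is assumed to return
        # the tariff page URL and, with a second argument of 1, the save folder.
        pdf_path = ParseXML(tariff_id)
        download_path = ParseXML(tariff_id, 1)

        r = requests.get(pdf_path)
        r.raise_for_status()
        soup = BeautifulSoup(r.text, "html.parser")

        # Take the first PDF link whose URL contains "rates-and-tariffs".
        # href=True skips <a> tags without an href (which would be None).
        pdf_link = ""
        for a in soup.find_all('a', href=True):
            current_link = urljoin(pdf_path, a['href'])
            if current_link.endswith('pdf') and "rates-and-tariffs" in current_link:
                pdf_link = current_link
                break

        if not pdf_link:
            raise ValueError("no rates-and-tariffs PDF link found on " + pdf_path)

        # urlretrieve needs a full file path, not just a folder, so append the
        # PDF's file name. In Python 3 it lives in urllib.request, not urllib.
        os.makedirs(download_path, exist_ok=True)
        file_name = unquote(pdf_link.split('/')[-1])
        urllib.request.urlretrieve(pdf_link, os.path.join(download_path, file_name))

    except Exception as error:
        error_message = "A " + str(error)
        # Report the error instead of silently discarding it.
        print(error_message)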
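The link-selection loop in the question (with link_list and the pdf_found flag) can be collapsed into a single pass that stops at the first match. The following is a minimal sketch, assuming, as the question's code does, that the wanted file is the first link ending in "pdf" whose URL contains "rates-and-tariffs". pick_pdf_link is a hypothetical name. It uses urljoin instead of string concatenation so that both relative and absolute hrefs resolve correctly.

```python
from urllib.parse import urljoin


def pick_pdf_link(page_url, hrefs):
    """Return the absolute URL of the first matching PDF link, or None."""
    for href in hrefs:
        # Anchors without an href come back as None from a.get('href').
        if not href:
            continue
        absolute = urljoin(page_url, href)
        if absolute.endswith("pdf") and "rates-and-tariffs" in absolute:
            return absolute
    return None


# With BeautifulSoup, the hrefs could be collected as:
#     hrefs = [a.get('href') for a in soup.find_all('a')]
```

A None result means no matching link was found. The original loop does not report that case. It downloads whichever PDF link it saw last.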

I don't quite understand what you want to achieve, but here is example code that downloads the PDF file from the specified page.

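from urllib.parse import unquote, urljoin

import requests
from bs4 import BeautifulSoup


url = 'https://www.odfl.com/us/en/resources/tariffs/tariff-odfl-100-0.html'
response = requests.get(url)
response.raise_for_status()
# The PDF link is taken from the first <a> with class "cmp-form-button".
href = BeautifulSoup(response.text, 'lxml').find('a', class_='cmp-form-button').get('href')
link = urljoin(url, href)
# Save under the PDF's own (URL-decoded) file name in the current directory.
with open(unquote(link.split('/')[-1]), 'wb') as f:
    f.write(requests.get(link).content)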
