How to download files from hyperlinks in an Excel export using Python

Question

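I have a project in which I want to iterate over an Excel export from SharePoint that has, in one of its columns, hyperlinks to other Excel workbooks. I want to use Python to loop through the export, download the file behind the hyperlink in each row, and save those files to another location. What I have below gets me about a quarter of the way there: it prints a single hyperlink, but I want the file behind that hyperlink saved to another folder, and I want Python to go through the whole document.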

    import openpyxl

    wb = openpyxl.load_workbook(r'O:\Procurement Planning\QA\VSAF_test.xlsx')

    ws = wb['owssvr']

    print(ws.cell(row=2, column=4).hyperlink.target)

Update:

I have the following block, but it raises an error.

    import requests
    import pandas as pd

    def download_file(url):

        # this will grab the filename from the url
        filename = url.split('/')[-1]

        print(f'Downloading {filename}')

        r = requests.get(url)

        with open(filename, 'wb') as output_file:
            output_file.write(r.content)

        print('ok')

    df = pd.read_excel(r'O:\Procurement Planning\QA\VSAF_test.xlsx')
    df['Name'] = 'http://' + df['Name'].astype(str)
    file = df['Name']

    for url in file:
        download_file(url)

Then I get this error:

HTTPConnectionPool(host='a2consulting_tech_5650_vsaf.xlsm', port=80): Max 
retries exceeded with url: / (Caused by 
NewConnectionError('<urllib3.connection.HTTPConnection object at 
0x0000019C39FDBFC8>: Failed to establish a new connection: [Errno 11001] 
getaddrinfo failed'))

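Update 2:

I have gotten the links to download, but they don't seem to download anything usable. I completed the file path, and the link works when I print it in Jupyter Notebook and click it, but the downloaded file appears to be blank, and Excel says the file format or file extension is not valid. Please help!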

Answer 1

You can simply import requests to download the files.

For convenience, I have wrapped it in a function:

import requests

def download_file(url):

    # this will grab the filename from the url
    filename = url.split('/')[-1]

    print(f'Downloading {filename}')

    r = requests.get(url)

    with open(filename, 'wb') as output_file:
        output_file.write(r.content)

    print('ok')

Now you can call the function to save a file for you. Here is a working example of a URL passed to this function:

download_file('https://file-examples-com.github.io/uploads/2017/02/file_example_XLSX_10.xlsx')
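The function above writes whatever the server returns, even if that is an error page. One possible cause of the "file format or file extension is not valid" message in Update 2 is that the server answered with an HTML sign-in or error page rather than the workbook; the thread does not confirm this, so treat it as an assumption. The sketch below is a more defensive variant. The names `download_file_checked`, `filename_from_url`, `looks_like_xlsx`, and the `dest_dir` and `session` parameters are hypothetical and introduced only for illustration.

```python
from pathlib import Path
from urllib.parse import urlsplit, unquote

import requests


def filename_from_url(url):
    """Return the last path segment of a URL, percent-decoded, without the query string."""
    path = urlsplit(url).path
    return unquote(path.rsplit('/', 1)[-1])


def looks_like_xlsx(data):
    """.xlsx and .xlsm files are ZIP archives, which start with the bytes 'PK'."""
    return data[:2] == b'PK'


def download_file_checked(url, dest_dir='.', session=None):
    """Download url into dest_dir; raise instead of silently saving an error page."""
    # An authenticated requests.Session can be passed in if the server needs one.
    http = session or requests
    r = http.get(url, timeout=30)
    r.raise_for_status()  # raise on 401/403/404 and other HTTP error codes

    if not looks_like_xlsx(r.content):
        raise ValueError(f'{url} did not return an Excel workbook '
                         f'(Content-Type: {r.headers.get("Content-Type")})')

    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing
    target = dest / filename_from_url(url)
    target.write_bytes(r.content)
    return target
```

If this raises `ValueError`, saving `r.text` to a file and opening it in a browser should show what the server actually sent.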

Finally, use a for loop to iterate over a list of URLs, sending each one to this function.

urls = ['http://example.com/file1.xlsx', 'http://example.com/file2.xlsx']    

for url in urls:
    download_file(url)
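To build that list from the export itself, the hyperlink targets can be read with openpyxl, as in the question, rather than from the cell text. The error in the first update shows a host of `a2consulting_tech_5650_vsaf.xlsm`, which suggests the `Name` column holds only the displayed file name and not the full URL. The following is a minimal sketch under that assumption; `hyperlink_targets` and `download_all` are hypothetical helper names, and the sheet name `owssvr` and column 4 are taken from the question.

```python
from pathlib import Path

import openpyxl
import requests


def hyperlink_targets(ws, column, first_row=2):
    """Yield the hyperlink target of every cell in `column` that has one."""
    # first_row=2 skips the header row of the export.
    for (cell,) in ws.iter_rows(min_row=first_row, min_col=column, max_col=column):
        if cell.hyperlink is not None and cell.hyperlink.target:
            yield cell.hyperlink.target


def download_all(xlsx_path, sheet_name, column, dest_dir):
    """Download every hyperlinked file in one column of a sheet into dest_dir."""
    wb = openpyxl.load_workbook(xlsx_path)
    ws = wb[sheet_name]

    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    saved = []
    for url in hyperlink_targets(ws, column):
        r = requests.get(url, timeout=30)
        r.raise_for_status()  # stop on HTTP errors instead of saving an error page
        target = dest / url.split('/')[-1]
        target.write_bytes(r.content)
        saved.append(target)
    return saved


# Example call; the destination folder is a placeholder:
# download_all(r'O:\Procurement Planning\QA\VSAF_test.xlsx', 'owssvr', 4, r'C:\some\folder')
```

If the SharePoint site requires a login, a plain `requests.get` may not be enough and some form of authentication would have to be added; how to do that depends on the site's configuration.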
