How to add the path of a GitHub folder to cdQA pdf_converter? Or extract PDFs from a GitHub repository in Python

I want to convert PDF files hosted on GitHub, rather than in a local folder, so that I can deploy my model. For some reason, the PDFs are not extracted when I pass the GitHub folder path.

Download PDF files

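import os

import wget  # third-party package: pip install wget


def download_pdf():
    """Download every URL listed in models_url into a local folder."""
    directory = './data5/pdf/'
    models_url = [
      '',

    ]

    print('\nDownloading PDF files...')

    # Create the target folder if it does not exist yet
    os.makedirs(directory, exist_ok=True)

    # Each entry must be a direct link to a file, not to a folder page
    for url in models_url:
        wget.download(url=url, out=directory)


download_pdf()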

Your repository is public, so you can open the folder page via its URL:

https://github.com/mohsinmushtaq-arch/pdf-docs/tree/main/docs

To view an individual document, adapt the link by replacing tree with blob and appending the file name:

https://github.com/mohsinmushtaq-arch/pdf-docs/blob/main/docs/FIRE%20AND%20LIFE%20SAFETY%20CODE_Page100.pdf

To download the file itself, replace blob with raw:

https://github.com/mohsinmushtaq-arch/pdf-docs/raw/main/docs/FIRE%20AND%20LIFE%20SAFETY%20CODE_Page100.pdf
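The URL rewriting above can be scripted. The following is only a minimal sketch using the standard library: the helper names `to_raw_url`, `list_pdf_urls` and `download_pdfs` are hypothetical, and listing the folder through the GitHub REST "contents" endpoint is an addition that the answer itself does not mention (it is rate-limited for unauthenticated requests).

```python
import json
import os
import urllib.parse
import urllib.request


def to_raw_url(url):
    """Turn a GitHub 'blob' page URL into the matching 'raw' download URL."""
    return url.replace("/blob/", "/raw/", 1)


def list_pdf_urls(owner, repo, path, branch="main"):
    """Return direct download URLs for the PDFs in one repository folder.

    Assumption: queries the public GitHub REST "contents" endpoint,
    which is not part of the original answer.
    """
    api_url = "https://api.github.com/repos/{}/{}/contents/{}?ref={}".format(
        owner, repo, urllib.parse.quote(path), branch)
    with urllib.request.urlopen(api_url) as response:
        entries = json.load(response)
    return [entry["download_url"] for entry in entries
            if entry["type"] == "file"
            and entry["name"].lower().endswith(".pdf")]


def download_pdfs(urls, directory="./data5/pdf/"):
    """Save each URL into `directory`, creating the folder if needed."""
    os.makedirs(directory, exist_ok=True)
    for url in urls:
        # Decode %20 etc. so the local file name matches the original
        filename = urllib.parse.unquote(url.rsplit("/", 1)[-1])
        urllib.request.urlretrieve(to_raw_url(url),
                                   os.path.join(directory, filename))
```

For the repository in the question, a call such as `download_pdfs(list_pdf_urls("mohsinmushtaq-arch", "pdf-docs", "docs"))` should fill `./data5/pdf/`, which can then be handed to cdQA's pdf_converter like any other local folder.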
