How do I pass the path of a GitHub folder to cdQA's pdf_converter? Or extract PDFs from a GitHub repository in Python?
To deploy my model, I want to convert PDF files hosted on GitHub rather than files in a local folder, but for some reason the PDFs are not extracted from the GitHub folder path.
def download_pdf():
    import os
    import wget
    directory = './data5/pdf/'
    models_url = [
        '',
    ]
    print('\nDownloading PDF files...')
    if not os.path.exists(directory):
        os.makedirs(directory)
    for url in models_url:
        wget.download(url=url, out=directory)

download_pdf()
Your repository is public, so the folder page can be opened directly by URL:
https://github.com/mohsinmushtaq-arch/pdf-docs/tree/main/docs
That URL is only the folder listing. To link to an individual document, replace tree with blob and append the file name:
https://github.com/mohsinmushtaq-arch/pdf-docs/blob/main/docs/FIRE%20AND%20LIFE%20SAFETY%20CODE_Page100.pdf
To download the file itself rather than its viewer page, replace blob with raw:
https://github.com/mohsinmushtaq-arch/pdf-docs/raw/main/docs/FIRE%20AND%20LIFE%20SAFETY%20CODE_Page100.pdf
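The blob-to-raw rewrite above can be sketched in Python as follows. This is only a minimal sketch, not part of the original answer: the helper names `to_raw_url`, `list_pdf_urls` and `download_pdfs` are hypothetical, it uses the standard library instead of `wget`, and the folder listing assumes GitHub's REST "repository contents" endpoint, which is rate-limited for unauthenticated requests.

```python
import json
import os
import urllib.request
from urllib.parse import quote, unquote, urlsplit


def to_raw_url(github_url):
    """Rewrite a github.com 'blob' file URL into its 'raw' download form."""
    parts = urlsplit(github_url)
    segments = parts.path.split('/')
    # Expected path layout: /<owner>/<repo>/blob/<branch>/<path to file>
    if parts.netloc == 'github.com' and len(segments) > 3 and segments[3] == 'blob':
        segments[3] = 'raw'
    return parts._replace(path='/'.join(segments)).geturl()


def list_pdf_urls(owner, repo, path, ref='main'):
    """Return download URLs of the PDFs in one folder (assumes the contents API)."""
    api_url = 'https://api.github.com/repos/{}/{}/contents/{}?ref={}'.format(
        owner, repo, quote(path), quote(ref))
    with urllib.request.urlopen(api_url) as response:
        entries = json.load(response)
    # Keep only regular files whose name ends in .pdf
    return [entry['download_url'] for entry in entries
            if entry['type'] == 'file' and entry['name'].lower().endswith('.pdf')]


def download_pdfs(urls, directory='./data5/pdf/'):
    """Download each URL into `directory` and return the saved file paths."""
    os.makedirs(directory, exist_ok=True)
    saved = []
    for url in urls:
        raw = to_raw_url(url)
        # Decode %20 and similar escapes so the local file name is readable
        filename = unquote(os.path.basename(urlsplit(raw).path))
        target = os.path.join(directory, filename)
        urllib.request.urlretrieve(raw, target)
        saved.append(target)
    return saved


if __name__ == '__main__':
    # Requires network access; the repository details come from the example above
    pdf_urls = list_pdf_urls('mohsinmushtaq-arch', 'pdf-docs', 'docs')
    print(download_pdfs(pdf_urls))
```

Once the files are on disk, the local directory ('./data5/pdf/' here) can be handed to pdf_converter in the usual way, since the converter reads from a local folder path rather than from a URL.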