
How to read an Excel dataframe from a private GitHub repository using pandas?

I have a working website built with Django. I also have a private GitHub repository containing Excel files that I want to read with pandas read_excel and use on the website. The repository is private because the data is company-specific.

1) How do I read an Excel file with pandas from a private GitHub repository? Do I need to set up a personal access token?

2) After a user logs in to my website, is there a way to require a further password when they navigate to their company-specific dataframe? For example, "User A" would only have access to "Dataframe A", and "User B" would only have access to "Dataframe B".

On my local system, the following code reads the dataframe:

import pandas as pd

file_path = 'C:/Users/james/Desktop/projects/path/to/excel/file'
df = pd.read_excel(file_path)

On my live website, the code that produces the problem is:

import pandas as pd

URL_path = 'https://github.com/path/to/excel/file/in/private/repository'
df = pd.read_excel(URL_path)

I can read the Excel files on my local computer, but when I try to read them from my private GitHub repository I get the following error, even though I know the URL is correct:

urllib.error.HTTPError: HTTP Error 404: Not Found

I verified this by signing out of my GitHub account and opening the URL of the Excel file: because I am not logged in, GitHub shows a 404 Not Found page. When I log back in, the same URL takes me to the correct page.

If the repo is private, you need to use a PAT (personal access token) from GitHub.

You then need the raw URL of the file, and you must send the token with the request. Make sure to handle the response correctly before passing it to pandas: an Excel file is binary, so hand pandas the downloaded bytes rather than decoded text.
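A minimal sketch of that approach, not a definitive implementation. It assumes the file is fetched through the GitHub REST contents endpoint with the raw media type, which is one of several ways to get the raw bytes. The helper names (build_request, read_private_excel), the owner, repo and path arguments, and the GITHUB_TOKEN environment variable are all hypothetical and not from the question.

```python
import io
import urllib.request
from urllib.parse import quote


def build_request(owner, repo, path, token, ref="main"):
    """Build an authenticated request for a file's raw bytes.

    Assumption: the GitHub REST "contents" endpoint is used, with the raw
    media type so the body is the file itself rather than a JSON wrapper.
    """
    url = (
        f"https://api.github.com/repos/{owner}/{repo}/contents/"
        f"{quote(path)}?ref={quote(ref, safe='')}"
    )
    headers = {
        # The personal access token goes in the Authorization header.
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github.raw",
    }
    return urllib.request.Request(url, headers=headers)


def read_private_excel(owner, repo, path, token, ref="main", **read_excel_kwargs):
    """Download an Excel file from a private repo and load it with pandas."""
    # Imported here so build_request stays usable without pandas installed.
    import pandas as pd

    request = build_request(owner, repo, path, token, ref)
    with urllib.request.urlopen(request) as response:
        data = response.read()  # binary content, not text

    # Excel files are binary: wrap the bytes in a file-like object
    # instead of decoding them to a string.
    return pd.read_excel(io.BytesIO(data), **read_excel_kwargs)
```

On the Django side this might be called as read_private_excel("my-org", "my-repo", "path/to/file.xlsx", os.environ["GITHUB_TOKEN"]), with the token kept in the server environment rather than in the source code. A missing or invalid token would be expected to produce the same 404 described in the question.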

Check out this tutorial; it uses a CSV, but the idea is essentially the same:

https://medium.com/towards-entrepreneurship/importing-a-csv-file-from-github-in-a-jupyter-notebook-e2c28e7e74a5
