
How to read a CSV file from GitHub using pandas

I'm trying to read a CSV file that's on GitHub with Python using pandas. I have looked all over the web and tried some solutions that I found on this website, but they do not work. What am I doing wrong?

I have tried this:

import pandas as pd

url = 'https://github.com/lukes/ISO-3166-Countries-with-Regional-Codes/blob/master/all/all.csv'
df = pd.read_csv(url,index_col=0)
#df = pd.read_csv(url)

print(df.head(5))

You should provide the URL to the raw content; the blob URL serves GitHub's HTML page for the file, not the CSV data itself. Try using this:

import pandas as pd

url = 'https://raw.githubusercontent.com/lukes/ISO-3166-Countries-with-Regional-Codes/master/all/all.csv'
df = pd.read_csv(url, index_col=0)
print(df.head(5))

Output:

               alpha-2           ...            intermediate-region-code
name                             ...                                    
Afghanistan         AF           ...                                 NaN
Åland Islands       AX           ...                                 NaN
Albania             AL           ...                                 NaN
Algeria             DZ           ...                                 NaN
American Samoa      AS           ...                                 NaN

Add ?raw=true at the end of the GitHub URL to get the raw file link.

In your case:

import pandas as pd
url = 'https://github.com/lukes/ISO-3166-Countries-with-Regional-Codes/blob/master/all/all.csv?raw=true'
df = pd.read_csv(url,index_col=0)
#df = pd.read_csv(url)

print(df.head(5))

Note: This works only with GitHub links, not with GitLab or Bitbucket links.
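
As a rough illustration of the ?raw=true approach, the parameter can be appended with the standard library so that any query string already on the URL is preserved. The helper name with_raw_param is hypothetical and not part of pandas or GitHub's tooling; this is only a sketch.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_raw_param(url):
    """Return `url` with raw=true set in its query string (hypothetical helper)."""
    parts = urlsplit(url)
    # Keep any existing query parameters and set/overwrite `raw`.
    query = dict(parse_qsl(parts.query))
    query["raw"] = "true"
    return urlunsplit(parts._replace(query=urlencode(query)))


url = with_raw_param(
    "https://github.com/lukes/ISO-3166-Countries-with-Regional-Codes/blob/master/all/all.csv"
)
# `url` can then be passed to pd.read_csv(url, index_col=0) as in the answer above.
```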

I recommend either using pandas, as you tried and as others here have explained, or, depending on the application, the Python CSV handler CommaSeperatedPython, which is a minimalistic wrapper for the native csv library.

The library returns the contents of a file as a two-dimensional string array. It is at a very early stage though, so if you want to do large-scale data analysis, I would suggest pandas.
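
To give an idea of what a two-dimensional string array looks like, here is a minimal sketch that uses only Python's built-in csv module. It does not use CommaSeperatedPython, whose API is not shown in this thread; the function name rows_from_csv_text and the sample data are assumptions made for illustration.

```python
import csv
import io


def rows_from_csv_text(text):
    """Parse CSV text into a list of rows, each row a list of strings."""
    return list(csv.reader(io.StringIO(text)))


# Inline sample data (made up for this sketch) so that no download is needed.
sample = "name,alpha-2\nAfghanistan,AF\nAlbania,AL\n"
rows = rows_from_csv_text(sample)

header, records = rows[0], rows[1:]
print(header)
print(records)
```

Every value stays a string here, with no type inference or missing-value handling, which is one reason pandas is the better fit for larger analyses.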

You can copy and paste the URL and change two things:

  1. Remove the "blob" path segment
  2. Replace github.com with raw.githubusercontent.com

For instance, this link:

https://github.com/mwaskom/seaborn-data/blob/master/iris.csv

Works this way:

import pandas as pd

pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')
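
The two manual steps above can also be sketched as a small function. The name to_raw_url is hypothetical, and the code assumes the URL has the usual github.com/<owner>/<repo>/blob/<branch>/<path> shape; anything else is rejected rather than guessed at.

```python
from urllib.parse import urlsplit, urlunsplit


def to_raw_url(blob_url):
    """Convert a github.com 'blob' page URL to its raw.githubusercontent.com form."""
    parts = urlsplit(blob_url)
    if parts.netloc != "github.com":
        raise ValueError(f"not a github.com URL: {blob_url}")
    # Expected path: /<owner>/<repo>/blob/<branch>/<path to file>
    segments = parts.path.lstrip("/").split("/")
    if len(segments) < 5 or segments[2] != "blob":
        raise ValueError(f"not a blob URL: {blob_url}")
    owner, repo, _blob, *rest = segments
    # Step 1: drop the "blob" segment. Step 2: switch the host.
    raw_path = "/" + "/".join([owner, repo, *rest])
    return urlunsplit(("https", "raw.githubusercontent.com", raw_path, "", ""))


raw_url = to_raw_url("https://github.com/mwaskom/seaborn-data/blob/master/iris.csv")
# `raw_url` can then be passed straight to pd.read_csv(raw_url).
```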

First convert the GitHub CSV file URL to its raw form in order to access the data; follow the link in the comment below on how to convert a CSV file to raw.

import pandas as pd

url_data = (r'https://raw.githubusercontent.com/oderofrancis/rona/main/Countries-Continents.csv')

data_csv = pd.read_csv(url_data)

data_csv.head()


 