How to Download a GitHub Repo Filled with CSV Files Using Python?

I'm trying to do some exploratory data analysis on the data provided by CSSE at Johns Hopkins University. They have it on GitHub at this link: https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_daily_reports. I'm trying to download the entire folder using Python and save it to my current directory. That way I'll have all the up-to-date data and can reload it whenever I need it. I'm planning two functions: fetch_covid_daily_data(), which will go to the website and download all the CSV files, and load_covid_daily_data(), which will look in the local copy of the repo and read the data so I can process it with pandas.

I'm doing it this way because, when I come back to my code, I can call fetch_covid_daily_data() and it will download any new changes, such as another daily CSV being added.

You can read data directly from an online CSV into a pandas DataFrame:

Example:

import pandas as pd

CONFIRMED_URL = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv'

df = pd.read_csv(CONFIRMED_URL)

# df now contains data from time of call.
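
The same approach can be extended to the daily reports folder from the question. Below is a minimal sketch, assuming the daily files are named MM-DD-YYYY.csv (for example 01-22-2020.csv) and are served from the matching raw.githubusercontent.com path. The helper names daily_report_urls and read_daily_reports are hypothetical, not part of any library.

```python
from datetime import date, timedelta

# Assumption: one CSV per day, named MM-DD-YYYY.csv, lives in this folder.
DAILY_BASE_URL = (
    "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/"
    "csse_covid_19_data/csse_covid_19_daily_reports"
)


def daily_report_urls(start, end):
    """Return {day: raw CSV URL} for every day from start to end, inclusive."""
    urls = {}
    day = start
    while day <= end:
        # date objects accept strftime codes inside an f-string format spec
        urls[day] = f"{DAILY_BASE_URL}/{day:%m-%d-%Y}.csv"
        day += timedelta(days=1)
    return urls


def read_daily_reports(start, end):
    """Download each daily report into a DataFrame (hypothetical helper)."""
    import pandas as pd  # imported here so the URL helper works without pandas

    return {day: pd.read_csv(url) for day, url in daily_report_urls(start, end).items()}
```

Calling read_daily_reports(date(2020, 3, 1), date(2020, 3, 7)) would attempt one download per day in that week. A date with no published file makes pd.read_csv raise an error, so a real script may want to catch that per file.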

You can also create a class to fetch and manipulate all the data:


import pandas as pd

class Corona:

    def __init__(self):
        BASE_URL = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series'

        # one raw CSV per case type
        self.URLS = {
            'confirmed': f'{BASE_URL}/time_series_covid19_confirmed_global.csv',
            'deaths': f'{BASE_URL}/time_series_covid19_deaths_global.csv',
            'recovered': f'{BASE_URL}/time_series_covid19_recovered_global.csv',
        }

        # download each CSV into a DataFrame, keyed by case type
        self.data = {case: pd.read_csv(url) for case, url in self.URLS.items()}

    # create other useful functions to work with data
    def current_status(self):
        # function to show current status
        pass


To get the current data:

# returns data as dictionary with DataFrames as Values
corona = Corona()
confirmed_df = corona.data['confirmed']

# If you want to save them to csv
confirmed_df.to_csv('confirmed.csv', index=False)

# show first five rows
print(confirmed_df.head())

# check other DataFrame
print(corona.data.keys())
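
To keep a local copy of every DataFrame rather than just one, the save step can be looped over the dictionary. This is a rough sketch: save_all and load_all are hypothetical helper names, and the only thing assumed about each value is that it has a to_csv method, as pandas DataFrames do.

```python
from pathlib import Path


def save_all(data, out_dir="."):
    """Write each DataFrame in `data` to <out_dir>/<key>.csv and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = {}
    for case, df in data.items():
        paths[case] = out_dir / f"{case}.csv"
        df.to_csv(paths[case], index=False)
    return paths


def load_all(out_dir="."):
    """Read every CSV in out_dir back into a dict of DataFrames keyed by file stem."""
    import pandas as pd  # only needed when loading

    return {p.stem: pd.read_csv(p) for p in sorted(Path(out_dir).glob("*.csv"))}
```

With the class above, save_all(corona.data, "covid_data") would write confirmed.csv, deaths.csv and recovered.csv, and load_all("covid_data") would read them back without going to the network.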

Assuming you have git installed, you can clone the repository from your terminal:

git clone https://github.com/CSSEGISandData/COVID-19
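
If you would rather drive git from Python, one possible shape for the two functions named in the question is sketched below. It assumes git is on your PATH and that the clone lives in a folder called COVID-19 next to your script; the git_command helper and the default folder name are illustrative choices, not anything the repository prescribes.

```python
import subprocess
from pathlib import Path

REPO_URL = "https://github.com/CSSEGISandData/COVID-19"
DAILY_DIR = "csse_covid_19_data/csse_covid_19_daily_reports"


def git_command(repo_dir):
    """Build the git command: pull if a clone already exists, otherwise clone."""
    repo_dir = Path(repo_dir)
    if (repo_dir / ".git").exists():
        return ["git", "-C", str(repo_dir), "pull"]
    return ["git", "clone", REPO_URL, str(repo_dir)]


def fetch_covid_daily_data(repo_dir="COVID-19"):
    """Clone the repo on the first call; pull new commits on later calls."""
    subprocess.run(git_command(repo_dir), check=True)


def load_covid_daily_data(repo_dir="COVID-19"):
    """Read every daily report CSV from the local clone into DataFrames."""
    import pandas as pd  # only needed when loading

    files = sorted((Path(repo_dir) / DAILY_DIR).glob("*.csv"))
    return {path.stem: pd.read_csv(path) for path in files}
```

Because later calls run git pull instead of cloning again, rerunning fetch_covid_daily_data() only transfers what changed, such as a newly added daily CSV.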

Hope this helps!
