How to read all text files in a GitHub repository?

I want to read all the text files in a GitHub repository, but the addresses of the text files are different from their raw-text addresses. The repository is: Trump Speeches

For example, look at this link: speech_00.txt in its normal (first) view

The same file, speech_00.txt, has a different address in raw mode: speech_00.txt in raw view

How can I handle that without editing the addresses (for example, adding githubusercontent or removing blob)?

Also, I read a sample text file using this code:

import urllib.request

response = urllib.request.urlopen("https://raw.githubusercontent.com/PedramNavid/trump_speeches/master/data/speech_72.txt")
Text = response.read()
Text = Text.decode("utf-8")

A somewhat hacky way to implement this (based on how that particular directory is structured) would be to loop over the file numbers and build each file's URL by appending its name to the directory path:

import urllib.request

# Base URL of the raw data directory
speech_dir = "https://raw.githubusercontent.com/PedramNavid/trump_speeches/master/data/"
# Iterate through all speeches in the directory, from 00 to 73
cur_speech = 0
end_speech = 73
while cur_speech <= end_speech:
    # Build the full URL; file names use two-digit, zero-padded numbers
    speech_nm = speech_dir + 'speech_' + str(cur_speech).zfill(2) + '.txt'
    response = urllib.request.urlopen(speech_nm)
    # Do what you need to with the speech
    Text = response.read()
    Text = Text.decode("utf-8")
    # Move on to the next speech
    cur_speech += 1

This way, you'll go through each speech in that particular directory.
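As a variation on the loop above, the file URLs can be generated up front with a format specifier rather than string concatenation. This is only a minimal sketch: it assumes the files really are named speech_00.txt through speech_73.txt, as the comment in the code above states, and `speech_urls` is a hypothetical helper name, not something provided by the repository.

```python
# Base URL taken from the code above.
SPEECH_DIR = "https://raw.githubusercontent.com/PedramNavid/trump_speeches/master/data/"


def speech_urls(first=0, last=73, base=SPEECH_DIR):
    """Return the raw URLs for speech_<first>.txt .. speech_<last>.txt (hypothetical helper)."""
    # {n:02d} pads single-digit numbers with a leading zero: 0 -> "00", 7 -> "07".
    return [f"{base}speech_{n:02d}.txt" for n in range(first, last + 1)]
```

Each returned URL can then be passed to `urllib.request.urlopen` exactly as in the loop above.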

I used your code (@N.Yasarturk) and edited it to get all the files. But my question was: are there other methods (without editing addresses) for reading these files from a GitHub repository?

import urllib.request

# Base URL of the raw data directory
speech_dir = "https://raw.githubusercontent.com/PedramNavid/trump_speeches/master/data/"
# Iterate through all speeches in the directory, from 00 to 73
cur_speech = 0
end_speech = 73
while cur_speech <= end_speech:
    # File names use two-digit numbers, so pad single digits with a leading zero
    if cur_speech < 10:
        temp = "0" + str(cur_speech)
    else:
        temp = str(cur_speech)
    speech_nm = speech_dir + 'speech_' + temp + '.txt'
    print(speech_nm)
    response = urllib.request.urlopen(speech_nm)
    # Do what you need to with the speech
    Text = response.read()
    Text = Text.decode("utf-8")
    print(Text)
    # Move on to the next speech
    cur_speech += 1
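One possible way to avoid building addresses by hand is to ask the GitHub REST API for the directory listing: the "contents" endpoint returns a JSON array in which each file entry carries a `download_url` field pointing at the raw file. The sketch below is an assumption-laden illustration, not a tested solution for this repository: the owner, repository, branch (`master`) and path (`data`) are inferred from the raw URL used above, and the function names are hypothetical.

```python
import json
import urllib.request

# Assumed listing URL, derived from the raw URL used in the code above.
API_URL = "https://api.github.com/repos/PedramNavid/trump_speeches/contents/data?ref=master"


def text_file_urls(entries):
    """Pick the download URLs of .txt files from a contents-API directory listing."""
    return [
        entry["download_url"]
        for entry in entries
        if entry.get("type") == "file" and entry.get("name", "").endswith(".txt")
    ]


def fetch(url):
    """Download a URL and return its body decoded as UTF-8."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def read_all_speeches(api_url=API_URL):
    """Return {raw_url: text} for every .txt file in the listed directory."""
    entries = json.loads(fetch(api_url))
    return {url: fetch(url) for url in text_file_urls(entries)}
```

Calling `read_all_speeches()` performs one API request for the listing and then one download per file. Unauthenticated API requests are rate-limited by GitHub, so this is probably best suited to occasional use.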

