How do I read all the text files in a GitHub repository?

Question

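I want to read all the text files in a GitHub repository, but the address of a file's page is different from the address of its raw text. The repository is Trump speeches (PedramNavid/trump_speeches).

For example, the address of speech_00.txt on the repository page is different from the address shown when the file is opened in raw mode.

How can I handle this without editing the address (for example, adding githubusercontent or removing blob)?

Also, I read a sample text file with the following code: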

import urllib.request  # "import urllib" alone does not load the request submodule

response = urllib.request.urlopen("https://raw.githubusercontent.com/PedramNavid/trump_speeches/master/data/speech_72.txt")
Text = response.read()
Text = Text.decode("utf-8")
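The two addresses mentioned in the question differ in a mechanical way, so if rewriting the address turns out to be acceptable, it can be automated rather than done by hand. The sketch below assumes the file page follows the usual `https://github.com/OWNER/REPO/blob/BRANCH/PATH` layout; `blob_to_raw` is a hypothetical helper name, and the page URL in the usage note is inferred from the raw URL above rather than quoted from the question.

```python
from urllib.parse import urlsplit


def blob_to_raw(page_url):
    """Turn a github.com file-page URL into its raw.githubusercontent.com form."""
    parts = urlsplit(page_url)
    # The path is expected to look like /OWNER/REPO/blob/BRANCH/path/to/file
    segments = parts.path.lstrip("/").split("/")
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        raise ValueError("not a GitHub file-page URL: " + page_url)
    owner, repo = segments[0], segments[1]
    rest = "/".join(segments[3:])  # branch followed by the file path
    return "https://raw.githubusercontent.com/{}/{}/{}".format(owner, repo, rest)
```

Calling `blob_to_raw("https://github.com/PedramNavid/trump_speeches/blob/master/data/speech_00.txt")` should give the same kind of raw address that the snippet above passes to `urlopen`.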

Answer 1

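A simple way to do this (relying on how this particular directory is structured) is to loop and build each file path string by appending the file number: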

import urllib.request

# Get master directory
speech_dir = "https://raw.githubusercontent.com/PedramNavid/trump_speeches/master/data/"
# Iterate through all speeches in directory, from 00 to 73
cur_speech = 0
end_speech = 73
while cur_speech <= end_speech:
    # Build the full URL; file numbers are zero-padded to two digits
    speech_nm = speech_dir + 'speech_' + str(cur_speech).zfill(2) + '.txt'
    response = urllib.request.urlopen(speech_nm)
    # Do what you need to with the speech
    Text = response.read()
    Text = Text.decode("utf-8")
    # Update to the new speech
    cur_speech += 1

This way you iterate over every speech in that particular directory.
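The loop above can also be split into two steps: first build the full list of addresses, then fetch each one. The following is a minimal sketch under the same assumption as the answer, namely that the files are named `speech_00.txt` through `speech_73.txt`; `speech_urls` and `read_text` are hypothetical helper names.

```python
import urllib.request


def speech_urls(base_url, first, last):
    """Build the raw URLs speech_NN.txt for NN in first..last (inclusive)."""
    # {:02d} pads the number with a leading zero to two digits (0 -> "00")
    return ["{}speech_{:02d}.txt".format(base_url, n) for n in range(first, last + 1)]


def read_text(url):
    """Download one file and decode it as UTF-8."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")
```

With `base = "https://raw.githubusercontent.com/PedramNavid/trump_speeches/master/data/"`, the expression `[read_text(u) for u in speech_urls(base, 0, 73)]` would download all the speeches. Keeping the URL construction separate makes it possible to check the addresses without touching the network.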

Answer 2

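I used your code (@N.Yasarturk) and edited it to fetch all the files. But my question remains: is there another way to read these files from the GitHub repository, without editing the address?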

import urllib.request
# Get master directory
speech_dir ="https://raw.githubusercontent.com/PedramNavid/trump_speeches/master/data/"
# Iterate through all speeches in directory, from 00 to 73
cur_speech = 0
temp=str(cur_speech)
end_speech = 73
while (cur_speech <= end_speech):
    # Change the speech you want to get
    if(cur_speech<10):
        temp="0"+str(cur_speech)
    else:
        temp=str(cur_speech)
    speech_nm = (speech_dir+'speech_' + temp +'.txt')
    print(speech_nm)
    response = urllib.request.urlopen(speech_nm)
    # Do what you need to with the speech
    Text = response.read()
    Text = Text.decode("utf-8")
    print(Text)
    # Update to the new speech
    cur_speech +=1
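One possible way to avoid constructing addresses by hand is to ask GitHub for the directory listing. The REST API's repository contents endpoint returns a JSON array for a directory, and each file entry carries a `download_url` field pointing at the raw file. The sketch below is not taken from either answer; the function names are hypothetical, and the branch name `master` is taken from the URLs used above.

```python
import json
import urllib.request


def contents_api_url(owner, repo, path, ref="master"):
    """URL of the GitHub REST API endpoint that lists a directory."""
    return "https://api.github.com/repos/{}/{}/contents/{}?ref={}".format(
        owner, repo, path, ref
    )


def text_file_urls(listing):
    """Pick the raw download URL of every .txt file from a directory listing."""
    return [
        entry["download_url"]
        for entry in listing
        if entry.get("type") == "file" and entry.get("name", "").endswith(".txt")
    ]


def read_all_text_files(owner, repo, path, ref="master"):
    """Return {download_url: text} for every .txt file in the directory."""
    request = urllib.request.Request(
        contents_api_url(owner, repo, path, ref),
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(request) as response:
        listing = json.loads(response.read().decode("utf-8"))
    texts = {}
    for url in text_file_urls(listing):
        with urllib.request.urlopen(url) as response:
            texts[url] = response.read().decode("utf-8")
    return texts
```

A call such as `read_all_text_files("PedramNavid", "trump_speeches", "data")` would then fetch every `.txt` file without knowing the file names in advance. Unauthenticated API requests are rate limited, and the listing for a single directory is capped at 1,000 files, so larger repositories may need a different approach.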


2 answers

Answer 1
1 2019-05-17 20:43:57

Answer 2
0 accepted 2019-05-18 09:20:22
