I want to read all the text files in a GitHub repository (Trump Speeches), but the address of each text file's page is different from the address of its raw text.
For example, look at this link: speech_00.txt in its normal view.
In raw mode, speech_00.txt has a different address: speech_00.txt in raw view.
How can I handle that without editing the addresses (for example, adding githubusercontent or removing blob)?
Also, I read a sample text file using this code:
import urllib.request
response = urllib.request.urlopen("https://raw.githubusercontent.com/PedramNavid/trump_speeches/master/data/speech_72.txt")
Text = response.read()
Text = Text.decode("utf-8")
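For reference, the two addresses seem to differ only in the host and the `blob` path segment. Here is a minimal sketch of that mapping, assuming the normal page URL has the form `https://github.com/<owner>/<repo>/blob/<branch>/<path>`. The helper name `blob_to_raw` is made up for illustration and is not part of any library.

```python
from urllib.parse import urlsplit, urlunsplit


def blob_to_raw(blob_url):
    """Turn a github.com 'blob' page URL into its raw.githubusercontent.com URL."""
    parts = urlsplit(blob_url)
    # Assumed path layout: /<owner>/<repo>/blob/<branch>/<path to file>
    owner, repo, marker, rest = parts.path.lstrip("/").split("/", 3)
    if parts.netloc != "github.com" or marker != "blob":
        raise ValueError("not a github.com blob URL: " + blob_url)
    # Drop the 'blob' segment and switch to the raw content host
    raw_path = "/" + "/".join([owner, repo, rest])
    return urlunsplit(("https", "raw.githubusercontent.com", raw_path, "", ""))
```

Calling `blob_to_raw` on the normal page address of speech_00.txt should give an address of the same shape as the raw one used in the code above.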
A somewhat hacky way to do this (relying on how that particular directory is structured) is to loop over the file numbers and build each file's URL from the directory path:
import urllib.request
# Get master directory
speech_dir = "https://raw.githubusercontent.com/PedramNavid/trump_speeches/master/data/"
# Iterate through all speeches in directory, from 00 to 73
cur_speech = 0
end_speech = 73
while cur_speech <= end_speech:
    # Build the URL of the speech you want to get (number zero-padded to two digits)
    speech_nm = speech_dir + 'speech_' + str(cur_speech).zfill(2) + '.txt'
    response = urllib.request.urlopen(speech_nm)
    # Do what you need to with the speech
    Text = response.read()
    Text = Text.decode("utf-8")
    # Update to the new speech
    cur_speech += 1
This way, you'll go through each speech in that particular directory.
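The URL building can also be separated from the downloading, which makes the zero-padding easy to check before anything is fetched. This is only a sketch: `speech_urls` is a hypothetical helper name, and the range 00 to 73 is taken from the loop above.

```python
SPEECH_DIR = "https://raw.githubusercontent.com/PedramNavid/trump_speeches/master/data/"


def speech_urls(base_url, first=0, last=73):
    """Build the raw URL of every speech file from first to last, inclusive."""
    # {n:02d} pads the number to two digits, so 0 becomes "00" and 7 becomes "07"
    return [f"{base_url}speech_{n:02d}.txt" for n in range(first, last + 1)]


urls = speech_urls(SPEECH_DIR)
```

Each entry in `urls` can then be passed to `urllib.request.urlopen` and decoded in the same way as in the loop above.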
I used your code (@N.Yasarturk) and edited it to get all the files. But my question was: are there other methods (without editing addresses) for reading these files from a GitHub repository?
import urllib.request
# Get master directory
speech_dir = "https://raw.githubusercontent.com/PedramNavid/trump_speeches/master/data/"
# Iterate through all speeches in directory, from 00 to 73
cur_speech = 0
temp = str(cur_speech)
end_speech = 73
while cur_speech <= end_speech:
    # Change the speech you want to get (pad single digits with a leading zero)
    if cur_speech < 10:
        temp = "0" + str(cur_speech)
    else:
        temp = str(cur_speech)
    speech_nm = speech_dir + 'speech_' + temp + '.txt'
    print(speech_nm)
    response = urllib.request.urlopen(speech_nm)
    # Do what you need to with the speech
    Text = response.read()
    Text = Text.decode("utf-8")
    print(Text)
    # Update to the new speech
    cur_speech += 1
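One approach that avoids building addresses by hand might be the GitHub REST API: its repository contents endpoint returns a JSON listing of a directory, and each file entry carries a `download_url` that points at the raw file. The sketch below assumes the repository is still public with a `master` branch and a `data` directory. The function names are made up for illustration, and unauthenticated API requests are rate-limited.

```python
import json
import urllib.request

# Contents endpoint for the data/ directory (assumes the master branch still exists)
API_URL = "https://api.github.com/repos/PedramNavid/trump_speeches/contents/data?ref=master"


def text_file_urls(listing):
    """Pick the download_url of every .txt file out of a contents-API listing."""
    return [
        entry["download_url"]
        for entry in listing
        if entry.get("type") == "file" and entry.get("name", "").endswith(".txt")
    ]


def list_directory(api_url=API_URL):
    """Fetch and parse the JSON directory listing from the contents API."""
    with urllib.request.urlopen(api_url) as response:
        return json.loads(response.read().decode("utf-8"))


def read_all_speeches(api_url=API_URL):
    """Download every .txt file in the directory; returns {file name: text}."""
    speeches = {}
    for url in text_file_urls(list_directory(api_url)):
        with urllib.request.urlopen(url) as response:
            speeches[url.rsplit("/", 1)[-1]] = response.read().decode("utf-8")
    return speeches
```

Calling `read_all_speeches()` would make one request for the listing and one per file, so nothing here depends on the files being numbered from 00 to 73.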