How can I make this task finish much faster? Can the three calls to generate_ngrams_from_file() be run in parallel? I'm new to Python and don't know how to speed it up. I think multiprocessing or threading should do the job, but I have no idea how to use them. This looks like a typical task that could run concurrently on the multiple cores of my Mac.
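```python
import time
from collections import Counter

# One global counter per n-gram size (assumed to be collections.Counter)
bigrams = Counter()
trigrams = Counter()
fourgrams = Counter()
fivegrams = Counter()


def tokenize(text):
    # split() with no argument also drops the trailing newline
    return text.split()


def generate_ngrams(text, n):
    tokens = tokenize(text)
    # zip n shifted copies of the token list to get consecutive n-tuples
    ngrams = zip(*[tokens[i:] for i in range(n)])
    # join with a space so that e.g. "ab c" and "a bc" stay distinct
    return [' '.join(ngram) for ngram in ngrams]


def generate_ngrams_from_file(input, out, n):
    # `out` is currently unused; the counts accumulate in the global counters
    count = 0
    with open(input, 'r') as f:
        for line in f:
            count += 1
            if line.strip():  # skip blank lines ("\n" alone is truthy)
                ngrams = generate_ngrams(line, n)
                if n == 2:
                    bigrams.update(ngrams)
                elif n == 3:
                    trigrams.update(ngrams)
                elif n == 4:
                    fourgrams.update(ngrams)
                elif n == 5:
                    fivegrams.update(ngrams)
    print("Ngram done!")


if __name__ == "__main__":
    start = time.time()
    input_file = 'bigfile.txt'
    output_3_tram = '3gram.txt'
    output_4_tram = '4ngram.txt'
    output_5_tram = '5ngram.txt'
    print('Generate trigram: ')
    generate_ngrams_from_file(input_file, output_3_tram, 3)
    print("Generate fourgrams: ")
    generate_ngrams_from_file(input_file, output_4_tram, 4)
    print("Generate fivegrams: ")
    generate_ngrams_from_file(input_file, output_5_tram, 5)
    end = time.time()
    # Report the total running time
    print("Elapsed: %.2f s" % (end - start))
```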
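For reference, the zip trick in generate_ngrams() pairs each token with the n-1 tokens that follow it, because zip() stops at the shortest of the shifted lists. A quick standalone check, assuming space-joined n-grams:

```python
def generate_ngrams(text, n):
    # n shifted views of the token list; zip stops at the shortest one
    tokens = text.split()
    return [' '.join(gram) for gram in zip(*[tokens[i:] for i in range(n)])]


print(generate_ngrams("to be or not to be", 3))
# ['to be or', 'be or not', 'or not to', 'not to be']
```

A line with fewer than n tokens simply yields an empty list.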
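Multithreading is not a good fit here because of Python's Global Interpreter Lock (GIL): in CPython only one thread executes Python bytecode at a time, so CPU-bound work such as n-gram counting does not get faster with threads. You can read about it here https://www.geeksforgeeks.org/what-is-the-python-global-interpreter-lock-gil/ . Multiprocessing is the better option, because each process has its own interpreter and its own GIL and can therefore run on a separate core. You can run generate_ngrams_from_file() in a Process from the multiprocessing module, one process per n-gram size. Read about the Process class at https://docs.python.org/2/library/multiprocessing.html . Keep in mind that processes do not share memory: counters updated inside a child process are not visible in the parent, so each worker has to send its results back, for example through a multiprocessing.Queue. The Process class is recommended over pool.apply(), which blocks until each call returns and so would not run the three jobs in parallel, and over pool.apply_async(), whose pool setup buys little for just three jobs.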
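A minimal sketch of the Process approach, assuming Python 3 and that you want the counts back in the parent rather than written to files. count_ngrams(), worker() and parallel_ngrams() are hypothetical helper names. Each child counts one n-gram size and sends its Counter back through a multiprocessing.Queue, since globals are not shared between processes:

```python
import os
import time
from collections import Counter
from multiprocessing import Process, Queue


def generate_ngrams(text, n):
    # Same zip trick as the question: n shifted copies of the token list
    tokens = text.split()
    return [' '.join(gram) for gram in zip(*[tokens[i:] for i in range(n)])]


def count_ngrams(path, n):
    # Count every n-gram in the file; runs entirely inside one process
    counts = Counter()
    with open(path, 'r') as f:
        for line in f:
            counts.update(generate_ngrams(line, n))
    return counts


def worker(path, n, queue):
    # Children do not share the parent's globals, so send the result back
    queue.put((n, count_ngrams(path, n)))


def parallel_ngrams(path, orders):
    # Start one process per n-gram size and collect {n: Counter}
    queue = Queue()
    procs = [Process(target=worker, args=(path, n, queue)) for n in orders]
    for p in procs:
        p.start()
    # Drain the queue before join(): a child does not exit until its queued
    # data has been consumed, so joining first can deadlock
    results = dict(queue.get() for _ in procs)
    for p in procs:
        p.join()
    return results


if __name__ == "__main__":
    # Demo on a small temporary file; pass 'bigfile.txt' for the real data
    import tempfile
    with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
        tmp.write("the quick brown fox jumps\nover the lazy dog\n")
        demo_path = tmp.name
    start = time.time()
    results = parallel_ngrams(demo_path, [3, 4, 5])
    for n in sorted(results):
        print(n, results[n].most_common(2))
    print("Elapsed: %.2f s" % (time.time() - start))
    os.remove(demo_path)
```

The `if __name__ == "__main__":` guard matters on macOS, where the default start method has been spawn since Python 3.8 and each child re-imports the main module. Every process reads the whole file, so the three jobs use separate cores but repeat the disk reads. If you would rather not manage the queue by hand, multiprocessing.Pool.map() or concurrent.futures.ProcessPoolExecutor collect return values for you.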