How to compress a text file
Hey, is there any way to compress the text used in this code? I would appreciate it.
file = open("Test.txt", "r")
Sentence = (file.read())
s = Sentence.split(" ")
ListSentence = []
uniquewords = []
print(Sentence)
for x in s:
    if x in uniquewords:
        ListSentence.append(uniquewords.index(x))
    else:
        uniquewords.append(x)
        ListSentence.append(uniquewords.index(x))
print(ListSentence)
recreated = ""
for position in ListSentence:
    recreated = recreated + uniquewords[position] + " "
print(uniquewords)
print (recreated)
The question is a bit vague... If you mean data compression, you could use the binary transforms.
In [1]: import codecs
In [2]: example = 'abcdefg'*100
In [3]: compressed = codecs.encode(example.encode(), 'zlib')
In [4]: compressed
Out[4]: b'x\x9cKLJNIMKO\x1c\xa5F\xa9\xa1F\x01\x00m\x8e\x11\x80'
In [5]: decompressed = codecs.decode(compressed, 'zlib')
In [6]: decompressed
Out[6]: b'abcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefg'
Check out the codecs module in the docs; near the bottom are the built-in codecs provided for binary transforms.
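As a rough sketch of the same idea outside an interactive session, the standard `zlib` module can be called directly. The helper names `compress_text` and `decompress_text` are made up for illustration, and UTF-8 is an assumed encoding, not something the question specifies.

```python
import zlib


def compress_text(text: str) -> bytes:
    """Encode the text as UTF-8 (assumed encoding) and compress it with zlib."""
    return zlib.compress(text.encode("utf-8"))


def decompress_text(data: bytes) -> str:
    """Reverse compress_text: decompress, then decode back to str."""
    return zlib.decompress(data).decode("utf-8")


example = "abcdefg" * 100
compressed = compress_text(example)
restored = decompress_text(compressed)

# Highly repetitive input like this shrinks a lot; ordinary prose shrinks less.
print(len(example), "characters ->", len(compressed), "bytes")
print(restored == example)
```

To apply this to Test.txt, you would read the file as text, pass the string to `compress_text`, and write the result to a file opened in "wb" mode, since the compressed data is bytes rather than text.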
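However, if by "compress" you mean reducing the number of lines of code, then, although the intent of your code is unclear, I guess you want to filter out duplicate words while possibly keeping the words in their original order...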
Without preserving order:
' '.join(set(sentence.split()))
Preserving order:
seen = set()
words = sentence.split()
new = []
for word in words:
    if word not in seen:
        seen.add(word)
        new.append(word)
unique_ordered = ' '.join(new)
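If you are on Python 3.7 or later, where dicts keep insertion order, the order-preserving version can probably be shortened further with `dict.fromkeys`. This is an alternative sketch, not what the loop above does internally, and the sample `sentence` is invented for the example.

```python
# Sample input, made up for illustration.
sentence = "the cat and the dog and the bird"

# dict.fromkeys keeps only the first occurrence of each key, and dicts
# preserve insertion order (guaranteed from Python 3.7), so the words
# come out deduplicated in their original order.
unique_ordered = " ".join(dict.fromkeys(sentence.split()))

print(unique_ordered)  # the cat and dog bird
```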
It seems you are asking whether the number of lines in your code can be reduced. Here is my attempt:
with open("Test.txt", "r") as file:
    Sentence = file.read()
s = Sentence.split(" ")  # the loop below iterates over s, so it has to be defined
ListSentence, uniquewords = [], []
print(Sentence)
for x in s:
    if x not in uniquewords:
        uniquewords.append(x)
    ListSentence.append(uniquewords.index(x))  # you do this every loop anyway
print(ListSentence)
recreated = ""
for position in ListSentence:
    recreated += uniquewords[position] + " "
print(uniquewords)
print(recreated)
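One possible refinement, offered as a sketch and not as part of the answer above: `uniquewords.index(x)` rescans the list for every word, so a dict that maps each word to its position may be faster on long files. The names `encode_words`, `decode_words`, and `positions` are hypothetical, and the sample text stands in for the contents of Test.txt.

```python
def encode_words(text):
    """Return (unique_words, indices) so that indices point into unique_words."""
    positions = {}      # word -> its index in unique_words
    unique_words = []
    indices = []
    for word in text.split(" "):
        if word not in positions:
            positions[word] = len(unique_words)
            unique_words.append(word)
        indices.append(positions[word])
    return unique_words, indices


def decode_words(unique_words, indices):
    """Rebuild the original text from the word list and the index list."""
    return " ".join(unique_words[i] for i in indices)


text = "one two one three two one"  # stand-in for the file contents
unique_words, indices = encode_words(text)
recreated = decode_words(unique_words, indices)

print(unique_words)
print(indices)
print(recreated == text)
```

Using `" ".join(...)` here also avoids the trailing space that the `recreated += ... + " "` loop leaves at the end of the rebuilt string.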