How do I solve this?

Question

How do I get this program to compress a file into a list of words and a list of positions from which the original file can be recreated? It should then take the compressed file and recreate the full text of the original file, including punctuation and capitalisation.

startsentence = input("Please enter a sentence: ")
sentence = (startsentence)
a = startsentence.split(" ")
dict = dict()
number = 1
positions = []
for j in a:
    if j not in dict:
        dict[j] = str(number)
        number = number + 1
    positions.append(dict[j])
print (positions)


print(positions)
f = open("postions.txt", "w") 
f.write( str(positions) + "\n"  )
f.close()

print(sentence)
f = open("words.txt", "w") 
f.write( str(startsentence) + "\n"  ) 
f.close()

Answer 1

Currently you are writing out the whole startsentence, not just the unique words:

f = open("words.txt", "w") 
f.write( str(startsentence) + "\n"  ) 
f.close()

You need to write only the unique words, ordered by their index, and you have already created a dictionary, dict, that maps those words to their indices. (BTW, you really shouldn't use dict as a variable name because it shadows the built-in type, so I will use dct.) You just need to write the words out sorted by their value, using a with statement:

with open("words.txt", "w") as f:
    f.write(' '.join(sorted(dct, key=dct.get)) + '\n')
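For reference, here is a minimal sketch of the whole compression step under the assumptions this answer makes: indices start from 0, and both files are written as space-separated text. The function names compress and save are made up for illustration and do not appear in the question. Note that splitting on a single space does not survive runs of several spaces once the words are written out space-separated.

```python
def compress(sentence):
    """Return (unique words in first-seen order, 0-based positions)."""
    dct = {}        # word -> index of its first appearance
    positions = []
    for word in sentence.split(" "):
        if word not in dct:
            dct[word] = len(dct)   # next free index, starting from 0
        positions.append(dct[word])
    words = sorted(dct, key=dct.get)
    return words, positions


def save(words, positions, words_path="words.txt", positions_path="positions.txt"):
    """Write both lists as space-separated text, one line per file."""
    with open(words_path, "w") as wf:
        wf.write(" ".join(words) + "\n")
    with open(positions_path, "w") as pf:
        pf.write(" ".join(str(p) for p in positions) + "\n")
```

Calling save(*compress(startsentence)) would then produce files that a simple read().split() can parse again.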

Assuming you have a list of positions (BTW, it is much easier to start from 0 than from 1) and a list of words, each saved as whitespace-separated text rather than as str(positions), restoration is simple:

with open('positions.txt') as pf, open('words.txt') as wf:
    # both files hold whitespace-separated values
    positions = [int(p) for p in pf.read().split()]
    words = wf.read().strip().split()

recovered = ' '.join(words[p] for p in positions) # p-1 if you start from 1
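The question also asks for punctuation and capitalisation to come back exactly. One possible way to get that (a sketch, not the only approach) is to split the text into runs of word characters and runs of everything else, and to treat every run as a "word" in the list. Capitalised and lower-case forms are simply different entries. The names compress_text and decompress_text are hypothetical, and storing the result as JSON instead of space-separated text is my own assumption, needed because some entries are themselves whitespace.

```python
import json
import re


def compress_text(text):
    """Split text into word / non-word runs and index the unique runs."""
    tokens = re.findall(r"\w+|\W+", text)   # every character lands in some token
    index = {}                              # token -> 0-based index
    positions = []
    for tok in tokens:
        if tok not in index:
            index[tok] = len(index)
        positions.append(index[tok])
    # dicts keep insertion order (Python 3.7+), so this is first-seen order
    return list(index), positions


def decompress_text(words, positions):
    """Rebuild the original text; separators are tokens, so join on ''."""
    return "".join(words[p] for p in positions)


text = "Hello, world! Hello again, World."
words, positions = compress_text(text)

# JSON keeps whitespace-only tokens intact when saved to a file
payload = json.dumps({"words": words, "positions": positions})
restored = json.loads(payload)
recovered = decompress_text(restored["words"], restored["positions"])
```

Here recovered is identical to text, including the commas, the exclamation mark and the capital W in World.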


Solution 1
0 accepted 2016-11-23 15:36:03
