How to recreate a sentence by using unique words and order

I have been looking for a way to write a program that recreates a sentence from its individual words and their order.

The individual words are stored in a file in this format:

i am what so deal with it

and the order is stored in a separate file, like so:

1 2 3 1 2 4 5 6 7

Finally, it should produce the sentence:

>>>i am what i am so deal with it

Sorry, I can't post any code I have tried, because I haven't been able to work out how to do this.

Here is how I would do it.

In [4]: order = "1 2 3 1 2 4 5 6 7"

In [5]: words = "i am what so deal with it"

In [6]: word_list = words.split()

In [7]: word_list
Out[7]: ['i', 'am', 'what', 'so', 'deal', 'with', 'it']

In [8]: order = list(map(lambda x: int(x)-1, order.split()))

In [9]: order
Out[9]: [0, 1, 2, 0, 1, 3, 4, 5, 6]

In [10]: " ".join([word_list[i] for i in order])
Out[10]: 'i am what i am so deal with it'

I subtract 1 in In [8] because lists in Python are indexed from 0. On Python 3, map returns a lazy iterator, so list(...) is needed around it to get the list shown in Out[9]. The code above uses builtins (int, join, map and split), so refer to the Python documentation to understand exactly what they do.
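The same steps can be wrapped in a small function. This is only a minimal sketch and not part of the original answer: the name rebuild_sentence is made up for illustration, and it uses a list comprehension instead of map so that it behaves the same on Python 2 and Python 3.

```python
def rebuild_sentence(words, order):
    """Rebuild a sentence from unique words and 1-based positions (hypothetical helper)."""
    word_list = words.split()
    # Convert each 1-based position from the order string to a 0-based list index
    indices = [int(x) - 1 for x in order.split()]
    return " ".join(word_list[i] for i in indices)


print(rebuild_sentence("i am what so deal with it", "1 2 3 1 2 4 5 6 7"))
# prints: i am what i am so deal with it
```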

An often useful way to handle the indexing mismatch is to prepend a dummy entry at index 0 of the word base (in the Python code), so that the 1-based indices read from the indices file can be used directly to build the sentence.

So I suggest trying:

#! /usr/bin/env python
from __future__ import print_function

word_base = None
with open('so_word_base.txt', 'rt') as f_base:
    word_base = [None] + [z.strip() for z in f_base.read().split()]

sentence_seq = None
with open('so_select_indices.txt', 'rt') as f_select:
    sentence_seq = [word_base[int(i)] for i in f_select.read().split()]

print(' '.join(sentence_seq))

with the file for the word "atoms" (so_word_base.txt):

i am what so deal with it

and the file for selecting the indices into that word "base" (so_select_indices.txt):

1 2 3 1 2 4 5 6 7

This yields:

i am what i am so deal with it

Note that this is fragile, like the other solutions suggested. That should be OK here, so the OP can learn how to implement such a minimal database-like application ;-)

More robust code might test for None in the variables where it is explicitly set, and catch exceptions when the files are missing, cannot be read, or cannot be parsed.
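As a rough sketch of what that could look like (not the original author's code): the helper names load_tokens and build_sentence are invented for illustration, and the choice to print a message instead of re-raising is just one possible policy.

```python
from __future__ import print_function


def load_tokens(path):
    """Return the whitespace-separated tokens in a file, or None if it cannot be read."""
    try:
        with open(path, 'rt') as f:
            return f.read().split()
    except (IOError, OSError) as err:
        print('cannot read {}: {}'.format(path, err))
        return None


def build_sentence(word_tokens, index_tokens):
    """Join words picked by 1-based indices; raise ValueError on bad input."""
    word_base = [None] + list(word_tokens)  # dummy entry keeps indices 1-based
    picked = []
    for token in index_tokens:
        index = int(token)  # raises ValueError for non-numeric tokens
        if not 1 <= index < len(word_base):
            raise ValueError('index out of range: {}'.format(index))
        picked.append(word_base[index])
    return ' '.join(picked)


if __name__ == '__main__':
    words = load_tokens('so_word_base.txt')
    indices = load_tokens('so_select_indices.txt')
    # Only build the sentence when both files were read successfully
    if words is not None and indices is not None:
        try:
            print(build_sentence(words, indices))
        except ValueError as err:
            print('cannot build sentence: {}'.format(err))
```

The explicit range check also rejects 0 and negative numbers, which would otherwise silently pick the dummy entry or count from the end of the list.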

A slightly more "beginner friendly" solution:

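# Read the unique words and the 1-based order, split on whitespace
with open("words.txt", 'r') as f:
    words = f.read().split()
with open("order.txt", 'r') as f:
    order = f.read().split()
result = ""
for i in order:
    # The order file counts from 1, Python lists count from 0
    result += words[int(i) - 1] + " "
print(result.strip())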

and the result is the same:

i am what i am so deal with it

You can use enumerate to get the position of each word, then create a dictionary from it. To read the files you can use something like this:

with open('file1.txt', 'r') as f:
    string = f.read()

with open('file2.txt', 'r') as f:
    order = [int(i) for i in f.read().split()]

Then re-order the words:

string = "i am what so deal with it"
order = [1, 2, 3, 1, 2, 4, 5, 6, 7]

string = string.split()

indexDict = {i:j for i,j in enumerate(string)}

newString = ' '.join([indexDict[i-1] for i in order])

Output:

>>> newString
'i am what i am so deal with it'
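A possible variation, offered as a sketch and not as part of the original answer: enumerate accepts a start value, so the dictionary keys can be numbered from 1 to match the order file, which removes the i-1 adjustment.

```python
string = "i am what so deal with it"
order = [1, 2, 3, 1, 2, 4, 5, 6, 7]

# enumerate(..., 1) numbers the words from 1, matching the order file
indexDict = {i: j for i, j in enumerate(string.split(), 1)}

newString = ' '.join(indexDict[i] for i in order)
print(newString)  # prints: i am what i am so deal with it
```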
