How to create a list of word pairs from a list
I have a list of words in the file "temp":
1. the
2. of
3. to
4. and
5. bank
and so on.
How do I improve its readability?
import itertools
f = open("temp.txt","r")
lines = f.readlines()
pairs = list(itertools.permutations(lines, 2))
print(pairs)
I am lost, please help.
import itertools
with open("temp.txt", "r") as f:
    words = [item.split(' ')[-1].strip() for item in f]
pairs = list(itertools.permutations(words, 2))
print(pairs)
Prints (using pprint for readability):
[('the', 'of'),
('the', 'to'),
('the', 'and'),
('the', 'bank'),
('of', 'the'),
('of', 'to'),
('of', 'and'),
('of', 'bank'),
('to', 'the'),
('to', 'of'),
('to', 'and'),
('to', 'bank'),
('and', 'the'),
('and', 'of'),
('and', 'to'),
('and', 'bank'),
('bank', 'the'),
('bank', 'of'),
('bank', 'to'),
('bank', 'and')]
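For reference, output in this shape can be produced with the standard pprint module. This is only a minimal sketch: the hard-coded words list is an assumption that stands in for the words read from the file.

```python
import itertools
from pprint import pprint

# Hard-coded stand-in for the words read from the file (assumption for illustration).
words = ["the", "of", "to", "and", "bank"]

pairs = list(itertools.permutations(words, 2))

# pprint breaks the list into one tuple per line once it no longer fits the line width.
pprint(pairs)
```

With five words this gives 5 * 4 = 20 ordered pairs.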
I am assuming that your problem is creating all the possible pairs of words defined in the temp file. This is called a permutation, and you are already using the itertools.permutations function.
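To make "permutation" concrete, here is a small sketch with three of the sample words. It shows that itertools.permutations treats ('the', 'of') and ('of', 'the') as two different pairs. The comparison with itertools.combinations is my own addition, in case order should not matter for you; the question does not say either way.

```python
import itertools

# Shortened sample list (assumption for illustration).
words = ["the", "of", "to"]

# Ordered pairs: both ('the', 'of') and ('of', 'the') appear.
ordered = list(itertools.permutations(words, 2))

# Unordered pairs: each pair of words appears only once.
unordered = list(itertools.combinations(words, 2))

print(ordered)
print(unordered)
```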
If you need to actually write the output to a file, your code could be the following:
import itertools
f = open("temp","r")
lines = [line.split(' ')[-1].strip() for line in f] #1
pairs = list(itertools.permutations(lines, 2)) #2
r = open('result', 'w') #3
r.write("\n".join([" ".join(p) for p in pairs])) #4
r.close() #5
#1 [line.split(' ')[-1].strip() for line in f] reads the whole file. For each line read, it splits the line at the space character, takes the last item (negative indexes like -1 count backwards from the end of the list), removes any leading and trailing whitespace (like \n) and collects the results in one list.
#2 builds the pairs as you already did; the words no longer carry a trailing \n.
#3 opens the result file for writing.
#4 joins each pair with a space (" "), joins each result (a line) with a \n and then writes everything to the file.
#5 closes the file.
import itertools
with open('temp.txt', 'r') as fobj_in, open('out.txt', 'w') as fobj_out:
    words = [item.split()[-1] for item in fobj_in if item.strip()]
    for pair in itertools.permutations(words, 2):
        fobj_out.write('{} {}\n'.format(*pair))
with open('temp.txt', 'r') as fobj_in, open('out.txt', 'w') as fobj_out:
We open both files, one for reading and one for writing, with the help of with. This guarantees that both files will be closed as soon as we leave the indentation of the with block, even if there is an exception somewhere in this block.
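If you want to convince yourself of the closing guarantee, something like the following sketch can be used. The temporary directory and the simulated ValueError are made up for the illustration; only the with line mirrors the code above.

```python
import os
import tempfile

# Temporary directory and file contents are made up for this illustration.
with tempfile.TemporaryDirectory() as tmp_dir:
    in_path = os.path.join(tmp_dir, "temp.txt")
    out_path = os.path.join(tmp_dir, "out.txt")
    with open(in_path, "w") as fobj:
        fobj.write("1. the\n2. of\n")

    try:
        with open(in_path, "r") as fobj_in, open(out_path, "w") as fobj_out:
            # Simulate something going wrong inside the block.
            raise ValueError("simulated failure inside the block")
    except ValueError:
        pass

    # Both file objects were closed when the with block was left.
    closed_states = (fobj_in.closed, fobj_out.closed)

print(closed_states)
```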
We use a list comprehension to get all the words:
words = [item.split()[-1] for item in fobj_in if item.strip()]
item.split()[-1] splits at any whitespace and gives us the last entry in the line. Note that it also takes off the \n at the end of each line. No need for a .strip() here. item.split() is often better than item.split(' ') because it also works for more than one space and for tabs. We still need to make sure that the line is not empty with if item.strip(). If nothing is left after removing all whitespace, there are no words for us and item.split()[-1] would give an IndexError. Just go to the next line and discard this one.
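The difference between the two ways of splitting can be seen in a short sketch. The sample lines (one with a tab, one with two spaces, one empty) are assumptions chosen to trigger the cases described above.

```python
# Sample lines: single space, tab, two spaces, and an empty line (made up for illustration).
lines = ["1. the\n", "2.\tof\n", "3.  to\n", "\n"]

# Splitting at a single space: the tab-separated line is not split at all,
# and the empty line yields an empty string.
with_space = [line.split(' ')[-1].strip() for line in lines]

# Splitting at any whitespace, skipping lines that contain only whitespace.
without_arg = [line.split()[-1] for line in lines if line.strip()]

print(with_space)
print(without_arg)

# Without the "if line.strip()" filter the empty line raises an IndexError.
try:
    "\n".split()[-1]
    error_name = None
except IndexError as exc:
    error_name = type(exc).__name__
print(error_name)
```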
Now we can iterate over all pairs and write them into the output file:
for pair in itertools.permutations(words, 2):
    fobj_out.write('{} {}\n'.format(*pair))
We ask the iterator to give us the next word pair, one pair at a time, and write this pair to the output file. There is no need to convert it to a list. The .format(*pair) unpacks the two elements in pair and is equivalent to .format(pair[0], pair[1]) for our pair with two elements.
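A tiny sketch of the unpacking, using one hard-coded pair as an example:

```python
# One example pair, as produced by itertools.permutations (hard-coded for illustration).
pair = ("the", "of")

# The star unpacks the tuple into two positional arguments.
unpacked = "{} {}\n".format(*pair)

# Same result with explicit indexing.
indexed = "{} {}\n".format(pair[0], pair[1])

print(repr(unpacked))
print(unpacked == indexed)
```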
The first intuition may be to use a generator expression to read the words from the file too:
words = (item.split()[-1] for item in fobj_in if item.strip())
But time measurements show that the list comprehension is faster than the generator expression. This is due to itertools.permutations(words) consuming the iterator words anyway. Creating a list in the first place avoids this doubled effort of going through all elements again.
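If you want to check this yourself, a rough sketch along the following lines could be used. The 200 generated lines and the helper names pairs_from_list and pairs_from_generator are made up for the illustration, and the timings will differ from machine to machine, so treat the numbers as indicative only.

```python
import itertools
import timeit

# Made-up input lines in the same "number. word" shape as the file.
lines = ["{}. word{}\n".format(i, i) for i in range(1, 201)]

def pairs_from_list():
    # Hypothetical helper: words collected with a list comprehension.
    words = [item.split()[-1] for item in lines if item.strip()]
    return sum(1 for _ in itertools.permutations(words, 2))

def pairs_from_generator():
    # Hypothetical helper: words produced by a generator expression.
    words = (item.split()[-1] for item in lines if item.strip())
    return sum(1 for _ in itertools.permutations(words, 2))

# permutations() copies its input into an internal tuple when it is created,
# so a generator passed to it is already exhausted afterwards.
gen = (item.split()[-1] for item in lines)
perms = itertools.permutations(gen, 2)
leftover = next(gen, None)
print(leftover)

list_time = timeit.timeit(pairs_from_list, number=5)
gen_time = timeit.timeit(pairs_from_generator, number=5)
print("list: {:.4f}s  generator: {:.4f}s".format(list_time, gen_time))
```

Both variants produce the same pairs; only the way the words reach itertools.permutations differs.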