How to create a list of word pairs from a list

I have a list of words in the file "temp":

 1. the 
 2. of
 3. to
 4. and
 5. bank

and so on.

How do I improve its readability?

import itertools
f = open("temp.txt","r")
lines = f.readlines()
pairs = list(itertools.permutations(lines, 2))
print(pairs)

I am lost, please help.

import itertools

with open("temp.txt", "r") as f:
    words = [item.split(' ')[-1].strip() for item in f]

pairs = list(itertools.permutations(words, 2))
print(pairs)

Prints (using pprint for readability):

[('the', 'of'),
 ('the', 'to'),
 ('the', 'and'),
 ('the', 'bank'),
 ('of', 'the'),
 ('of', 'to'),
 ('of', 'and'),
 ('of', 'bank'),
 ('to', 'the'),
 ('to', 'of'),
 ('to', 'and'),
 ('to', 'bank'),
 ('and', 'the'),
 ('and', 'of'),
 ('and', 'to'),
 ('and', 'bank'),
 ('bank', 'the'),
 ('bank', 'of'),
 ('bank', 'to'),
 ('bank', 'and')]
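
The snippet above calls print, while the output shown was formatted with pprint. A minimal sketch of that step, assuming the five sample words from the question are already in a list (the hard-coded words list is a stand-in for the file contents):

```python
import itertools
from pprint import pprint

# Stand-in for the words read from the file (assumption for illustration).
words = ["the", "of", "to", "and", "bank"]

# All ordered pairs of two different words.
pairs = list(itertools.permutations(words, 2))

# The list is wider than pprint's default width of 80 characters,
# so pprint places each pair on its own line.
pprint(pairs)
```

With five words this yields 5 * 4 = 20 pairs, because order matters and a word is never paired with itself.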

I am assuming that your problem is creating all possible pairs of the words defined in the temp file. This is called a permutation, and you are already using the itertools.permutations function.

If you need to actually write the output to a file, your code should be the following:

import itertools
f = open("temp","r")
lines = [line.split(' ')[-1].strip() for line in f] #1
pairs = list(itertools.permutations(lines, 2)) #2
r = open('result', 'w') #3
r.write("\n".join([" ".join(p) for p in pairs])) #4
r.close() #5
  1. The list comprehension [line.split(' ')[-1].strip() for line in f] reads the whole file. For each line it splits on the space character, takes the last item (negative indexes such as -1 count backwards from the end of the list), removes any surrounding whitespace (such as \n), and collects the results in one list.
  2. The pairs are generated as you already did, but now they do not have the trailing \n.
  3. Open the result file for writing.
  4. Join each pair with a space (" "), join the resulting lines with \n, and then write everything to the file.
  5. Close the file (thus flushing it).
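
The five steps can be traced on a small in-memory example. This is only a sketch: the io.StringIO objects are hypothetical stand-ins for the files "temp" and "result", and the three sample lines assume there is no trailing space after each word.

```python
import io
import itertools

# Hypothetical stand-in for the input file "temp" (no trailing spaces assumed).
source = io.StringIO(" 1. the\n 2. of\n 3. to\n")

# Step 1: keep the last space-separated item of each line, without the newline.
words = [line.split(' ')[-1].strip() for line in source]

# Step 2: all ordered pairs of two different words.
pairs = list(itertools.permutations(words, 2))

# Steps 3 to 5: write one pair per line; StringIO stands in for the result file.
result = io.StringIO()
result.write("\n".join(" ".join(p) for p in pairs))
output = result.getvalue()
result.close()

print(output)
```

Three words give six lines of output, the first being "the of" and the last "to of".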

Some improvements with explanations

import itertools

with open('temp.txt', 'r') as fobj_in, open('out.txt', 'w') as fobj_out:
    words = [item.split()[-1] for item in fobj_in if item.strip()]
    for pair in itertools.permutations(words, 2):
        fobj_out.write('{} {}\n'.format(*pair))

Explanation

with open('temp.txt', 'r') as fobj_in, open('out.txt', 'w') as fobj_out:

We open both files, one for reading and one for writing, with the help of with. This guarantees that both files are closed as soon as we leave the indented with block, even if an exception is raised somewhere inside it.

We use a list comprehension to get all the words:

words = [item.split()[-1] for item in fobj_in if item.strip()]

item.split()[-1] splits at any whitespace and gives us the last entry in the line. Note that it also takes off the \n at the end of each line, so there is no need for a .strip() here. item.split() is often better than item.split(' ') because it also works for more than one space and for tabs. We still need to make sure that the line is not empty with if item.strip(). If nothing is left after removing all whitespace, there are no words for us and item.split()[-1] would raise an IndexError. In that case we just discard the line and go on to the next one.

Now we can iterate over all pairs and write them into the output file:
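
A small sketch of the difference, using made-up sample lines (a trailing space, a tab, a blank line, and a double space) that are assumptions chosen to show the edge cases:

```python
# Hypothetical sample lines covering the edge cases discussed above.
lines = ["1. the \n", "2.\tof\n", "\n", "3.  to\n"]

# split(' ') only splits on single space characters.
naive = [line.split(' ')[-1].strip() for line in lines]

# split() splits on any run of whitespace; blank lines are filtered out first.
robust = [line.split()[-1] for line in lines if line.strip()]

# Without the filter, a blank line produces an empty list and [-1] fails.
try:
    "\n".split()[-1]
    blank_line_fails = False
except IndexError:
    blank_line_fails = True

print(naive)
print(robust)
print(blank_line_fails)
```

Here the split(' ') version loses "the" because of the trailing space, fails to separate the tab-delimited line, and yields an empty string for the blank line, while the split() version returns only the three words.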

for pair in itertools.permutations(words, 2):
    fobj_out.write('{} {}\n'.format(*pair))

We ask the iterator to give us the word pairs one at a time and write each pair to the output file. There is no need to convert it to a list. The .format(*pair) unpacks the two elements of pair and is equivalent to .format(pair[0], pair[1]) for our pair with two elements.

Performance note

The first intuition may be to use a generator expression to read the words from the file too:
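
As a quick check of the unpacking, the following sketch formats each pair both ways and collects the lines in a list instead of writing them to a file (the words list and the written list are illustrative stand-ins):

```python
import itertools

# Stand-in for the words read from the input file (assumption).
words = ["the", "of", "to"]

written = []  # collects what would be written to the output file
for pair in itertools.permutations(words, 2):
    unpacked = '{} {}\n'.format(*pair)
    indexed = '{} {}\n'.format(pair[0], pair[1])
    # Both spellings produce the same string for a two-element tuple.
    assert unpacked == indexed
    written.append(unpacked)

print(''.join(written), end='')
```

Because itertools.permutations returns an iterator, each pair is produced only when the loop asks for it.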

words = (item.split()[-1] for item in fobj_in if item.strip())

But time measurements show that the list comprehension is faster than the generator expression. This is because itertools.permutations consumes the whole iterator words anyway. Creating a list in the first place avoids this doubled effort of going through all elements again.
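
One way to repeat such a measurement is with timeit. The sketch below is not the original benchmark: the generated input lines, the helper names with_list and with_generator, and the repetition count are all assumptions, and the timings will differ between machines and Python versions.

```python
import itertools
import timeit

# Hypothetical input: 100 numbered lines in the same layout as the question.
lines = ["{}. word{}\n".format(i, i) for i in range(1, 101)]

def with_list(lines):
    # Build the word list first, then count all pairs.
    words = [item.split()[-1] for item in lines if item.strip()]
    return sum(1 for _ in itertools.permutations(words, 2))

def with_generator(lines):
    # Hand a generator expression to permutations instead.
    words = (item.split()[-1] for item in lines if item.strip())
    return sum(1 for _ in itertools.permutations(words, 2))

# Total time for 20 runs of each variant.
list_time = timeit.timeit(lambda: with_list(lines), number=20)
gen_time = timeit.timeit(lambda: with_generator(lines), number=20)

print("list comprehension:   {:.4f} s".format(list_time))
print("generator expression: {:.4f} s".format(gen_time))
```

Both variants produce the same 100 * 99 pairs, so any difference in the printed times comes only from how the words are collected.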
