How to create a list of word pairs from a list

I have a list of words in the file "temp":

 1. the 
 2. of
 3. to
 4. and
 5. bank

and so on.

I want to create all pairs of these words. The code below does that, but how do I improve the readability of its output?

import itertools
f = open("temp.txt","r")
lines = f.readlines()
pairs = list(itertools.permutations(lines, 2))
print(pairs)

I am lost, please help.

import itertools

with open("temp.txt", "r") as f:
    words = [item.split(' ')[-1].strip() for item in f]

pairs = list(itertools.permutations(words, 2))
print(pairs)

Prints (using pprint for readability):

[('the', 'of'),
 ('the', 'to'),
 ('the', 'and'),
 ('the', 'bank'),
 ('of', 'the'),
 ('of', 'to'),
 ('of', 'and'),
 ('of', 'bank'),
 ('to', 'the'),
 ('to', 'of'),
 ('to', 'and'),
 ('to', 'bank'),
 ('and', 'the'),
 ('and', 'of'),
 ('and', 'to'),
 ('and', 'bank'),
 ('bank', 'the'),
 ('bank', 'of'),
 ('bank', 'to'),
 ('bank', 'and')]
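
For reference, a minimal sketch of how the pretty-printed output above can be produced with the standard library's pprint module. The five sample words are written inline here instead of being read from temp.txt; that is an assumption made only so the snippet runs on its own.

```python
import itertools
from pprint import pformat

# Sample words from the question, inlined so no input file is needed.
words = ["the", "of", "to", "and", "bank"]

pairs = list(itertools.permutations(words, 2))

# pformat returns the pretty-printed text as a string;
# pprint.pprint(pairs) would print the same text directly.
formatted = pformat(pairs)
print(formatted)
```

Because the whole list does not fit on one line, pprint puts each tuple on its own line.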

I am assuming that your problem is creating all the possible pairs of words defined in the temp file. This is called a permutation, and you are already using the itertools.permutations function for it.
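
To make the term concrete: permutations treats order as significant, so both ('the', 'of') and ('of', 'the') are produced. Whether unordered pairs would suit the question better is not stated, so the comparison with itertools.combinations below is only an illustration. The word list is inlined as an assumption.

```python
import itertools

words = ["the", "of", "to", "and", "bank"]

# Ordered pairs: (a, b) and (b, a) both appear.
ordered = list(itertools.permutations(words, 2))

# Unordered pairs: each pair of words appears once.
unordered = list(itertools.combinations(words, 2))

print(len(ordered))    # 5 * 4 = 20
print(len(unordered))  # 5 * 4 / 2 = 10
```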

If you need to actually write the output to a file, your code could look like the following:

import itertools
f = open("temp","r")
lines = [line.split(' ')[-1].strip() for line in f] #1
pairs = list(itertools.permutations(lines, 2)) #2
r = open('result', 'w') #3
r.write("\n".join([" ".join(p) for p in pairs])) #4
r.close() #5

  1. The [line.split(' ')[-1].strip() for line in f] reads the whole file and, for each line read, splits it on the space character, chooses the last item of the line (negative indexes like -1 count backwards from the end of the list), removes any surrounding whitespace (like \n) and puts all the results in one list
  2. the pairs are generated like you already did, but now they don't have the trailing \n
  3. open the result file for writing
  4. join the two words of each pair with a space ( " " ), join the resulting lines with \n and then write everything to the file
  5. close the file (thus flushing it)
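
Steps 1 and 4 can be traced on small in-memory values. This is only a sketch: the sample line and the two sample pairs are assumptions chosen to match the question's data.

```python
# Step 1 on a single line, as it would come out of the file.
line = " 5. bank\n"
parts = line.split(' ')   # ['', '5.', 'bank\n']
last = parts[-1]          # 'bank\n'
word = last.strip()       # 'bank'

# Step 4 on two sample pairs.
pairs = [("the", "of"), ("the", "to")]
text = "\n".join([" ".join(p) for p in pairs])
print(text)
```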

Some improvements with explanations

import itertools

with open('temp.txt', 'r') as fobj_in, open('out.txt', 'w') as fobj_out:
    words = [item.split()[-1] for item in fobj_in if item.strip()]
    for pair in itertools.permutations(words, 2):
        fobj_out.write('{} {}\n'.format(*pair))

Explanation

with open('temp.txt', 'r') as fobj_in, open('out.txt', 'w') as fobj_out:

We open both files, one for reading and one for writing, with the help of with. This guarantees that both files will be closed as soon as we leave the indented with block, even if there is an exception somewhere in this block.

We use a list comprehension to get all the words:

words = [item.split()[-1] for item in fobj_in if item.strip()]

item.split()[-1] splits on any run of whitespace and gives us the last entry in the line. Note that this also removes the \n at the end of each line, so there is no need for a .strip() here. item.split() is often better than item.split(' ') because it also works for more than one space and for tabs. We still need to make sure that the line is not empty with if item.strip(). If nothing is left after removing all whitespace, there are no words for us and item.split()[-1] would raise an IndexError. Such a line is discarded and we go on to the next one.
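
The difference between the two ways of splitting can be checked on a few sample lines. In this sketch last_word is a hypothetical helper introduced only for illustration, and the sample lines are assumptions.

```python
def last_word(line):
    """Return the last whitespace-separated token, or None for a blank line."""
    if not line.strip():
        return None  # blank line: line.split() is [] and [-1] would fail
    return line.split()[-1]

trailing_space = "3.  to \n"   # two spaces inside, one space before the newline
with_tab = "2.\tof\n"          # tab instead of a space

# split(' ') ends on the '\n' fragment here, so the word is lost.
naive_trailing = trailing_space.split(' ')[-1].strip()

# split(' ') does not split on the tab, so the number stays attached.
naive_tab = with_tab.split(' ')[-1].strip()

print(repr(naive_trailing), repr(last_word(trailing_space)))
print(repr(naive_tab), repr(last_word(with_tab)))
print(last_word("\n"))
```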

Now we can iterate over all pairs and write them into the output file:

for pair in itertools.permutations(words, 2):
    fobj_out.write('{} {}\n'.format(*pair))

We ask the iterator to give us the next word pair one pair at a time and write this pair to the output file. There is no need to convert it to a list. The .format(*pair) unpacks the two elements in pair and is equivalent to .format(pair[0], pair[1]) for our pair with two elements.
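
A quick check of the unpacking equivalence, using one sample pair (an assumption taken from the question's data):

```python
pair = ("the", "of")

# The * unpacks the tuple into two positional arguments.
unpacked = '{} {}\n'.format(*pair)

# Same call with the elements passed explicitly.
indexed = '{} {}\n'.format(pair[0], pair[1])

print(unpacked == indexed)
```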

Performance note

The first intuition might be to use a generator expression to read the words from the file, too:

words = (item.split()[-1] for item in fobj_in if item.strip())

But time measurements show that the list comprehension is faster than the generator expression. This is because itertools.permutations(words, 2) consumes the whole iterator words up front anyway, before it yields the first pair. Creating a list in the first place avoids the doubled effort of going through all elements again.
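
One way to repeat such a measurement is sketched below with timeit. The in-memory lines list is a hypothetical stand-in for the contents of temp.txt, and the sizes and repeat counts are arbitrary assumptions. Actual timings depend on the machine and Python version, and the difference may be small because generating the pairs dominates the run time.

```python
import itertools
import timeit

# Hypothetical stand-in for the lines of temp.txt: "0. word0\n", "1. word1\n", ...
lines = ["{}. word{}\n".format(i, i) for i in range(200)]

def with_list():
    # Words collected into a list before calling permutations.
    words = [item.split()[-1] for item in lines if item.strip()]
    return sum(1 for _ in itertools.permutations(words, 2))

def with_generator():
    # Words produced lazily; permutations still reads them all first.
    words = (item.split()[-1] for item in lines if item.strip())
    return sum(1 for _ in itertools.permutations(words, 2))

print("list comprehension:  ", timeit.timeit(with_list, number=20))
print("generator expression:", timeit.timeit(with_generator, number=20))
```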
