Python: comparing two text files for the same random lines

Question

I have a file, wordlist.txt, with 100 random words in it, each on a separate line. I currently use the following code to grab 12 random words from this file. To avoid picking exactly the same 12 words twice, I want to build in an extra check. The 12 random words are written to output.txt. How can I make my script compare the 12 random words (in the same order) with the 12 random words I have in output.txt (on one line)?

I currently use the following to read 12 random words from wordlist.txt and write them to output.txt:

import random

teller = 0  # round counter

while True:
    teller += 1

    # Choose 12 random words and write them to the text file
    print("\nRound", teller)
    f1 = open('output.txt', 'w+')
    count = 0
    while count < 12:
        f1.write(random.choice([x.rstrip() for x in open('wordlist.txt')]) + " ")
        count += 1
    f1.close()

Answer 1

Instead of random.choice(), read all words into a list and use random.sample():

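import random

# Read every word once, stripping the trailing newline
with open('wordlist.txt') as wlist:
    words = [w.strip() for w in wlist]
# Write 12 picks, one per line
with open('output.txt', 'w') as output:
    for word in random.sample(words, 12):
        output.write(word + '\n')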

random.sample() is guaranteed to pick 12 different entries from your input list, so you get 12 different words as long as the list itself contains no duplicates.
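
To illustrate the difference, here is a small sketch with a made-up word list (the words and variable names are placeholders, not taken from the question). Repeated random.choice() calls can return the same word twice, while random.sample() never reuses a list position; removing duplicates from the list first means that different positions are also different words.

```python
import random

# Placeholder word list with one deliberate duplicate
words = ["apple", "pear", "plum", "fig", "kiwi", "lime", "apple"]

# sorted(set(...)) removes duplicates and gives a stable order
unique_words = sorted(set(words))

# sample() draws without replacement, so all picks are distinct
picked = random.sample(unique_words, 4)

# choice() draws with replacement, so repeats are possible
with_repeats = [random.choice(unique_words) for _ in range(4)]
```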

Because your word list is small (just 100 words), reading all of them into a list in memory is perfectly fine.

If your input file is larger (megabytes to gigabytes), you may want to move to an algorithm that can pick a uniform sample out of any iterable regardless of size, requiring memory only for the output sample.
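
One algorithm of that kind is reservoir sampling. The following is a minimal sketch (the classic "Algorithm R"), not code from the original answer; the function name reservoir_sample and its parameters are made up for illustration. It reads the iterable once and keeps only k items in memory.

```python
import random


def reservoir_sample(iterable, k, rng=random):
    """Return up to k items drawn uniformly from iterable (hypothetical helper)."""
    reservoir = []
    for index, item in enumerate(iterable):
        if index < k:
            # Fill the reservoir with the first k items
            reservoir.append(item)
        else:
            # Keep the new item with probability k / (index + 1)
            slot = rng.randint(0, index)  # inclusive on both ends
            if slot < k:
                reservoir[slot] = item
    return reservoir
```

Passing an open file (for example a generator such as (w.strip() for w in wlist)) with k set to 12 would pick 12 lines without loading the whole file. Note that the result is a uniform sample but not a uniformly shuffled one, so call random.shuffle() on it if the order of the 12 words matters.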

If you need to find 12 random words that are not yet present in output.txt from a previous run, you need to read those into a set first:

import random

with open('wordlist.txt') as wlist:
    words = [w.strip() for w in wlist]

# Words written by a previous run (one per line)
with open('output.txt', 'r') as output:
    seen = {w.strip() for w in output}

with open('output.txt', 'a') as output:
    count = 0
    # Stop early if the word list runs out of unseen words
    while count < 12 and words:
        new_word = random.choice(words)
        if new_word in seen:
            # Already used: drop it so it is not drawn again
            words.remove(new_word)
            continue
        seen.add(new_word)
        output.write(new_word + '\n')
        count += 1

Here, I open the output.txt file with 'a' for appending instead, to add the 12 new words we had not seen before.
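
The question also asks about comparing a fresh pick, in order, against the single line already stored in output.txt. The sketch below shows one way that check might look; it is not part of the answer above. The helper names read_previous_line and pick_new_line and the max_tries parameter are assumptions, and it assumes output.txt holds the previous pick as space-separated words on one line, as the question's code writes it.

```python
import os
import random


def read_previous_line(path):
    """Return the words of the previous pick, or [] if the file is missing."""
    if not os.path.exists(path):
        return []
    with open(path) as handle:
        return handle.read().split()


def pick_new_line(words, previous, k=12, max_tries=100):
    """Draw k entries whose ordered sequence differs from previous."""
    for _ in range(max_tries):
        picked = random.sample(words, k)
        # List equality compares both the words and their order
        if picked != previous:
            return picked
    raise RuntimeError("could not find a pick that differs from the previous one")
```

A caller would read wordlist.txt into words, call read_previous_line('output.txt'), and then write ' '.join(pick_new_line(words, previous)) back to output.txt. With 100 words and 12 ordered picks, an exact repeat is extremely unlikely, so the retry loop will almost always return on the first try.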

Answer 1: 2 | accepted | 2013-12-04 22:48:36
