
Reading random lines from a file in Python that don't repeat until 4 other lines have passed

I am trying to write a program that helps people learn new languages, but I am stuck right at the beginning. One of the requirements is that Python prints the lines of a file in a random order, so I wrote this:

import random

def randomline(file):
    with open(file) as f:
        lines=f.readlines()
        print(random.choice(lines))

Now I have a problem with another requirement: at least 4 other words must be shown before a word can appear again, and I have no idea how to do that.

I have a very primitive solution for you:

import random

def randomline(file):
    # Read all lines and return one at random (the trailing "\n" is kept)
    with open(file) as f:
        lines = f.readlines()
        return random.choice(lines)

LastFourWords = []  # the most recently printed words, oldest first

file = "text_file.txt"
for i in range(15):
    new_word = randomline(file)
    print(LastFourWords)
    if new_word in LastFourWords:
        # Shown too recently: report it and move on without recording it
        print("I have skipped")
        print(new_word)
        continue
    print(new_word)
    LastFourWords.append(new_word)
    if len(LastFourWords) > 4:
        LastFourWords.pop(0)  # drop the oldest word

The file looked like this: [image]


The output looks like this (only part of the result is shown):

[]
New

['New\n']
Example

['New\n', 'Example\n']
After

['New\n', 'Example\n', 'After\n']
Some

['New\n', 'Example\n', 'After\n', 'Some\n']
I have skipped
Example

['New\n', 'Example\n', 'After\n', 'Some\n']
Please

['Example\n', 'After\n', 'Some\n', 'Please\n']
I have skipped
Please

['Example\n', 'After\n', 'Some\n', 'Please\n']
Only

['After\n', 'Some\n', 'Please\n', 'Only\n']
Word
['Some\n', 'Please\n', 'Only\n', 'Word']
New

So every time a word is picked that is already in your list, it is skipped. The list drops its first (oldest) element once it holds more than 4 elements.
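To make the skipping rule easier to follow, here is a minimal sketch that applies the same logic to a fixed sequence of draws instead of random ones. The names skip_recent and draws are made up for this illustration and are not part of the answer above; the sequence simply mirrors the draws visible in the partial output.

```python
def skip_recent(candidates, gap=4):
    """Yield each candidate unless it is among the last `gap` yielded words."""
    recent = []  # hypothetical helper list, oldest word first
    for word in candidates:
        if word in recent:
            continue  # shown too recently: skip it
        yield word
        recent.append(word)
        if len(recent) > gap:
            recent.pop(0)  # forget the oldest word

# Fixed draws (an assumption for the demo, in place of random.choice)
draws = ["New", "Example", "After", "Some", "Example",
         "Please", "Please", "Only", "Word", "New"]
shown = list(skip_recent(draws))
print(shown)
# ['New', 'Example', 'After', 'Some', 'Please', 'Only', 'Word', 'New']
```

The second "Example" and the second "Please" are dropped because they are still in the list of recent words, while the final "New" is allowed because four other words have been shown since its first appearance.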

You can use a queue:

import random

# Start with 4 empty placeholders so the queue always holds exactly 4 entries
queue = 4*['']

def randomline(file):
    with open(file) as f:
        lines = f.readlines()
        choice = random.choice(lines)
        # Print only if the word is not among the last 4 printed words
        if choice not in queue:
            print(choice)

            # append the current word to the queue
            queue.append(choice)
            # remove the first (oldest) element of the list
            queue.pop(0)

You can use deque from the collections library. It lets you specify a maximum length for your list of seen words: when the deque is already at its maximum length and you append a new item, the oldest item is removed. This lets you build a cache. So create a deque with a maximum length of 4, then choose a word and check whether it is in the deque. If it is, choose another word; if it is not, print the word and add it to the deque. You don't have to manage the items yourself, because the oldest one drops out automatically when you append something new.

from collections import deque
from random import choice

with open('test.dat') as words_file:
    words = words_file.readlines()

word_cache = deque(maxlen=4)  # remembers only the last 4 printed words
for _ in range(30):
    word = choice(words).strip()
    # Redraw until the word is not among the last 4 printed
    while word in word_cache:
        word = choice(words).strip()
    print(word)
    word_cache.append(word)
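The eviction behaviour of deque can be checked in isolation, and the same idea can be wrapped in a function. The following is only a sketch: pick_words, its gap and rng parameters, and the up-front check for too few distinct words are assumptions added for illustration, not part of the answer above.

```python
import random
from collections import deque

def pick_words(words, count, gap=4, rng=random):
    """Return `count` words; a word reappears only after `gap` other words."""
    # Strip newlines, drop blank lines, and remove duplicates (order kept)
    unique = list(dict.fromkeys(w.strip() for w in words if w.strip()))
    if len(unique) <= gap:
        # Assumed guard: with this few words the redraw loop could never finish
        raise ValueError("need more than `gap` distinct words")
    cache = deque(maxlen=gap)
    picked = []
    while len(picked) < count:
        word = rng.choice(unique)
        if word in cache:
            continue  # seen too recently, draw again
        picked.append(word)
        cache.append(word)  # the oldest entry is evicted automatically
    return picked

# Eviction demo: a deque with maxlen=4 keeps only the newest 4 items
cache = deque(maxlen=4)
for item in "abcde":
    cache.append(item)
print(list(cache))  # ['b', 'c', 'd', 'e']
```

Passing the result of readlines() as words and, say, count=30 would give the same kind of output as the loop above, with the difference that a file with 4 or fewer distinct words raises an error instead of looping forever.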

I would use linecache. It is part of the standard library and lets you fetch a specific line by its number. If you know the number of lines in your file, this could work:

import linecache
import random

def random_lines(filename, repeat_after=4):
    # Count the lines once, closing the file afterwards
    with open(filename, "r") as f:
        n_lines = sum(1 for _ in f)
    last_indices = []

    while True:
        # linecache line numbers start at 1
        index = random.randint(1, n_lines)

        if index not in last_indices:
            # Remember only the last `repeat_after` line numbers
            last_indices.append(index)
            last_indices = last_indices[-repeat_after:]

            line = linecache.getline(filename, index)
            yield line

This creates a generator that yields a random line from your file without you having to keep the lines in a list of your own, which is convenient once the file grows large. Note that linecache keeps its own cache of the file's lines.

As for your requirement that a line may only repeat after n other lines, this takes care of it. However, because it keeps drawing until it finds an allowed line, there is a very small chance of it retrying for a long time, and it will loop forever if the file has no more than repeat_after lines.
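One way to avoid the retry loop altogether is to draw only from the line numbers that are currently allowed. This is a different technique from the generator above, shown here as a rough sketch; random_indices and its rng parameter are hypothetical names, and the ValueError guard is an added assumption.

```python
import random
from collections import deque
from itertools import islice

def random_indices(n_lines, repeat_after=4, rng=random):
    """Yield 1-based line numbers; each is barred for the next `repeat_after` draws."""
    if n_lines <= repeat_after:
        # Assumed guard: otherwise no line number would ever be allowed
        raise ValueError("n_lines must exceed repeat_after")
    recent = deque(maxlen=repeat_after)
    while True:
        # Build the list of allowed line numbers, then pick one directly
        allowed = [i for i in range(1, n_lines + 1) if i not in recent]
        index = rng.choice(allowed)
        recent.append(index)
        yield index

# Usage: take the first 10 line numbers for a 6-line file
print(list(islice(random_indices(6), 10)))
```

Each yielded number could then be passed to linecache.getline(filename, index) as before. The trade-off is that the list of allowed numbers is rebuilt on every draw, which costs time proportional to the number of lines.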

Another approach would be to create a list of all indices (i.e. line numbers), shuffle it, and then loop through it. This has the advantage of not risking an infinite loop, but it also means that you will go through every other line before you see the same line again, which may not be ideal for you.
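The shuffle approach might look roughly like this. It is a sketch for a single pass through the file; shuffled_lines and its rng parameter are names invented for the example.

```python
import linecache
import random

def shuffled_lines(filename, rng=random):
    """Yield every line of `filename` exactly once, in random order."""
    with open(filename) as f:
        n_lines = sum(1 for _ in f)
    # linecache uses 1-based line numbers
    indices = list(range(1, n_lines + 1))
    rng.shuffle(indices)
    for index in indices:
        yield linecache.getline(filename, index)
```

Within one pass no line can repeat. If you call the function again for a second pass, the last line of one pass can turn up near the start of the next, so the 4-line gap is not guaranteed across passes without an extra check.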


 