My code is missing some of the lines I'm trying to get out of a file

The basic task is to write a function, get_words_from_file(filename), that returns a list of lower case words that are within the region of interest. They share with you a regular expression, "[a-z]+[-'][a-z]+|[a-z]+[']?|[a-z]+", that finds all words that meet this definition. My code works well on some of the tests but fails when the line that indicates the region of interest is repeated. Here is my code:

import re

def get_words_from_file(filename):
    """Returns a list of lower case words that are with the region of 
    interest, every word in the text file, but, not any of the punctuation."""
    with open(filename,'r', encoding='utf-8') as file:
        flag = False
        words = []
        count = 0
        for line in file:
            if line.startswith("*** START OF"):
                while count < 1:
                    flag=True
                    count += 1
            elif line.startswith("*** END"):
                flag=False
                break       
            elif(flag):
                new_line = line.lower()
                words_on_line = re.findall("[a-z]+[-'][a-z]+|[a-z]+[']?|[a-z]+", 
                                           new_line)
                words.extend(words_on_line)
    
        return words

#test code:
filename = "bee.txt"
words = get_words_from_file(filename)
print(filename, "loaded ok.")
print("{} valid words found.".format(len(words)))
print("Valid word list:")
for word in words:
    print(word)

The issue is that the string "*** START OF" is repeated, and the line isn't included when it appears inside the region of interest.

The test code should result in:
bee.txt loaded ok.↩
16 valid words found.↩
Valid word list:↩
yes↩
really↩
this↩
time↩
start↩
of↩
synthetic↩
test↩
case↩
end↩
synthetic↩
test↩
case↩
i'm↩
in↩
too

But I'm getting:

bee.txt loaded ok.↩
11 valid words found.↩
Valid word list:↩
yes↩
really↩
this↩
time↩
end↩
synthetic↩
test↩
case↩
i'm↩
in↩
too

Any help would be great! Attached is a screenshot of the file.

The specific problem in your code is the if .. elif .. elif statement: you're ignoring every line that looks like the line that signals the start or end of a block, even when it is inside the test block.
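To make that concrete, here is a small sketch that mirrors the branch order of the question's loop. branch_taken is a hypothetical helper, and the sample marker line is assumed, since the real file contents are only visible in the screenshot.

```python
def branch_taken(line, flag):
    """Report which branch of the question's if/elif chain a line would hit.

    `flag` is True when we are already inside the region of interest.
    """
    if line.startswith("*** START OF"):
        return "start"    # flag handling only; the words on this line are never collected
    elif line.startswith("*** END"):
        return "end"      # the loop breaks here
    elif flag:
        return "collect"  # the only branch that extracts words
    return "skip"         # outside the region of interest


# Assumed sample line: a repeated start marker seen while already inside the block.
print(branch_taken("*** START OF SYNTHETIC TEST CASE ***\n", True))  # prints "start"
print(branch_taken("I'm in too.\n", True))                           # prints "collect"
```

Because the marker checks come first and don't look at the flag, a marker-like line inside the block never reaches the word-collecting branch.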

You wanted something like this for your function:

import re

def get_words_from_file(filename):
    """Returns a list of lower case words that are within the region of
    interest: every word in the text file, but not any of the punctuation."""
    with open(filename, 'r', encoding='utf-8') as file:
        in_block = False
        words = []
        for line in file:
            if not in_block and line == "*** START OF A SYNTHETIC TEST CASE ***\n":
                in_block = True
            elif in_block and line == "*** END TEST CASE ***\n":
                break
            elif in_block:
                words_on_line = re.findall("[a-z]+[-'][a-z]+|[a-z]+[']?|[a-z]+", line.lower())
                words.extend(words_on_line)

        return words

This assumes you are matching the whole line as a marker. You can of course still use .startswith() if a prefix is what you accept as the start or end of the block, as long as it is sufficiently unambiguous.
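As a rough sketch of the .startswith() variant: the prefixes "*** START OF" and "*** END" are taken from the question's code, while get_words_from_lines and the sample lines are made up for illustration (the real bee.txt may use different marker text). It takes any iterable of lines, so it can be tried without a file.

```python
import re

WORD_RE = re.compile(r"[a-z]+[-'][a-z]+|[a-z]+[']?|[a-z]+")


def get_words_from_lines(lines):
    """Collect lower case words between the first start marker and the next end marker."""
    in_block = False
    words = []
    for line in lines:
        if not in_block and line.startswith("*** START OF"):
            in_block = True   # only the first start marker opens the block
        elif in_block and line.startswith("*** END"):
            break             # an end marker only counts once we are inside
        elif in_block:
            # a repeated start marker lands here and is treated as ordinary text
            words.extend(WORD_RE.findall(line.lower()))
    return words


# Hypothetical input, not the actual test file.
sample = [
    "preamble that is ignored\n",
    "*** START OF SYNTHETIC TEST CASE ***\n",
    "Yes, really this time!\n",
    "*** START OF SYNTHETIC TEST CASE ***\n",
    "I'm in too.\n",
    "*** END OF SYNTHETIC TEST CASE ***\n",
    "trailing text that is ignored\n",
]
print(get_words_from_lines(sample))
```

The key point is that each marker test is combined with the current state, so a marker-like line is only treated as a marker when it can actually open or close the block.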

Your idea of using a flag is fine, although naming a flag after what it represents is always a good idea.


