
Empty output after appending a list

import string

r = ","
x = ""
output = list()

def find_word(filepath,keyword):
    doc = open(filepath, 'r')

    for line in doc:
        # Remove all the unnecessary characters (each is replaced by the separator r)
        line = line.replace("'", r)
        line = line.replace('`', r)
        line = line.replace('[', r)
        line = line.replace(']', r)
        line = line.replace('{', r)
        line = line.replace('}', r)
        line = line.replace('(', r)
        line = line.replace(')', r)
        line = line.replace(':', r)
        line = line.replace('.', r)
        line = line.replace('!', r)
        line = line.replace('?', r)
        line = line.replace('"', r)
        line = line.replace(';', r)
        line = line.replace(' ', r)
        line = line.replace(',,', r)
        line = line.replace(',,,', r)
        line = line.replace(',,,,', r)
        line = line.replace(',,,,,', r)
        line = line.replace(',,,,,,', r)
        line = line.replace(',,,,,,,', r)
        line = line.replace('#', r)
        line = line.replace('*', r)
        line = line.replace('**', r)
        line = line.replace('***', r)

        #Make the line lowercase
        line = line.lower()

        # Split the line at every r (comma) and name the result "words"
        words = line.split(r)

        # If the keyword (also lowercased) appears in the words list created above,
        # append the whole line in which it appears to the output list

        if keyword.lower() in words:
            output.append(line)

    return output

print(find_word("pg844.txt", "and"))

The goal of this piece of code is to search through a text file for a certain keyword, say "and", and then put each whole line in which the keyword is found into a list of (int, string) tuples. The int should be the line number and the string the whole line mentioned above.
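The (int, string) structure described above maps naturally onto Python's built-in enumerate(), which pairs each item with a counter. A minimal sketch, assuming io.StringIO as a stand-in for the opened file (the sample text is made up for illustration):

```python
import io

# Hypothetical sample text standing in for the contents of pg844.txt
sample = io.StringIO("The first line\nThe second line\n")

# enumerate(..., 1) yields (line_number, line) pairs starting at 1;
# rstrip('\n') drops the trailing newline from each line
numbered = [(n, line.rstrip('\n')) for n, line in enumerate(sample, 1)]
print(numbered)  # [(1, 'The first line'), (2, 'The second line')]
```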

I'm still working on the line numbering, so I have no question about that yet. The problem is that the output is empty. Even if I append a random string instead of the line, I don't get any results.

If I use

if keyword.lower() in words:
    print(line)

I get all the desired lines in which the keyword occurs. But I just can't get them into the output list.

The text file I'm trying to search through: http://www.gutenberg.org/cache/epub/844/pg844.txt

Consider using regular expressions; see the documentation for regular expressions (the re module) in Python. Replacing every character or character set one at a time is confusing. The use of lists and .append() looks correct, but you could debug your line variable within the for loop, printing it occasionally to ensure its value is what you expect.
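A regex-based version might look like the sketch below. It is an illustration, not a drop-in replacement: it assumes the function receives any iterable of lines (an open file works), and it uses \b word boundaries instead of the question's comma splitting, so apostrophes and hyphens also count as boundaries (for example, "and-so-on" would match "and").

```python
import re

def find_word(lines, keyword):
    """Return (line_number, line) tuples for lines containing keyword as a whole word."""
    # \b marks word boundaries, so "and" will not match inside "band" or "sandy";
    # re.escape guards against regex metacharacters in the keyword
    pattern = re.compile(r'\b' + re.escape(keyword) + r'\b', re.IGNORECASE)
    output = []  # local list, rebuilt on every call
    for number, line in enumerate(lines, 1):
        if pattern.search(line):
            output.append((number, line.rstrip('\n')))
    return output

sample = ["Earnest and true\n", "A brass band\n", "AND again\n"]
print(find_word(sample, "and"))  # [(1, 'Earnest and true'), (3, 'AND again')]
```

With the real file, pass the file object directly, e.g. `with open('pg844.txt') as f: print(find_word(f, 'and'))`.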

An answer by pyInProgress makes a good point about global variables, though, without having tested it, I'm not convinced the global declaration is required, particularly if output is created inside the function and returned rather than kept as a global. See this Stack Overflow post if you need more information about global variables.
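Whether global is needed depends on whether the function rebinds the name or only mutates the object it refers to. Calling a method such as .append() mutates the existing list, so it works without a global declaration; returning a fresh local list avoids shared state altogether (with a module-level list, repeated calls keep accumulating earlier results). A minimal sketch; the function names are hypothetical:

```python
output = []  # module-level (global) list

def append_to_global(item):
    # No `global` statement: .append() mutates the existing list object
    # and never rebinds the name `output`, so this works as-is
    output.append(item)

def collect(items):
    # Preferred: build and return a local list, so repeated calls
    # do not accumulate results from earlier calls
    result = []
    for item in items:
        result.append(item)
    return result

append_to_global("a")
append_to_global("b")
print(output)               # ['a', 'b']
print(collect(["x", "y"]))  # ['x', 'y']
```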

Loop through string.punctuation to strip punctuation from the whole file before iterating through the lines:

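import re
import string

r = ','

def find_word(filepath, keyword):
    output = []  # local list; no global declaration needed
    # Text mode ('r') so str.replace works with str arguments on Python 3
    with open(filepath, 'r') as f:
        data = f.read()

    # Strip every punctuation character except the comma separator
    for x in string.punctuation:
        if x != r:
            data = data.replace(x, '')

    for i, line in enumerate(data.splitlines(), 1):
        # Split on runs of commas and/or whitespace and drop empty strings;
        # splitting on ',' alone would leave "cats and dogs" as one item
        words = [w for w in re.split(r'[,\s]+', line.lower()) if w]
        if keyword.lower() in words:
            output.append((i, line))  # (1-based line number, line text)
    return output

print(find_word('pg844.txt', 'and'))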

Since output = list() is at the top level of your code and not inside a function, it is a global variable. To assign a new value to a global variable within a function, you must first declare it with the global keyword.

Example:

gVar = 10

def editVar():
    global gVar
    gVar += 5
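The global statement matters when a function assigns to the name: any assignment, including an augmented one like +=, makes the name local to the whole function. A minimal sketch contrasting the two cases (the function names are hypothetical, and the exact error message text varies between Python versions):

```python
counter = 10

def bump_without_global():
    # `counter += 5` is an assignment, so Python treats `counter` as a
    # local name for the whole function; reading it first then fails
    counter += 5

def bump_with_global():
    global counter  # rebind the module-level name instead
    counter += 5

try:
    bump_without_global()
except UnboundLocalError as exc:
    print("UnboundLocalError:", exc)

bump_with_global()
print(counter)  # 15
```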

So, to edit the variable output within your function find_word(), you must write global output before assigning values to it.

It should look like this:

import string

r = ","
x = ""
output = list()

def find_word(filepath,keyword):
    doc = open(filepath, 'r')

    for line in doc:
        # Remove all the unnecessary characters (each is replaced by the separator r)
        line = line.replace("'", r)
        line = line.replace('`', r)
        line = line.replace('[', r)
        line = line.replace(']', r)
        line = line.replace('{', r)
        line = line.replace('}', r)
        line = line.replace('(', r)
        line = line.replace(')', r)
        line = line.replace(':', r)
        line = line.replace('.', r)
        line = line.replace('!', r)
        line = line.replace('?', r)
        line = line.replace('"', r)
        line = line.replace(';', r)
        line = line.replace(' ', r)
        line = line.replace(',,', r)
        line = line.replace(',,,', r)
        line = line.replace(',,,,', r)
        line = line.replace(',,,,,', r)
        line = line.replace(',,,,,,', r)
        line = line.replace(',,,,,,,', r)
        line = line.replace('#', r)
        line = line.replace('*', r)
        line = line.replace('**', r)
        line = line.replace('***', r)

        #Make the line lowercase
        line = line.lower()

        # Split the line at every r (comma) and name the result "words"
        words = line.split(r)

        # If the keyword (also lowercased) appears in the words list created above,
        # append the whole line in which it appears to the output list

        global output
        if keyword.lower() in words:
            output.append(line)

    return output

In the future, try to stay away from global variables unless you absolutely need them. They can get messy!
