Why am I getting an empty dictionary?

I am learning Python from an introductory textbook and I am stuck on the following problem:

You will implement function index() that takes as input the name of a text file and a list of words. For every word in the list, your function will find the lines in the text file where the word occurs and print the corresponding line numbers.

Example:

 >>> index('raven.txt', ['raven', 'mortal', 'dying', 'ghost', 'ghastly', 'evil', 'demon'])

 ghost     9 
 dying     9 
 demon     122
 evil      99, 106
 ghastly   82
 mortal    30 
 raven     44, 53, 55, 64, 78, 97, 104, 111, 118, 120

Here is my attempt at the problem:

def index(filename, lst):
    infile = open(filename, 'r')
    lines =  infile.readlines()
    lst = []
    dic = {}
    for line in lines:
        words = line.split()
        lst. append(words)
    for i in range(len(lst)):
        for j in range(len(lst[i])):
            if lst[i][j] in lst:
                dic[lst[i][j]] = i 
    return dic

When I run the function, I get back an empty dictionary, and I do not understand why. What is wrong with my function? Thanks.

Try this:

def index(filename, lst):
    dic = {w:[] for w in lst}
    for n,line in enumerate( open(filename,'r') ):
        for word in lst:
            if word in line.split(' '):
                dic[word].append(n+1)
    return dic

There are some features of the language introduced here that you should be aware of, because they will make life a lot easier in the long run.

The first is a dictionary comprehension. It initializes a dictionary using the words in lst as keys and an empty list [] as the value for each key.
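As a small illustration of the dictionary comprehension, here is a sketch with a made-up word list (the words and the line number 44 are only placeholders). Each key gets its own separate empty list, so appending to one does not affect the others:

```python
# Hypothetical word list, only for illustration.
lst = ['raven', 'ghost', 'evil']

# One key per word, each mapped to its own fresh empty list.
dic = {w: [] for w in lst}

dic['raven'].append(44)   # only the list stored under 'raven' changes
print(dic)                # {'raven': [44], 'ghost': [], 'evil': []}
```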

Next is the enumerate function. It lets us iterate over the items in a sequence while also giving us the index of each item. In this case, because we passed a file object to enumerate, it loops over the lines of the file. On each iteration, n is the 0-based index of the line and line is the line itself. Next we iterate over the words in lst.
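To see what enumerate yields for a file, here is a minimal sketch. It uses io.StringIO as a stand-in for an open text file (an assumption made so the example runs without a real file), and it passes start=1, which is an optional alternative to adding 1 to n by hand:

```python
import io

# io.StringIO stands in for an open text file; the two lines are made up.
fake_file = io.StringIO("once upon a midnight\nquoth the raven\n")

# start=1 makes enumerate count from 1 instead of 0.
numbered = [(n, line.rstrip('\n')) for n, line in enumerate(fake_file, start=1)]
print(numbered)   # [(1, 'once upon a midnight'), (2, 'quoth the raven')]
```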

Notice that we don't need any indices here. Python encourages looping over the objects in a sequence directly rather than looping over indices and then accessing the objects by index (for example, it discourages writing for i in range(len(lst)): and then doing something with lst[i]).

Finally, the in operator is a straightforward way to test membership for many types of objects, and the syntax is intuitive. In this case, we are asking whether the current word from lst is in the current line.

Note that we use line.split(' ') to get a list of the words in the line. If we don't do this, 'the' in 'there was a ghost' would return True, because 'the' is a substring of one of the words. Be aware that split(' ') leaves the trailing newline attached to the last word of each line; calling line.split() with no argument splits on any whitespace and avoids this.

On the other hand, 'the' in ['there', 'was', 'a', 'ghost'] would return False. If the condition is True, we append the line number to the list associated with that word in our dictionary.
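The difference between testing against the string and testing against the list of words can be checked directly. This is a small sketch on a made-up line; it also shows how split(' ') and split() treat the trailing newline:

```python
line = 'there was a ghost\n'   # made-up line, including the trailing newline

substring_hit = 'the' in line              # True: 'the' occurs inside 'there'
word_hit = 'the' in line.split(' ')        # False: no list element equals 'the'

# split(' ') keeps the newline glued to the last word ...
last_with_space_split = line.split(' ')[-1]    # 'ghost\n'
# ... while split() with no argument splits on any whitespace and drops it.
last_with_default_split = line.split()[-1]     # 'ghost'

print(substring_hit, word_hit)
print(repr(last_with_space_split), repr(last_with_default_split))
```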

That might be a lot to chew on, but these concepts make problems like this more straightforward.

You are overwriting the value of lst. You use it both as a parameter to the function (in which case it is a list of strings) and as the list of words in the file (in which case it is a list of lists of strings). When you do:

if lst[i][j] in lst

The comparison always returns False because lst[i][j] is a str, but lst contains only lists of strings, not strings themselves. This means that the assignment to dic is never executed and you get an empty dict as the result.
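A minimal reproduction of that comparison, with made-up contents for lst as it looks after the first loop in the question:

```python
# After the first loop in the question, lst is a list of lists (made-up data):
lst = [['once', 'upon'], ['the', 'raven']]

word = lst[1][1]                 # 'raven', a str
found = word in lst              # compares 'raven' with each inner list
found_in_line = word in lst[1]   # compares 'raven' with each word of one line

print(found, found_in_line)      # False True
```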

To avoid this you should use a different name for the list in which you store the words, for example:

In [4]: !echo 'a b c\nd e f' > test.txt

In [5]: def index(filename, lst):
   ...:     infile = open(filename, 'r')
   ...:     lines =  infile.readlines()
   ...:     words = []
   ...:     dic = {}
   ...:     for line in lines:
   ...:         line_words = line.split()
   ...:         words.append(line_words)
   ...:     for i in range(len(words)):
   ...:         for j in range(len(words[i])):
   ...:             if words[i][j] in lst:
   ...:                 dic[words[i][j]] = i 
   ...:     return dic
   ...: 

In [6]: index('test.txt', ['a', 'b', 'c'])
Out[6]: {'a': 0, 'c': 0, 'b': 0}

There are also a lot of other things you can improve.

When you want to iterate over a list you don't have to use indexes explicitly. If you need the index you can use enumerate:

    for i, line_words in enumerate(words):
        for word in line_words:
            if word in lst:
                dic[word] = i

You can also iterate directly over a file (refer to the Reading and Writing Files section of the Python tutorial for a bit more information):

# use the with statement to make sure that the file gets closed
with open('test.txt') as infile:
    for i, line in enumerate(infile):
        print('Line {}: {}'.format(i, line))

In fact, I don't see why you would first build that words list of lists. Just iterate over the file directly while building the dictionary:

def index(filename, lst):
    with open(filename, 'r') as infile:
        dic = {}
        for i, line in enumerate(infile):
            for word in line.split():
                if word in lst:
                    dic[word] = i 
    return dic

Your dic values should be lists, since more than one line can contain the same word. As it stands, your dic only stores the last line on which a word is found. Using lists instead:

from collections import defaultdict

def index(filename, words):
    # a frozenset makes the later `in` checks faster
    words = frozenset(words)
    with open(filename) as infile:
        dic = defaultdict(list)
        for i, line in enumerate(infile):
            for word in line.split():
                if word in words:
                    dic[word].append(i)
    return dic

If you don't want to use collections.defaultdict, you can replace dic = defaultdict(list) with dic = {} and then change:

dic[word].append(i)

to:

if word not in dic:
    dic[word] = [i]
else:
    dic[word].append(i)

Alternatively, you can use dict.setdefault:

dic.setdefault(word, []).append(i)

although this last way is a bit slower than the original code, since a new empty list is built on every call even when the key already exists.
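The three ways of accumulating line numbers can be compared side by side. In this sketch the (line index, word) pairs are made up and stand in for the loop over the file; all three containers end up with the same contents:

```python
from collections import defaultdict

# Made-up (line index, word) pairs standing in for the loop over the file.
hits = [(0, 'raven'), (3, 'ghost'), (5, 'raven')]

with_defaultdict = defaultdict(list)
with_check = {}
with_setdefault = {}

for i, word in hits:
    # defaultdict creates the empty list on first access
    with_defaultdict[word].append(i)

    # plain dict with an explicit membership check
    if word not in with_check:
        with_check[word] = [i]
    else:
        with_check[word].append(i)

    # plain dict with setdefault
    with_setdefault.setdefault(word, []).append(i)

print(dict(with_defaultdict))   # {'raven': [0, 5], 'ghost': [3]}
```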

Note that all these solutions have the property that if a word isn't found in the file it will not appear in the result at all. However, you may want it in the result, with an empty list as its value. In that case it's simpler to initialize the dict with empty lists before starting the loop, as in:

dic = {word : [] for word in words}
for i, line in enumerate(infile):
    for word in line.split():
        if word in words:
            dic[word].append(i)

Refer to the documentation on List Comprehensions and Dictionaries to understand the first line.

You can also iterate over words instead of over the line, like this:

dic = {word : [] for word in words}
for i, line in enumerate(infile):
    for word in words:
        if word in line.split():
            dic[word].append(i)

Note, however, that this is going to be slower because:

  • line.split() returns a list, so word in line.split() has to scan the whole list.
  • You are repeating the computation of line.split() for every word.

You can try to solve these two problems by doing:

dic = {word : [] for word in words}
for i, line in enumerate(infile):
    line_words = frozenset(line.split())
    for word in words:
        if word in line_words:
            dic[word].append(i)

Note that here we are iterating once over line.split() to build the set and also over words. Depending on the sizes of the two sets, this may be slower or faster than the original version (iterating over line.split()).

However, at this point it's probably faster to intersect the sets:

dic = {word : [] for word in words}
for i, line in enumerate(infile):
    line_words = frozenset(line.split())
    for word in words & line_words:  # & stands for set intersection
        dic[word].append(i)
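Putting the pieces together, here is one possible complete index() as a sketch, not the textbook's reference solution. It assumes 1-based line numbers (as in the expected output of the question), exact matching (case-sensitive, with punctuation left attached to words), and sorted output order; the temporary file and its contents are made up for the demonstration:

```python
import os
import tempfile


def index(filename, words):
    """Print each word followed by the 1-based numbers of the lines containing it."""
    words = frozenset(words)              # must be a set type for & below
    dic = {word: [] for word in words}    # words never found keep an empty list
    with open(filename) as infile:
        for n, line in enumerate(infile, start=1):
            # only the words that occur in both sets
            for word in words & frozenset(line.split()):
                dic[word].append(n)
    for word in sorted(dic):
        print('{:<10}{}'.format(word, ', '.join(map(str, dic[word]))))
    return dic


# Tiny made-up file for demonstration.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('the raven spoke\nno ghost here\nraven again\n')
result = index(tmp.name, ['raven', 'ghost', 'demon'])
os.remove(tmp.name)
```

If the exercise expects "Raven," to count as an occurrence of raven, the line would need to be lowercased and stripped of punctuation before splitting, which this sketch does not do.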

First, your function parameter holding the words is named lst, and the list where you put all the words from the file is also named lst, so the words passed to your function are lost: on line 4 you rebind lst to a new empty list.

Second, you iterate over each line in the file (the first for) and collect the words in that line. After that loop, lst holds the words read from the file, grouped by line (one inner list per line). The for i ... loop therefore walks over the lines, and the inner for j loop walks over the words in each line.

In summary, that if asks whether a single word (a string) is one of the elements of lst, which are now all lists, so the test is never true and the dict is never filled.

for i in range(len(lst)):
  if words[i] in lst:
    dic[words[i]] = dic[words[i]] + i  # To count repetitions

You need to rethink the problem. Even my answer will fail, because the word will not yet exist in the dict and looking it up raises an error, but you get the point. Good luck!
