Why am I getting an empty dictionary?
I am learning Python from an introductory Python textbook and I am stuck on the following problem:
You will implement function index() that takes as input the name of a text file and a list of words. For every word in the list, your function will find the lines in the text file where the word occurs and print the corresponding line numbers.
Ex:
>>> index('raven.txt', ['raven', 'mortal', 'dying', 'ghost', 'ghastly', 'evil', 'demon'])
ghost 9
dying 9
demon 122
evil 99, 106
ghastly 82
mortal 30
raven 44, 53, 55, 64, 78, 97, 104, 111, 118, 120
Here is my attempt at the problem:
def index(filename, lst):
    infile = open(filename, 'r')
    lines = infile.readlines()
    lst = []
    dic = {}
    for line in lines:
        words = line.split()
        lst.append(words)
    for i in range(len(lst)):
        for j in range(len(lst[i])):
            if lst[i][j] in lst:
                dic[lst[i][j]] = i
    return dic
When I run the function, I get back an empty dictionary. I do not understand why I am getting an empty dictionary. So what is wrong with my function? Thanks.
Try this:
def index(filename, lst):
    dic = {w: [] for w in lst}
    for n, line in enumerate(open(filename, 'r')):
        for word in lst:
            if word in line.split(' '):
                dic[word].append(n + 1)
    return dic
There are some features of the language introduced here that you should be aware of because they will make life a lot easier in the long run.
The first is a dictionary comprehension. It initializes a dictionary using the words in lst as keys and an empty list [] as the value for each key.
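As a small illustration of the dictionary comprehension, the sketch below builds the same kind of starting dictionary from a made-up word list. The name wanted and its contents are assumptions for the example only.

```python
# Hypothetical word list, used only for illustration.
wanted = ['raven', 'ghost']

# One key per word, each mapped to its own fresh empty list.
dic = {w: [] for w in wanted}

# The lists are independent objects: appending to one leaves the other alone.
dic['raven'].append(44)
print(dic)  # {'raven': [44], 'ghost': []}
```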
Next, the enumerate function. It lets us iterate over the items in a sequence while also giving us the index of each item. In this case, because we passed a file object to enumerate, it will loop over the lines. For each iteration, n will be the 0-based index of the line and line will be the line itself. Next we iterate over the words in lst.
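The answer adds 1 to n to get a human-style line number. A possible alternative, sketched here, is the start argument of enumerate. The io.StringIO object only stands in for an open file so that the example runs without anything on disk; it is not part of the original answer.

```python
import io

# io.StringIO stands in for an open text file here (an assumption made so the
# example runs without a real file on disk).
fake_file = io.StringIO("first line\nsecond line\n")

numbered = []
# start=1 makes enumerate count from 1, which matches how people number lines.
for n, line in enumerate(fake_file, start=1):
    numbered.append((n, line.rstrip('\n')))

print(numbered)  # [(1, 'first line'), (2, 'second line')]
```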
Notice that we don't need any indices here. Python encourages looping over the objects in a sequence rather than looping over indices and then accessing the objects by index (for example, it discourages doing for i in range(len(lst)): do something with lst[i]).
Finally, the in operator is a very straightforward way to test membership for many types of objects, and the syntax is very intuitive. In this case, we are asking whether the current word from lst is in the current line.
Note that we use line.split(' ') to get a list of the words in the line. If we didn't do this, 'the' in 'there was a ghost' would return True, because the is a substring of one of the words. On the other hand, 'the' in ['there', 'was', 'a', 'ghost'] would return False. If the conditional returns True, we append the line number to the list associated with that word's key in our dictionary.
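One caveat that may be worth checking, shown as a sketch: lines read from a file normally end with a newline, and split(' ') leaves that newline attached to the last word. Calling split() with no argument splits on any run of whitespace, so it may be the safer choice here. The sample line is made up for the example.

```python
line = 'there was a ghost\n'

# Substring test on the raw string: matches inside "there".
print('the' in line)               # True

# Membership test on a list of whole words.
print('the' in line.split())       # False

# split(' ') keeps the newline attached to the last word...
print(line.split(' '))             # ['there', 'was', 'a', 'ghost\n']
# ...so a word at the end of a line is missed,
print('ghost' in line.split(' '))  # False
# while split() with no argument discards the trailing newline.
print('ghost' in line.split())     # True
```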
That might be a lot to chew on, but these concepts make problems like this more straightforward.
You are overwriting the value of lst. You use it both as a parameter to the function (in which case it is a list of strings) and as the list of words in the file (in which case it is a list of lists of strings). When you do:
if lst[i][j] in lst
The comparison always returns False because lst[i][j] is a str, but lst contains only lists of strings, not strings themselves. This means that the assignment to dic is never executed and you get an empty dict as a result.
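The failing comparison can be reproduced in isolation. In this sketch lst is filled by hand with the shape it has after the first loop of the question's code; the contents are made up.

```python
# What lst looks like after the first loop: one inner list per line.
lst = [['a', 'b', 'c'], ['d', 'e', 'f']]

print('a' in lst)              # False: no element of lst equals the string 'a'
print(['a', 'b', 'c'] in lst)  # True: elements are compared as whole lists
print('a' in lst[0])           # True: the inner list does contain 'a'
```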
To avoid this you should use a different name for the list in which you store the words, for example:
In [4]: !echo 'a b c\nd e f' > test.txt
In [5]: def index(filename, lst):
   ...:     infile = open(filename, 'r')
   ...:     lines = infile.readlines()
   ...:     words = []
   ...:     dic = {}
   ...:     for line in lines:
   ...:         line_words = line.split()
   ...:         words.append(line_words)
   ...:     for i in range(len(words)):
   ...:         for j in range(len(words[i])):
   ...:             if words[i][j] in lst:
   ...:                 dic[words[i][j]] = i
   ...:     return dic
   ...:
In [6]: index('test.txt', ['a', 'b', 'c'])
Out[6]: {'a': 0, 'c': 0, 'b': 0}
There are also a lot of things you can change.
When you want to iterate over a list you don't have to explicitly use indexes. If you need the index you can use enumerate:
for i, line_words in enumerate(words):
    for word in line_words:
        if word in lst:
            dic[word] = i
You can also iterate directly over a file (refer to the Reading and Writing Files section of the Python tutorial for a bit more information):
# use the with statement to make sure that the file gets closed
with open('test.txt') as infile:
    for i, line in enumerate(infile):
        print('Line {}: {}'.format(i, line))
In fact I don't see why you would first build that words list of lists. Just iterate over the file directly while building the dictionary:
def index(filename, lst):
    with open(filename, 'r') as infile:
        dic = {}
        for i, line in enumerate(infile):
            for word in line.split():
                if word in lst:
                    dic[word] = i
    return dic
Your dic values should be lists, since more than one line can contain the same word. As it stands your dic would only store the last line where a word is found:
from collections import defaultdict

def index(filename, words):
    # a frozenset makes the `in` checks below faster
    words = frozenset(words)
    with open(filename) as infile:
        dic = defaultdict(list)
        for i, line in enumerate(infile):
            for word in line.split():
                if word in words:
                    dic[word].append(i)
    return dic
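The exercise asks for the line numbers to be printed, while the functions in this answer return a dictionary. A minimal sketch of the missing formatting step follows; format_index is a hypothetical helper, not part of the original answer, and the sample dictionary reuses numbers from the exercise's expected output. Note also that enumerate counts from 0 by default, so the stored numbers are zero-based unless the loop uses i + 1 or enumerate(infile, start=1).

```python
def format_index(dic):
    """Return lines like 'evil 99, 106' from a word -> line-numbers mapping."""
    return ['{} {}'.format(word, ', '.join(str(n) for n in numbers))
            for word, numbers in dic.items()]

# Hypothetical result, using numbers from the exercise's sample output.
sample = {'evil': [99, 106], 'ghost': [9]}
for row in format_index(sample):
    print(row)
```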
If you don't want to use collections.defaultdict you can replace dic = defaultdict(list) with dic = {} and then replace:
dic[word].append(i)
with:
if word not in dic:
    dic[word] = [i]
else:
    dic[word].append(i)
Or, alternatively, you can use dict.setdefault:
dic.setdefault(word, []).append(i)
although this last way is a bit slower than the original code.
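To check that the three variants build the same mapping, here is a small side-by-side sketch. The pairs list is made-up data standing in for the (word, line number) hits the loops would produce.

```python
from collections import defaultdict

# Hypothetical (word, line number) pairs standing in for what the loops find.
pairs = [('raven', 44), ('ghost', 9), ('raven', 53)]

with_defaultdict = defaultdict(list)
with_check = {}
with_setdefault = {}

for word, i in pairs:
    # Variant 1: defaultdict creates the empty list on first access.
    with_defaultdict[word].append(i)

    # Variant 2: plain dict with an explicit membership check.
    if word not in with_check:
        with_check[word] = [i]
    else:
        with_check[word].append(i)

    # Variant 3: setdefault inserts [] only if the key is missing.
    with_setdefault.setdefault(word, []).append(i)

print(dict(with_defaultdict) == with_check == with_setdefault)  # True
```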
Note that all these solutions have the property that if a word isn't found in the file it will not appear in the result at all. However, you may want it in the result, with an empty list as its value. In that case it's simpler to initialize the dict with empty lists before starting the loop, as in:
dic = {word: [] for word in words}
for i, line in enumerate(infile):
    for word in line.split():
        if word in words:
            dic[word].append(i)
Refer to the documentation about List Comprehensions and Dictionaries to understand the first line.
You can also iterate over words instead of the line, like this:
dic = {word: [] for word in words}
for i, line in enumerate(infile):
    for word in words:
        if word in line.split():
            dic[word].append(i)
Note however that this is going to be slower because:
line.split() returns a list, so word in line.split() has to scan the whole list.
You are repeating the computation of line.split() for every word.
You can try to solve these two problems by doing:
dic = {word: [] for word in words}
for i, line in enumerate(infile):
    line_words = frozenset(line.split())
    for word in words:
        if word in line_words:
            dic[word].append(i)
Note that here we are iterating once over line.split() to build the set and also over words. Depending on the sizes of the two sets this may be slower or faster than the original version (iterating over line.split()).
However, at this point it's probably faster to intersect the sets:
dic = {word: [] for word in words}
for i, line in enumerate(infile):
    line_words = frozenset(line.split())
    for word in words & line_words:  # & stands for set intersection
        dic[word].append(i)
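Putting the fragments together, a complete version of the set-intersection approach might look like the sketch below. Two choices here are assumptions rather than part of the answer: line numbers are 1-based (via start=1) to match the exercise's sample output, and the throwaway file with its contents exists only so the example can run on its own.

```python
import os
import tempfile

def index(filename, words):
    """Return {word: [1-based line numbers]} for every word in `words`."""
    words = frozenset(words)            # needed for the & operator below
    dic = {word: [] for word in words}  # words never found keep an empty list
    with open(filename) as infile:
        for i, line in enumerate(infile, start=1):
            # Only the requested words that occur on this line.
            for word in words & frozenset(line.split()):
                dic[word].append(i)
    return dic

# Write a tiny throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('a b c\nd e f\na f\n')

result = index(tmp.name, ['a', 'f', 'z'])
os.remove(tmp.name)
print(result == {'a': [1, 3], 'f': [2, 3], 'z': []})  # True
```

Because each line is turned into a set, a word that appears twice on the same line is recorded only once for that line.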
First, the function parameter that holds the words is named lst, and the list where you put all the words from the file is also named lst, so you are not keeping the words passed to your function: on line 4 you rebind the name to a new, empty list.
Second, you are iterating over each line in the file (the first for) and getting the words in that line. After that, lst has all the words in the entire file. So in the for i ... loop you are iterating over all the words read from the file; there's no need for the third loop, for j, where you are iterating over each character in every word.
In summary, in that if you are saying "if this single character is in the list of words ...", which it is not, so the dict will never be filled.
for i in range(len(lst)):
    if words[i] in lst:
        dic[words[i]] = dic[words[i]] + i # To count repetitions
You need to rethink the problem. Even my answer will fail, because the word will not exist in the dict yet and looking it up raises an error, but you get the point. Good luck!
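The missing-key error mentioned above can be avoided with dict.get, which returns a default when the key is absent. This is only a sketch of counting occurrences, with made-up data; file_words and wanted are hypothetical names, and it does not record line numbers as the exercise requires.

```python
# Hypothetical flat list of words read from a file, and the words to look for.
file_words = ['raven', 'door', 'raven', 'ghost']
wanted = ['raven', 'ghost', 'demon']

counts = {}
for word in file_words:
    if word in wanted:
        # get() returns 0 for a word not seen yet instead of raising KeyError.
        counts[word] = counts.get(word, 0) + 1

print(counts)  # {'raven': 2, 'ghost': 1}
```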