How can I create a dictionary that contains the words of a text as keys and the "sublist in which they appear" as values?
My question is quite similar to others, but here my list is somewhat special.
I have to create a search engine in Python. For that, I have to create a dictionary as described in the title.
Let me give you the context:
Basically, I have a text made of several parts separated by "[==========]".
Like this:
[blablabla][blabliblou]
[==========]
[blablablou][blibloubla]
[=========]
[oubabababa][baboulila]
I wrote an algorithm that concatenates these lists until it "hits" a "==========" and puts each group into a single list, so that [blablabla blabliblou] is list[0], [blablablou][blibloubla] is list[1], etc.
The algorithm:
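import re

# Read the whole file, then split it into parts wherever one or more "=" appear
with open("mytext.txt", "r", encoding="utf-8") as file:
    d = file.read()
x = re.split(r"=+", d)
liste = []  # one string per part; naming it `list` would shadow the built-in
dico = {}
for i in range(len(x)):
    liste.append(x[i])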
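A minimal sketch of that first step, assuming that a "word" is any run of characters other than whitespace and square brackets (the function name `split_sections` is hypothetical). Instead of keeping each part as a raw string, it turns each part into a list of words, which is closer to the list-of-lists output shown below:

```python
import re

def split_sections(text):
    """Split text on runs of '=' and return one list of words per section."""
    sections = []
    for part in re.split(r"=+", text):
        # Assumption: words are runs of characters other than whitespace and brackets
        words = re.findall(r"[^\[\]\s]+", part)
        if words:  # skip parts that contain no words at all
            sections.append(words)
    return sections

sample = """[blablabla][blabliblou]
[==========]
[blablablou][blibloubla]
[=========]
[oubabababa][baboulila]"""

print(split_sections(sample))
# [['blablabla', 'blabliblou'], ['blablablou', 'blibloubla'], ['oubabababa', 'baboulila']]
```

Because the pattern is `=+`, separators of different lengths (10 or 9 "=" in the sample) are handled the same way.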
I get an output like:
[ [blablabla blabliblou] [blablablou blibloubla] [oubabababa baboulila] ]
But now the second step is to create a dictionary that has all the words of the text as keys and the sublist(s) that contain them as values.
I tried to use a conditional loop as follows:
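One way to express this second step, assuming the sections are already lists of words: walk them with `enumerate`, starting at 1 to match the numbering in the expected output further down, and append each section number to the word's entry. `build_index` is a hypothetical name; `dict.setdefault` creates the empty list the first time a word is seen. The values are lists rather than single numbers so that a word appearing in several sections keeps every section number.

```python
def build_index(sections, start=1):
    """Map each word to the numbers of the sections that contain it."""
    index = {}
    for number, words in enumerate(sections, start=start):
        for word in words:
            positions = index.setdefault(word, [])
            # Record each section at most once, even if the word repeats inside it
            if number not in positions:
                positions.append(number)
    return index

sections = [["blablabla", "blabliblou"], ["blablablou", "blibloubla"], ["oubabababa", "baboulila"]]
print(build_index(sections))
# {'blablabla': [1], 'blabliblou': [1], 'blablablou': [2], 'blibloubla': [2], 'oubabababa': [3], 'baboulila': [3]}
```

For example, `build_index([["a", "b"], ["b"]])` gives `{'a': [1], 'b': [1, 2]}`.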
import re

file = open("mytext.txt","r",encoding="utf-8")
list = []
numd = 0
dico = {}
d = file.read()
for x in file:
    x = re.split(r"=+", d)
    for i in range(len(x)):
        list.append(x[i])
        numd =+ 1
for word in list:
    if word in dico:
        if numd not in dico[word]:
            dico[word].append(numd)
    else:
        dico[word] = [numd]
The expected output is:
{blablabla:1, blabliblou:1, blablablou:2, blibloubla:2, oubabababa:3, baboulila:3}
However, my list is still empty.
Thank you in advance for your reply! I would be very grateful.
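Two details in the loop above explain the empty list. `file.read()` consumes the whole file, so the following `for x in file:` loop has nothing left to iterate over and its body never runs. In addition, `numd =+ 1` parses as `numd = +1`, which assigns 1 instead of incrementing. This sketch reproduces both effects with an in-memory file (`io.StringIO` stands in for `mytext.txt`):

```python
import io

# In-memory stand-in for mytext.txt
file = io.StringIO("[blablabla][blabliblou]\n[==========]\n[blablablou][blibloubla]\n")
d = file.read()          # reads everything; the file position is now at the end

lines_seen = 0
for x in file:           # nothing left to read, so this body never executes
    lines_seen += 1
print(lines_seen)        # 0

numd = 5
numd =+ 1                # same as numd = +1, not numd += 1
print(numd)              # 1
```

Reading the file once and then looping directly over the result of `re.split(r"=+", d)` avoids the first problem, and `numd += 1` fixes the second.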
How about this?
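from collections import defaultdict

# x must be a list of word lists, e.g. the parsed sections of the text
x = [["blablabla", "blabliblou"], ["blablablou", "blibloubla"], ["oubabababa", "baboulila"]]
all_dict = defaultdict(list)
for index, val in enumerate(x):
    for value in val:
        # record each sublist index at most once per word
        if index not in all_dict[value]:
            all_dict[value].append(index)
print(all_dict)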
It will give you the expected output:
defaultdict(list,
{'blablabla': [0],
'blabliblou': [0],
'blablablou': [1],
'blibloubla': [1],
'oubabababa': [2],
'baboulila': [2]})
from collections import defaultdict

l = [["blablabla", "blabliblou"], ["blablablou", "blibloubla"], ["oubabababa", "baboulila"]]
d = defaultdict(list)
for i, line in enumerate(l):
    # add the sublist index to every word of that sublist
    for word in line:
        d[word].append(i)
print(dict(d))
# {'blablabla': [0], 'blabliblou': [0], 'blablablou': [1], 'blibloubla': [1], 'oubabababa': [2], 'baboulila': [2]}
This is the code I have so far:
import re
from collections import defaultdict

with open("mytext.txt", "r", encoding="utf-8") as file:
    d = file.read()
x = re.split(r"=+", d)
l = []
for i in range(len(x)):
    l.append(x[i])

d = defaultdict(list)
for i, line in enumerate(l):
    for word in line:
        d[word].append(i)
It seems to work, but the keys are letters and the values are the sublists in which each letter occurs.
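The keys come out as letters because each element of `l` is a string, and iterating over a string yields its characters. Splitting each section into words first, here with `str.split()` (which splits on any whitespace), gives word keys instead. This sketch also assumes the square brackets are only delimiters, not part of the words, and turns them into spaces with `str.translate` before splitting; that preprocessing is an assumption about the file's format:

```python
import re
from collections import defaultdict

text = "[blablabla][blabliblou]\n[==========]\n[blablablou][blibloubla]\n[=========]\n[oubabababa][baboulila]"

# Assumption: square brackets only delimit words, so replace them with spaces
no_brackets = text.translate(str.maketrans("[]", "  "))

index = defaultdict(list)
for i, section in enumerate(re.split(r"=+", no_brackets)):
    # section is a string; split() yields its words rather than its characters
    for word in section.split():
        if i not in index[word]:
            index[word].append(i)

print(dict(index))
# {'blablabla': [0], 'blabliblou': [0], 'blablablou': [1], 'blibloubla': [1], 'oubabababa': [2], 'baboulila': [2]}
```

With the real file, `text` would be the result of `file.read()`.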