Text Mining: Query search
I have a dictionary:
{'Farage': [0, 5, 9, 192, 233, 341],
 'EU': [0, 1, 5, 6, 9, 23]}
Query1: “Farage” and “EU”
Query2: “Farage” or “EU”
I need to return the documents that match these queries. For query1, for example, the answer should be [0, 5, 9].
I think the solution should look something like the following pseudocode, but written in Python:
final_list = []
while x ≠ Null and y ≠ Null
    do if docID(x) = docID(y)
        then ADD(final_list, docID(x))
             x ← next(x)
             y ← next(y)
        else if docID(x) < docID(y)
            then x ← next(x)
            else y ← next(y)
return final_list
Please help.
You could create your own function using sets, a built-in Python type that suits this case well because union and intersection of collections of elements are fast set operations:
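def getResults(s, argument):
    # Collect the posting lists of every term in the dictionary
    s = list(s.values())
    if argument == 'OR':
        # Union: documents that contain at least one of the terms
        result = s[0]
        for elem in s[1:]:
            result = sorted(set(result).union(set(elem)))
        return result
    elif argument == 'AND':
        # Intersection: documents that contain all of the terms
        result = s[0]
        for elem in s[1:]:
            result = sorted(set(result).intersection(set(elem)))
        return result
    else:
        # Unsupported operator
        return None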
inDict = {'Farage': [0, 5, 9, 192,233,341], 'EU': [0, 1, 5, 6, 9, 23]}
query1 = getResults(inDict, 'AND')
query2 = getResults(inDict, 'OR')
print(query1)
print(query2)
Results:
[0, 5, 9]
[0, 1, 5, 6, 9, 23, 192, 233, 341]
Note: You can remove the sorted call if you do not need the results in sorted order.
You can create a dict of operators and apply the corresponding set operations to get the final results. This assumes that queries strictly follow the pattern key1 operator key2 operator key3, with the operators applied from left to right.
For an arbitrary number of arguments:
import operator

d1 = {'Farage': [0, 5, 9, 192, 233, 341],
      'EU': [0, 1, 5, 6, 9, 23],
      'hopeless': [0, 341, 19999]}

# Map each query keyword to the matching set operator
d = {'and': operator.and_,
     'or': operator.or_}

Queries = ['Farage and EU', 'Farage and EU or hopeless', 'Farage or EU']
for query in Queries:
    temp_arr = query.split()
    # Start from the posting set of the first term (unknown terms give an empty set)
    res = set(d1.get(temp_arr[0], []))
    # Apply each "operator term" pair strictly from left to right
    for value in range(1, len(temp_arr), 2):
        op = temp_arr[value]
        k2 = temp_arr[value + 1]
        res = d[op](res, set(d1.get(k2, [])))
    print(sorted(res))
Output:
[0, 5, 9]
[0, 5, 9, 341, 19999]
[0, 1, 5, 6, 9, 23, 192, 233, 341]
Bear in mind that you can convert the lists into sets and use the set operators directly:
>>> d = {'Farage': [0, 5, 9, 192, 233, 341] , 'EU': [0, 1, 5, 6, 9, 23]}
>>> d
{'EU': [0, 1, 5, 6, 9, 23], 'Farage': [0, 5, 9, 192, 233, 341]}
>>>
>>> set(d['EU']) | set(d['Farage'])
{0, 1, 192, 5, 6, 9, 233, 341, 23}
>>>
>>> set(d['EU']) & set(d['Farage'])
{0, 9, 5}
>>>
>>> set(d['EU']) ^ set(d['Farage'])
{192, 1, 23, 233, 341, 6}
>>>
>>> set(d['EU']) - set(d['Farage'])
{1, 6, 23}
Or, if possible, change the input format so that the dictionary values are already sets:
>>> d = {'Farage': {0, 5, 9, 192, 233, 341}, 'EU': {0, 1, 5, 6, 9, 23}}
>>> d['EU'] & d['Farage']
{0, 9, 5}