
Python eval and logical operators

I have a JSON "database": a Python list of JSON objects:

[{'_id': 'TRANSACTION0', 'Offer': {'From': 'merchant1', 'To': 'customer1', 'Item': 'Car', 'Price': 1000, 'Timestamp': 2}, 'Accept': {'Quantity': 1, 'Address': '123 Fake Street', 'Timestamp': 5}},
{'_id': 'TRANSACTION1', 'Offer': {'From': 'merchant1', 'To': 'customer2', 'Item': 'Computer', 'Price': 500, 'Timestamp': 5}},
{'_id': 'TRANSACTION2', 'Offer': {'From': 'merchant3', 'To': 'customer3', 'Item': 'Garbage bin', 'Price': 10, 'Timestamp': 0}, 'Accept': {'Quantity': 2, 'Address': '456 MadeUp Road', 'Timestamp': 1}},
{'_id': 'TRANSACTION3', 'Offer': {'From': 'merchant2', 'To': 'customer1', 'Item': 'Car', 'Price': 2000, 'Timestamp': 3}, 'Accept': {'Quantity': 2, 'Address': 'The White House', 'Timestamp': 3}},
{'_id': 'TRANSACTION4', 'Offer': {'From': 'merchant3', 'To': 'customer3', 'Item': 'Pens', 'Price': 2, 'Timestamp': 0}, 'Accept': {'Quantity': 4, 'Address': 'Houses of Parliment', 'Timestamp': 1}},
{'_id': 'TRANSACTION5', 'Offer': {'From': 'merchant4', 'To': 'customer1', 'Item': 'Headphones', 'Price': 200, 'Timestamp': 4}},
{'_id': 'TRANSACTION6', 'Offer': {'From': 'merchant1', 'To': 'customer2', 'Item': 'Water Bottle', 'Price': 1, 'Timestamp': 1}, 'Accept': {'Quantity': 3, 'Address': 'Timbuktu', 'Timestamp': 14}},
{'_id': 'TRANSACTION7', 'Offer': {'From': 'merchant2', 'To': 'customer3', 'Item': 'Laptop', 'Price': 900, 'Timestamp': 0}},
{'_id': 'TRANSACTION8', 'Offer': {'From': 'merchant4', 'To': 'customer1', 'Item': 'Chair', 'Price': 80, 'Timestamp': 3}, 'Accept': {'Quantity': 1, 'Address': 'Mordor', 'Timestamp': 3}},
{'_id': 'TRANSACTION9', 'Offer': {'From': 'merchant3', 'To': 'customer3', 'Item': 'Garbage bin', 'Price': 5, 'Timestamp': 2}, 'Accept': {'Quantity': 2, 'Address': 'The wall', 'Timestamp': 2}}]

My intention is to run queries, stored in dictionaries, against this database. In this example, the dictionary contains:

a_dict = {"query1": "'Offer' and 'Accept'"}

Note that the dictionary will contain more queries, and also more complicated ones (e.g. (cond1 and cond2) or (cond2 and cond3)), but I need to understand why Python is doing what it's doing (and how to overcome it), not just what the solution is.

I need something that evaluates and runs query1 correctly. My current, incorrect implementation is:

if (eval(a_dict["query1"]) + "in i"):

This is the same as:

if 'Offer' and 'Accept' in i:

Due to short-circuiting, this evaluates to only checking whether Accept is in i. In this example, every time there is an Accept there is an Offer, but this may not always be the case.
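To see what Python does with each of these expressions, it helps to evaluate the pieces separately. The sketch below uses two abbreviated records from the database above plus one hypothetical record (`orphan`, which has an Accept but no Offer). It suggests that the `eval` version builds a non-empty string that is always truthy, and that the hand-written version parses as `'Offer' and ('Accept' in i)` because `in` binds more tightly than `and`.

```python
# Abbreviated records from the database above.
accepted = {'_id': 'TRANSACTION0', 'Offer': {'From': 'merchant1'}, 'Accept': {'Quantity': 1}}
pending = {'_id': 'TRANSACTION1', 'Offer': {'From': 'merchant1'}}
# Hypothetical record (not in the database): accepted, but no offer.
orphan = {'_id': 'HYPOTHETICAL', 'Accept': {'Quantity': 1}}

# `and` returns its last operand when the first one is truthy,
# so the query string evaluates to a plain string, not a condition.
evaluated = eval("'Offer' and 'Accept'")
print(evaluated)    # Accept

# Concatenation yields another non-empty string, which is always truthy,
# so `if (eval(...) + "in i"):` passes for every record.
condition = evaluated + "in i"
print(condition)    # Acceptin i

for i in (accepted, pending, orphan):
    loose = 'Offer' and 'Accept' in i           # parsed as 'Offer' and ('Accept' in i)
    strict = 'Offer' in i and 'Accept' in i     # what was intended
    print(i['_id'], bool(condition), loose, strict)
```

For the `orphan` record the loose form is True while the strict form is False, which is the case the question worries about.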

A correct if statement would be:

if 'Offer' in i and 'Accept' in i:

However, this isn't easily composable from the type of potential queries I would have. Ideally, I'd like an elegant "plug and play" solution, similar to my eval if statement given above.

Is there any way to take a particular query from a dictionary, plug it into an if statement, and then run that if statement as I intend (under the assumption that all the queries make logical sense)?

https://www.python.org/dev/peps/pep-0308/ This article says FAQ 4.16 gives alternatives, but I can't seem to find it anywhere.

Please don't use eval to do queries. This is guaranteed to blow up in your face when you don't expect it. Maybe you've heard of SQL injections; the security implications of using eval to build queries are huge.

A filter-based query system

Instead, begin by writing filter functions for common queries. This will also solve your problem and provide a "plug-and-play" way to compose queries.

Here's a pointer as to how to implement it:

Think of a query as a function which takes as arguments a few literal values (and, implicitly, a set of records), and returns a resultset of records. Ditching the list and using the set datatype for the resultset, keyed by your record id, will increase performance a lot.

Then an "AND" becomes a function which takes two (or more) sets of records and builds their set intersection, and an "OR" becomes a function which takes two (or more) sets of records and builds their union. (NOT would be the set difference between the whole set of records and one or more subsets.)

If you build your functions this way, a query will become a simple tree of function calls, such as:

result = q_and(q_or(q_merchant_is('merchant2'),
                    q_address_is('123 Fake Street')),
               q_quantity_above(3))

(Formatted for better legibility)

It's not that hard to write a parser for a simple query language that builds such a query, but if you don't need to provide a frontend for end users, you might not need your own query language, because the Python representation of the query shown above is simple and clear enough. And if you do need to represent your queries as dictionaries, choose a structure that closely mimics the final structure of the query call tree; it is then trivial to write a query_builder function that turns one of your dict queries into a function that runs the tree of query function calls when called.
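As a minimal sketch of such a builder, assume each query dict holds exactly one operator: `and`/`or` map to a list of sub-queries, `not` maps to one sub-query, and anything else names a leaf filter. This layout, the `build_query` name, and the `has_key` and `merchant_is` leaf filters are all assumptions made for illustration, not something prescribed above.

```python
# Hypothetical leaf filters: each takes the records plus a literal
# and returns the set of ids of the matching records.
def merchant_is(records, name):
    return {r['_id'] for r in records if r['Offer']['From'] == name}

def has_key(records, key):
    return {r['_id'] for r in records if key in r}

LEAVES = {'merchant_is': merchant_is, 'has_key': has_key}

def build_query(spec):
    """Turn a nested dict (assumed layout) into a function of the records."""
    (op, arg), = spec.items()   # each dict must hold exactly one operator
    if op == 'and':             # needs at least one sub-query
        parts = [build_query(sub) for sub in arg]
        return lambda records: set.intersection(*(p(records) for p in parts))
    if op == 'or':              # needs at least one sub-query
        parts = [build_query(sub) for sub in arg]
        return lambda records: set.union(*(p(records) for p in parts))
    if op == 'not':
        inner = build_query(arg)
        return lambda records: {r['_id'] for r in records} - inner(records)
    leaf = LEAVES[op]           # raises KeyError for unknown operators
    return lambda records: leaf(records, arg)

# Abbreviated records from the question's database.
records = [
    {'_id': 'TRANSACTION0', 'Offer': {'From': 'merchant1'}, 'Accept': {'Quantity': 1}},
    {'_id': 'TRANSACTION1', 'Offer': {'From': 'merchant1'}},
    {'_id': 'TRANSACTION3', 'Offer': {'From': 'merchant2'}, 'Accept': {'Quantity': 2}},
]

a_dict = {'query1': {'and': [{'has_key': 'Offer'}, {'has_key': 'Accept'}]}}
query1 = build_query(a_dict['query1'])
print(sorted(query1(records)))   # ['TRANSACTION0', 'TRANSACTION3']
```

Because the dict is only ever used to look up known functions, nothing in it is executed as code, which is the property eval cannot give you.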

Note: As you can see, q_merchant_is, q_quantity_above etc. don't take a set of records to filter. You can fix this by making a Query class and setting the full record set as an instance attribute, so that each query method has access to the full record set if it needs it:

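class Query(object):
    def __init__(self, all_records):
        # Keep the full record set so every query method can scan it.
        self.records = all_records

    def merchant_is(self, name):
        # Collect the ids of all records offered by the given merchant.
        result = set()
        for record in self.records:
            if record['Offer']['From'] == name:
                result.add(record['_id'])
        return result

    def q_and(self, *args):
        # Intersect all result sets passed in (at least one is required).
        result = args[0]
        for other in args[1:]:
            result = result.intersection(other)
        return result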
    ...

q = Query(my_full_record_set)
result = q.q_and(q.q_or(q.merchant_is('merchant2').........))    
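For reference, here is one way the elided parts might be filled in so that the whole thing runs. This is only a sketch: `address_is`, `quantity_above`, `q_or` and `q_not` are assumed implementations, and treating records without an 'Accept' part as non-matching is a choice made here, not something stated above.

```python
class Query:
    """Filter-style queries: every method returns a set of record ids."""

    def __init__(self, all_records):
        self.records = all_records
        self.all_ids = {r['_id'] for r in all_records}

    def merchant_is(self, name):
        return {r['_id'] for r in self.records if r['Offer']['From'] == name}

    def address_is(self, address):
        # Assumption: records without an 'Accept' part simply do not match.
        return {r['_id'] for r in self.records
                if r.get('Accept', {}).get('Address') == address}

    def quantity_above(self, minimum):
        return {r['_id'] for r in self.records
                if r.get('Accept', {}).get('Quantity', 0) > minimum}

    def q_and(self, first, *rest):
        return first.intersection(*rest)

    def q_or(self, first, *rest):
        return first.union(*rest)

    def q_not(self, subset):
        # Set difference against the full set of ids.
        return self.all_ids - subset


# Three records taken from the question's database.
records = [
    {'_id': 'TRANSACTION0',
     'Offer': {'From': 'merchant1', 'To': 'customer1', 'Item': 'Car', 'Price': 1000, 'Timestamp': 2},
     'Accept': {'Quantity': 1, 'Address': '123 Fake Street', 'Timestamp': 5}},
    {'_id': 'TRANSACTION3',
     'Offer': {'From': 'merchant2', 'To': 'customer1', 'Item': 'Car', 'Price': 2000, 'Timestamp': 3},
     'Accept': {'Quantity': 2, 'Address': 'The White House', 'Timestamp': 3}},
    {'_id': 'TRANSACTION7',
     'Offer': {'From': 'merchant2', 'To': 'customer3', 'Item': 'Laptop', 'Price': 900, 'Timestamp': 0}},
]

q = Query(records)
result = q.q_and(q.q_or(q.merchant_is('merchant2'),
                        q.address_is('123 Fake Street')),
                 q.quantity_above(1))
print(result)   # {'TRANSACTION3'}
```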

Performance and indices

You see that each query function that queries for a literal value basically scans over the whole dataset to filter it. If your query contains many such searches for literal parts, you'll scan your dataset multiple times. For large datasets, this can become prohibitive.

A simple solution would be to index the fields you want to query against, using one dict per field. This can speed up the query by orders of magnitude, but if your data changes, you need to make sure to keep the indices up to date.
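A per-field index could look roughly like the following. The `build_index` helper and the extractor lambdas are hypothetical names introduced for this sketch; each index maps a field value to the set of ids of the records holding that value, so a literal lookup becomes a dict access instead of a scan.

```python
from collections import defaultdict

def build_index(records, extract):
    """Map each extracted field value to the set of matching record ids."""
    index = defaultdict(set)
    for record in records:
        value = extract(record)
        if value is not None:
            index[value].add(record['_id'])
    return index

# Abbreviated records from the question's database.
records = [
    {'_id': 'TRANSACTION0', 'Offer': {'From': 'merchant1', 'Item': 'Car'}},
    {'_id': 'TRANSACTION3', 'Offer': {'From': 'merchant2', 'Item': 'Car'}},
    {'_id': 'TRANSACTION7', 'Offer': {'From': 'merchant2', 'Item': 'Laptop'}},
]

# One index per queried field, built with a single pass each.
by_merchant = build_index(records, lambda r: r['Offer']['From'])
by_item = build_index(records, lambda r: r['Offer']['Item'])

# .get() avoids creating empty entries for values that were never indexed.
cars_from_merchant2 = by_merchant.get('merchant2', set()) & by_item.get('Car', set())
print(cars_from_merchant2)   # {'TRANSACTION3'}
```

The indices have to be rebuilt or updated whenever a record is added, removed or changed, which is the maintenance cost mentioned above.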

Classifier query system

Another solution would be to build your query functions as classifiers instead of filters, meaning that merchant_is would take a literal value and a record and answer True or False, depending on whether the record contains that literal value in the right field. We can make this work efficiently by having factory functions which build a composite query.

The example query from the filter section would then become:

query = q_and(q_or(q_merchant_is('merchant2'),
                   q_address_is('123 Fake Street')),
              q_quantity_above(3))
result = perform_query(query, all_my_records)

q_merchant_is would turn into the following:

def q_merchant_is(literal):
    # The merchant lives under the record's 'Offer' key.
    return lambda record: record['Offer']['From'] == literal

Note how you're returning a function that, when called with a record, will classify it.

q_or might look like this:

def q_or(*args):
    def or_template(record):
        for classifier in args:
            if classifier(record):
                return True
        return False
    return or_template

or a bit terser (I'm not sure whether this is more efficient or not):

def q_or(*args):
    # A generator expression lets any() stop at the first match.
    return lambda record: any(classifier(record) for classifier in args)

q_or now returns a function that runs a number of classifiers against the record passed as an argument and returns True if at least one of the classifiers returns True. q_and works just like q_or, except that it only returns True if every classifier returns True. And q_not would simply return True if its classifier returned False, and vice versa.
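Following that description, q_and and q_not could be sketched like this. The `q_has` leaf classifier is a hypothetical addition that tests whether a top-level key is present, which is what the query in the question needs.

```python
def q_and(*classifiers):
    # all() stops at the first classifier that rejects the record.
    return lambda record: all(c(record) for c in classifiers)

def q_not(classifier):
    # Invert the verdict of a single classifier.
    return lambda record: not classifier(record)

def q_has(key):
    # Hypothetical leaf classifier: is this top-level key present?
    return lambda record: key in record

# Abbreviated records from the question's database.
records = [
    {'_id': 'TRANSACTION0', 'Offer': {'From': 'merchant1'}, 'Accept': {'Quantity': 1}},
    {'_id': 'TRANSACTION1', 'Offer': {'From': 'merchant1'}},
]

offer_and_accept = q_and(q_has('Offer'), q_has('Accept'))
offer_without_accept = q_and(q_has('Offer'), q_not(q_has('Accept')))

print([r['_id'] for r in records if offer_and_accept(r)])       # ['TRANSACTION0']
print([r['_id'] for r in records if offer_without_accept(r)])   # ['TRANSACTION1']
```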

Now all you need is:

def perform_query(query, all_records):
    # In Python 3, filter() returns a lazy iterator; list() materializes it.
    return list(filter(query, all_records))

This only iterates over your dataset once and is pretty much as efficient as it can get in Python without involving eval, compile and exec, but it's somewhat harder to understand than the filter approach.

However, this isn't easily composable from the type of potential queries I would have. Ideally, I'd like to have an elegant solution which was "plug and play"

With both the filter and the classifier systems, it is easy to extend the system with new query elements. In the filter example, you add a method to your Query class. In the classifier example, you add a query function builder like the one I wrote for q_merchant_is. Usually that involves two lines of Python code.

There is no function or module that will automagically parse your queries the way you want. You'll have to write your own parser and implement the evaluation logic yourself.

There are modules that can help you parse the query string, for example pyparsing. If the query syntax isn't overly complex, you can probably also implement your own parser with simple string operations or perhaps with the regex module.
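As an illustration of the hand-written route, here is a minimal recursive-descent sketch built on the standard `re` module. It assumes a deliberately tiny grammar (quoted key names, `and`, `or`, `not` and parentheses) in which a quoted name means "this key is present in the record". The `tokenize` and `parse` names and the grammar itself are assumptions made for this example.

```python
import re

# A token is either a single-quoted key name or one of: ( ) and or not
TOKEN = re.compile(r"\s*(?:'([^']*)'|(\(|\)|and\b|or\b|not\b))")

def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise ValueError('unexpected input at position %d' % pos)
        key, symbol = match.groups()
        tokens.append(('KEY', key) if symbol is None else (symbol, None))
        pos = match.end()
    return tokens

def parse(text):
    """Compile a query string into a classifier: record -> bool."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def expr():      # expr := term ('or' term)*
        parts = [term()]
        while peek() == 'or':
            take()
            parts.append(term())
        return lambda record: any(p(record) for p in parts)

    def term():      # term := factor ('and' factor)*
        parts = [factor()]
        while peek() == 'and':
            take()
            parts.append(factor())
        return lambda record: all(p(record) for p in parts)

    def factor():    # factor := 'not' factor | '(' expr ')' | KEY
        if peek() is None:
            raise ValueError('unexpected end of query')
        kind, value = take()
        if kind == 'not':
            inner = factor()
            return lambda record: not inner(record)
        if kind == '(':
            inner = expr()
            if peek() != ')':
                raise ValueError("expected ')'")
            take()
            return inner
        if kind == 'KEY':
            return lambda record: value in record
        raise ValueError('unexpected token %r' % kind)

    classifier = expr()
    if pos != len(tokens):
        raise ValueError('unexpected trailing input')
    return classifier

a_dict = {"query1": "'Offer' and 'Accept'"}
query1 = parse(a_dict["query1"])

records = [
    {'_id': 'TRANSACTION0', 'Offer': {'From': 'merchant1'}, 'Accept': {'Quantity': 1}},
    {'_id': 'TRANSACTION1', 'Offer': {'From': 'merchant1'}},
]
print([r['_id'] for r in records if query1(r)])   # ['TRANSACTION0']
```

Anything outside the grammar, such as a function call, is rejected with a ValueError instead of being executed.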

Whatever you end up using: do not use eval.

I really doubt you will find a simple "plug and play" solution here. The best you could do would be to implement a proper minilanguage (parser and interpreter) for your queries.

The good news is that it might not be that difficult. If you already have experience writing parsers and interpreters or compilers, Python has no shortage of built-in and third-party tools; pick one and go.

Otherwise, there's an excellent tutorial on Ruslan Spivak's blog named "Let's Build a Simple Interpreter" that will guide you through the whole process of creating a simple Pascal parser and interpreter in Python, explaining the terminology along the way.
