
Python split list values based on condition

Given a Python list, I want to split its values based on certain criteria:

    list = ['(( value(name) = literal(luke) or value(like) = literal(music) ) 
     and (value(PRICELIST) in propval(valid))',
    '(( value(sam) = literal(abc) or value(like) = literal(music) ) and 
     (value(PRICELIST) in propval(valid))'] 

Now list[0] would be

  (( value(name) = literal(luke) or value(like) = literal(music) ) 
     and (value(PRICELIST) in propval(valid))

I want to split such that upon iterating it would give me:

#expected output
value(sam) = literal(abc)
value(like) = literal(music)

And only if it starts with value and literal. At first I thought of splitting on and / or, but that won't work because sometimes the and / or could be missing.

I tried :

for i in list:
    print(i.split())
#output ['((', 'value(abc)', '=', 'literal(12)', 'or' .... 

I am open to regex-based suggestions as well, but I know little about regex, so I would prefer not to use it.

@Duck_dragon

The strings in the list in your opening post span several lines inside single quotes, which is a syntax error in Python. In the examples below I edited them to use triple quotes (''').

>>> import re
>>> list = ['''(( value(name) = literal(luke) or value(like) = literal(music) ) 
 and (value(PRICELIST) in propval(valid))''',
'''(( value(sam) = literal(abc) or value(like) = literal(music) ) and 
 (value(PRICELIST) in propval(valid))''']


# Plain findall whose result is not assigned to a variable: the REPL echoes each list of matches, but you can't reuse them
# You can also use this *SIMPLER* but less flexible regex:  r'([a-zA-Z]+\([a-zA-Z]+\)[\s=]+[a-zA-Z]+\([a-zA-Z]+\))'
>>> for item in list:
        re.findall(r'([a-zA-Z]+(?:\()[a-zA-Z]+(?:\))[\s=]+[a-zA-Z]+(?:\()[a-zA-Z]+(?:\)))', item)    

    ['value(name) = literal(luke)', 'value(like) = literal(music)']
    ['value(sam) = literal(abc)', 'value(like) = literal(music)']


To take this a step further and give you an array you can work with:

>>> import re
>>> list = ['''(( value(name) = literal(luke) or value(like) = literal(music) ) 
 and (value(PRICELIST) in propval(valid))''',
'''(( value(sam) = literal(abc) or value(like) = literal(music) ) and 
 (value(PRICELIST) in propval(valid))''']


#Declaring blank array found_list which you can use to call the individual items
>>> found_list = []
>>> for item in list:
        for element in re.findall(r'([a-zA-Z]+(?:\()[a-zA-Z]+(?:\))[\s=]+[a-zA-Z]+(?:\()[a-zA-Z]+(?:\)))', item):
            found_list.append(element)


>>> found_list
['value(name) = literal(luke)', 'value(like) = literal(music)', 'value(sam) = literal(abc)', 'value(like) = literal(music)']
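
If only pairs that start with value and end with literal are wanted, as the question asks, the two words can be spelled out in the pattern. This is a minimal sketch, not the method used above: the names conditions, VALUE_LITERAL_RE and value_literal_pairs are hypothetical, and the sample data is assumed to match the question.

```python
import re

# Assumed sample data mirroring the question; `conditions` is a hypothetical
# name chosen so the built-in `list` is not shadowed.
conditions = [
    """(( value(name) = literal(luke) or value(like) = literal(music) )
 and (value(PRICELIST) in propval(valid))""",
    """(( value(sam) = literal(abc) or value(like) = literal(music) ) and
 (value(PRICELIST) in propval(valid))""",
]

# Match only value(<word>) = literal(<word>); other pairs such as
# value(...) in propval(...) are skipped.
VALUE_LITERAL_RE = re.compile(r"\bvalue\(\w+\)\s*=\s*literal\(\w+\)")


def value_literal_pairs(items):
    """Return every value(...) = literal(...) pair found in the given strings."""
    return [pair for item in items for pair in VALUE_LITERAL_RE.findall(item)]


for pair in value_literal_pairs(conditions):
    print(pair)
```

Because the pattern has no capture group, findall returns the whole matched text for each pair.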


Given your comment below which I couldn't quite understand, is this what you want? I changed the list to add in the other values you mentioned:

>>> import re
>>> list = ['''(( value(name) = literal(luke) or value(like) = literal(music) ) 
and (value(PRICELIST) in propval(valid))''',
'''(( value(sam) = literal(abc) or value(like) = literal(music) ) and 
(value(PRICELIST) in propval(valid))''',
'''(value(PICK_SKU1) = propval(._sku)''', '''propval(._amEntitled) > literal(0))''']


>>> found_list = []
>>> for item in list:
        for element in re.findall(r'([\w\.]+(?:\()[\w\.]+(?:\))[\s=<>(?:in)]+[\w\.]+(?:\()[\w\.]+(?:\)))', item):
            found_list.append(element)

>>> found_list
['value(name) = literal(luke)', 'value(like) = literal(music)', 'value(PRICELIST) in propval(valid)', 'value(sam) = literal(abc)', 'value(like) = literal(music)', 'value(PRICELIST) in propval(valid)', 'value(PICK_SKU1) = propval(._sku)', 'propval(._amEntitled) > literal(0)']


Edit: Or is this what you want?

>>> import re
>>> list = ['''(( value(name) = literal(luke) or value(like) = literal(music) ) 
 and (value(PRICELIST) in propval(valid))''',
'''(( value(sam) = literal(abc) or value(like) = literal(music) ) and 
 (value(PRICELIST) in propval(valid))''']


#Declaring blank array found_list which you can use to call the individual items
>>> found_list = []
>>> for item in list:
        for element in re.findall(r'([a-zA-Z]+(?:\()[a-zA-Z]+(?:\))[\s=<>(?:in)]+[a-zA-Z]+(?:\()[a-zA-Z]+(?:\)))', item):
            found_list.append(element)


>>> found_list
['value(name) = literal(luke)', 'value(like) = literal(music)', 'value(PRICELIST) in propval(valid)', 'value(sam) = literal(abc)', 'value(like) = literal(music)', 'value(PRICELIST) in propval(valid)']

Let me know if you need an explanation.


@Fyodor Kutsepin

In your example, take out your_list_ and replace it with the OP's list to avoid confusion. Secondly, your for loop lacks a colon ( : ), which produces a syntax error.

So to avoid so much clutter, I'm going to explain the solution in this comment. I hope that's okay.

Given your comment above which I couldn't quite understand, is this what you want? I changed the list to add in the other values you mentioned:

>>> import re
>>> list = ['''(( value(name) = literal(luke) or value(like) = literal(music) ) 
and (value(PRICELIST) in propval(valid))''',
'''(( value(sam) = literal(abc) or value(like) = literal(music) ) and 
(value(PRICELIST) in propval(valid))''',
'''(value(PICK_SKU1) = propval(._sku)''', '''propval(._amEntitled) > literal(0))''']


>>> found_list = []
>>> for item in list:
        for element in re.findall(r'([\w\.]+(?:\()[\w\.]+(?:\))[\s=<>(?:in)]+[\w\.]+(?:\()[\w\.]+(?:\)))', item):
            found_list.append(element)

>>> found_list
['value(name) = literal(luke)', 'value(like) = literal(music)', 'value(PRICELIST) in propval(valid)', 'value(sam) = literal(abc)', 'value(like) = literal(music)', 'value(PRICELIST) in propval(valid)', 'value(PICK_SKU1) = propval(._sku)', 'propval(._amEntitled) > literal(0)']

Explanation:

  • Pre-note: I changed [a-zA-Z0-9\._]+ to [\w\.]+ because they mean essentially the same thing and the second is more concise. The next step explains which characters it covers
  • ([\w\.]+ opens a capture group that is left "unclosed" for now, so everything in the rest of the pattern is captured. It starts by matching one or more characters that are letters ( a-z , A-Z ), digits ( 0-9 ), the underscore ( _ ), or a literal period ( . )
  • (?:\() says the match must continue with a literal "opening" parenthesis ( ( ). The backslash escapes it, and the surrounding (?: ) is a non-capturing group
  • [\w\.]+(?:\)) says that parenthesis is followed by more of the word characters described above, and then, through (?:\)) , by a literal "closing" parenthesis ( ) )
  • [\s=<>(?:in)]+ is kind of reckless, but I kept it for the sake of readability, assuming your strings will remain relatively consistent. It says the "closing parenthesis" should be followed by one or more of: whitespace, = , < , or > , in any order. Note that inside square brackets (?:in) is not a group; it only adds the individual characters ( , ? , : , i , n and ) to the set, so the word in matches because i and n are allowed, not because it is matched as a word. It is reckless because it will also match things like << < , = in > = , or ni . Making it more specific could easily result in a loss of captures, though
  • [\w\.]+(?:\()[\w\.]+(?:\)) once again matches the word characters described above, followed by an "opening parenthesis", followed again by the word characters, followed by a "closing parenthesis"
  • The final ) closes the "unclosed" capture group from the second step, telling the regex engine to capture the entire pattern outlined here
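
The operator part of the pattern above is a character class, so it accepts any mix of its characters. A stricter variant, sketched here as an assumption rather than as the answer's own method, uses an alternation so that exactly one operator is allowed. The names PAIR_RE and extract_pairs are hypothetical.

```python
import re

# Hypothetical stricter pattern: one operator (=, <, > or the word "in")
# between two name(argument) calls. re.VERBOSE lets the pattern be split
# over several lines with comments; whitespace in the pattern is ignored.
PAIR_RE = re.compile(
    r"""
    [\w.]+ \( [\w.]+ \)          # left side, e.g. value(name)
    \s* (?:=|<|>|\bin\b) \s*     # exactly one operator
    [\w.]+ \( [\w.]+ \)          # right side, e.g. literal(luke)
    """,
    re.VERBOSE,
)


def extract_pairs(text):
    """Return every 'name(arg) operator name(arg)' substring in text."""
    return PAIR_RE.findall(text)


samples = [
    "(value(PICK_SKU1) = propval(._sku)",
    "propval(._amEntitled) > literal(0))",
    "(value(PRICELIST) in propval(valid))",
    "value(a) ni literal(b)",  # accepted by the character class, rejected here
]
for sample in samples:
    print(extract_pairs(sample))
```

The trade-off is the one mentioned in the explanation: an operator not listed in the alternation (for example >= ) would be missed until it is added.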

Hope this helps

First, I would suggest that you avoid naming your variables after built-in functions. Second, you don't need a regex to get the output you mentioned.

For example:

first, rest = your_list_[1].split(') and'):
for item in first[2:].split('or')
    print(item)
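
A runnable sketch of the same split-based idea, under a few assumptions: the list is called conditions (a hypothetical name, so the built-in list is not shadowed), the sample strings match the question, and every string contains ") and" after its first group. The helper name split_first_group is also hypothetical.

```python
# Assumed sample data mirroring the question.
conditions = [
    "(( value(name) = literal(luke) or value(like) = literal(music) ) "
    "and (value(PRICELIST) in propval(valid))",
    "(( value(sam) = literal(abc) or value(like) = literal(music) ) "
    "and (value(PRICELIST) in propval(valid))",
]


def split_first_group(condition):
    """Split the first parenthesised group of a condition on the word 'or'."""
    # Everything before ") and" is the first group; the rest is ignored.
    first, _sep, _rest = condition.partition(") and")
    # Drop the two leading "(" characters, split on " or " (with spaces, so
    # words that merely contain "or" are left alone), and trim whitespace.
    return [part.strip() for part in first[2:].split(" or ")]


for condition in conditions:
    for pair in split_first_group(condition):
        print(pair)
```

This relies on the fixed layout of the strings, so it would need adjusting if the and / or keywords can be missing, as the question mentions.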

Not saying you should, but you definitely could use a PEG parser here:

from parsimonious.grammar import Grammar
from parsimonious.nodes import NodeVisitor

data = ['(( value(name) = literal(luke) or value(like) = literal(music) ) and (value(PRICELIST) in propval(valid))',
        '(( value(sam) = literal(abc) or value(like) = literal(music) ) and (value(PRICELIST) in propval(valid))']

grammar = Grammar(
    r"""
    expr        = term (operator term)*
    term        = lpar* factor (operator needle)* rpar*
    factor      = needle operator needle

    needle      = word lpar word rpar

    operator    = ws? ("=" / "or" / "and" / "in") ws?
    word        = ~"\w+"

    lpar        = "(" ws?
    rpar        = ws? ")"
    ws          = ~r"\s*"
    """
)

class HorribleStuff(NodeVisitor):
    def generic_visit(self, node, visited_children):
        return node.text or visited_children

    def visit_factor(self, node, children):
        output, equal = [], False

        for child in node.children:
            if (child.expr.name == 'needle'):
                output.append(child.text)
            elif (child.expr.name == 'operator' and child.text.strip() == '='):
                equal = True

        if equal:
            print(output)

for d in data:
    tree = grammar.parse(d)
    hs = HorribleStuff()
    hs.visit(tree)

This yields

['value(name)', 'literal(luke)']
['value(sam)', 'literal(abc)']
