Format a python list and search for patterns

I am getting rows from a spreadsheet with mixtures of numbers, text and dates. I want to find elements within the list, some numbers and some text, for example:

sg = [500782, u'BMOU9015488', u'SD4', u'CLOSED', -1, '', '', -1]
sg = map(str, sg) 
#sg = map(unicode, sg) #option?
if any("-1" in s for s in sg):
    pass  # do something if matched

I don't feel this is the correct way to do this. I am also trying to match values like -1.5 and -1.5C, and to tell unexpected strings like OPEN15 apart from 15.

I have also looked at

sg.index("-1")

If it returns an index, then it's a match (only good for exact matches).

Some help would be appreciated

If you want to call a function for each case, I would do it this way:

def stub1(elem):
    #do something for match of type '-1'
    return
def stub2(elem):
    #do something for match of type 'SD4'
    return        
def stub3(elem):
    #do something for match of type 'OPEN15'
    return

sg = [500782, u'BMOU9015488', u'SD4', u'CLOSED', -1, '', '', -1]
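sg = [str(x) for x in sg]  # str is the text type in Python 3 (unicode in Python 2)
patterns = {u"-1": stub1, u"SD4": stub2, u"OPEN15": stub3}  # add more if you want

for elem in sg:
    for k, stub in patterns.items():  # items() replaces Python 2's iteritems()
        if k in elem:
            stub(elem)
            break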

Where stub1, stub2, ... are the functions that contain the code for each case. A stub is called at most once per string, when the string contains a matching substring.
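Note that a plain substring test also fires for "-1" inside "-1.5" and for "15" inside "OPEN15". If that matters, one possible variation is to dispatch on regular expressions that must match the whole string. This is only a sketch: the handler names, the patterns and the dispatch() helper below are my assumptions, not something from the question.

```python
import re

# Hypothetical handlers, standing in for stub1, stub2, ... above.
def on_negative(elem):
    return "negative"

def on_sd(elem):
    return "sd"

def on_open(elem):
    return "open"

# Ordered (pattern, handler) pairs; the first pattern that matches
# the WHOLE string wins. The patterns are illustrative guesses.
PATTERNS = [
    (re.compile(r"-\d+(?:\.\d+)?[A-Za-z]*"), on_negative),  # -1, -1.5, -1.5C
    (re.compile(r"SD\d+"), on_sd),                          # SD4
    (re.compile(r"OPEN\d+"), on_open),                      # OPEN15, not a bare 15
]

def dispatch(elem):
    """Call the handler of the first fully matching pattern, else return None."""
    text = str(elem)
    for pattern, handler in PATTERNS:
        if pattern.fullmatch(text):
            return handler(text)
    return None

sg = [500782, u'BMOU9015488', u'SD4', u'CLOSED', -1, '', '', -1]
results = [dispatch(elem) for elem in sg]
```

Using a list of pairs instead of a dict makes the matching order explicit, which matters as soon as two patterns can match the same string.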

What do you mean by "I don't feel this is the correct way to do this"? Are you not getting the result you expect? Is it too slow?

Maybe you can organize your data by columns instead of rows and use more specific filters. If you are looking for speed, I'd suggest the numpy module, which has a very interesting function called select().
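To illustrate the column idea with plain Python first, here is a rough sketch. The second row is made up for the example, and I am assuming the fourth column holds the status text; adjust to your real spreadsheet layout.

```python
rows = [
    [500782, u'BMOU9015488', u'SD4', u'CLOSED', -1, '', '', -1],
    [500783, u'BMOU9015489', u'SD5', u'OPEN15', 15, '', '', -1.5],  # made-up row
]

# Transpose: one tuple per column instead of one list per row.
columns = list(zip(*rows))

# A column-specific filter: only look at the (assumed) status column.
status = columns[3]
open_rows = [i for i, s in enumerate(status) if str(s).startswith("OPEN")]
```

Here open_rows holds the row numbers whose status starts with "OPEN", so a numeric 15 in another column can no longer be confused with OPEN15.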

Scipy select example

By transforming all your rows into a numpy array, you can test several columns in one pass. This function is efficient and powerful. Basically it's used like this:

import numpy as np

a = np.array(...)
conds = [a < 10, a % 3 == 0, a > 25]
actions = [a + 100, a / 3, a * 10]
result = np.select(conds, actions, default = 0)

All values in a will be transformed as follows (for each value, the first matching condition wins):

  • 100 will be added to any value of a which is smaller than 10
  • Any other value which is a multiple of 3 will be divided by 3
  • Any remaining value above 25 will be multiplied by 10
  • Any value matching none of the conditions will be set to 0
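Here is the same thing with concrete numbers, as a small sketch. The sample values are mine, and I use // instead of / so the result stays an integer array in Python 3.

```python
import numpy as np

a = np.array([3, 9, 12, 20, 26, 30])  # sample values, chosen for illustration

conds = [a < 10, a % 3 == 0, a > 25]
actions = [a + 100, a // 3, a * 10]   # // keeps integers in Python 3
result = np.select(conds, actions, default=0)

print(result.tolist())  # [103, 109, 4, 0, 260, 10]
```

Note that 9 becomes 109 and not 3, and 30 becomes 10 and not 300: when several conditions are true for the same element, select() takes the action of the first one in the list.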

Both conds and actions are lists and must have the same number of elements. The first element of conds is paired with the first element of actions, and so on.

It could even be used to find the position of a particular value in a sorted vector (even though this should be done with the numpy nonzero() function).

a = np.array(...)
conds = [a <= target, a > target]
actions = [1, 0]
index = np.select(conds, actions).sum()

This is probably a stupid way of getting an index, but it demonstrates how we can use select() ... and it works :-)
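For comparison, a small sketch with made-up sorted data, next to the two functions I would normally reach for. As far as I can tell, the sum is really the number of elements less than or equal to target, so in a sorted array it is the insertion point just after target rather than its 0-based index.

```python
import numpy as np

a = np.array([1, 3, 5, 7, 9])  # sorted sample data (assumption)
target = 5

conds = [a <= target, a > target]
actions = [1, 0]
count = np.select(conds, actions).sum()  # elements <= target

# The more direct tools:
positions = np.nonzero(a == target)[0]                 # indices where a equals target
insert_at = np.searchsorted(a, target, side="right")   # same number as count for sorted a

print(int(count), positions.tolist(), int(insert_at))  # 3 [2] 3
```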
