Format a Python list and search for patterns

I am getting rows from a spreadsheet that contain a mixture of numbers, text and dates. I want to find elements within the list, some of them numbers and some of them text. For example:

sg = [500782, u'BMOU9015488', u'SD4', u'CLOSED', -1, '', '', -1]
sg = map(str, sg) 
#sg = map(unicode, sg) #option?
if any("-1" in s for s in sg):
    pass  # do something if matched

I don't feel this is the correct way to do this. I am also trying to match things like -1.5 and -1.5C, and to tell unexpected values such as OPEN15 apart from 15.

I have also looked at:

sg.index("-1")

If that returns an index, then it's a match (only good for exact matches).

Some help would be appreciated.

If you want to call a function for each case, I would do it this way:

def stub1(elem):
    #do something for match of type '-1'
    return
def stub2(elem):
    #do something for match of type 'SD4'
    return        
def stub3(elem):
    #do something for match of type 'OPEN15'
    return

sg = [500782, u'BMOU9015488', u'SD4', u'CLOSED', -1, '', '', -1]
sg = [str(elem) for elem in sg]  # convert every cell to text (Python 3)
patterns = {u"-1": stub1, u"SD4": stub2, u"OPEN15": stub3}  # add more if you want

for elem in sg:
    for k, stub in patterns.items():
        if k in elem:
            stub(elem)
            break  # at most one stub per element

Here stub1, stub2, ... are the functions that contain the code for each case. A stub is called (at most once per string) if the string contains a matching substring.
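Substring tests like the one above will also fire on 15 inside OPEN15. If the goal is to tell number-like cells (-1, -1.5, -1.5C) apart from text that merely contains digits, a regular expression applied to the whole cell is one possible approach. The following is only a sketch: the helper name classify, the pattern, and the sample row are assumptions for illustration, not part of the original question.

```python
import re

# Assumed pattern: an optional minus sign, digits, an optional decimal part,
# then an optional run of letters (e.g. "-1", "-1.5", "-1.5C").
NUMBER_CELL = re.compile(r"(-?\d+(?:\.\d+)?)([A-Za-z]*)")


def classify(cell):
    """Return (number, suffix) if the whole cell is number-like, else None."""
    match = NUMBER_CELL.fullmatch(str(cell))
    if match is None:
        return None
    return float(match.group(1)), match.group(2)


# Made-up row mixing the values mentioned in the question.
row = [500782, 'BMOU9015488', 'SD4', 'CLOSED', -1, '', '-1.5C', 'OPEN15', 15]

# Keep the position and parsed value of every number-like cell.
hits = [(i, classify(cell)) for i, cell in enumerate(row)
        if classify(cell) is not None]
```

Because fullmatch() must consume the entire cell, 'OPEN15' and 'BMOU9015488' are rejected while 15 and '-1.5C' are accepted.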

What do you mean by "I don't feel this is the correct way to do this"? Are you not getting the result you expect? Is it too slow?

Maybe you can organize your data by columns instead of rows and use more specific filters. If you are looking for speed, I'd suggest using the numpy module, which has a very interesting function called select().

Scipy select example

By transforming all your rows into a numpy array, you can test several columns in one pass. This function is amazingly efficient and powerful! Basically, it's used like this:

import numpy as np

a = np.array(...)
conds = [a < 10, a % 3 == 0, a > 25]       # tested in order
actions = [a + 100, a / 3, a * 10]         # one action per condition
result = np.select(conds, actions, default=0)

All values in a will be transformed as follows (the conditions are tested in order, and the first one that matches wins):

  • 100 will be added to any value of a that is smaller than 10
  • Any other value that is a multiple of 3 will be divided by 3
  • Any remaining value above 25 will be multiplied by 10
  • Any other value, not matching the previous conditions, will be set to 0

Both conds and actions are lists, and they must have the same number of elements. Each condition in conds is paired with the action at the same position in actions: the first element of conds uses the first element of actions, and so on.
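To make the pairing of conditions and actions concrete, here is a small runnable version of the snippet above. The sample values in a are made up for illustration; everything else follows the same conds and actions.

```python
import numpy as np

# Made-up sample values chosen to hit every branch.
a = np.array([2, 9, 12, 15, 20, 26, 30])

conds = [a < 10, a % 3 == 0, a > 25]
actions = [a + 100, a / 3, a * 10]

# For each element, the action of the first matching condition is used;
# elements matching no condition get the default.
result = np.select(conds, actions, default=0)

print(result)
```

Note that 9 becomes 109 rather than 3, and 30 becomes 10 rather than 300, because an earlier condition in the list already matched.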

It could also be used to determine the index of a particular value in a vector (even though this should really be done with the numpy nonzero() function).

a = np.array(....)
conds = [a <= target, a > target]
actions = [1, 0]                      # 1 for each element <= target
index = np.select(conds, actions).sum()

This is probably a stupid way of getting an index, but it demonstrates how select() can be used ... and it works :-)
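For comparison, here is a sketch of the nonzero() route mentioned above next to the select() trick. The array and target are made-up values, and the array is assumed to be sorted. Under that assumption, the select() sum is the number of elements less than or equal to the target, which is the position just past the last match rather than the position of the match itself.

```python
import numpy as np

# Made-up, sorted sample values.
a = np.array([3, 8, 15, 15, 42])
target = 15

# Direct lookup: positions where the value equals the target.
matches = np.nonzero(a == target)[0]

# The select() trick: counts the elements that are <= target.
count_index = np.select([a <= target, a > target], [1, 0]).sum()

# On a sorted array the same count is given by searchsorted().
right_index = np.searchsorted(a, target, side='right')

print(matches, count_index, right_index)
```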
