
Python code optimization with regexes

I'm building a web app that records all the calls made by users. Each call has a service, a called number, and a cost. The code below currently works (I have also turned the number checks into regexes, listed after the code), but it is messy and I would like to optimize it. The checks must run in exactly this order, so storing the regexes and services in a dictionary did not work. Here is my current code, followed by the regexes:

    if (service=='R'):
        return "ROAMING"
    elif (service=='V  O') or ((service=='S') and (float(cost)==0.0)):
        return "ONNET"
    elif (service=="") and (nr_called==""):
        return "INTERNET"
    elif (service=='I')or (service=='ROAMING - MMS'):          
        return "OTHERSERV"        
    elif (service=='Internet') or (service=='WAP') or service==('BLACKBERRY.NET') or (service=='ROAMING - INTERNET')or (service=='ROAMING - BLACKBERRY'):
        return "INTERNET"
    elif ((nr_called[:6]=="003516" or nr_called[:6]=="003514" or nr_called[:6]=="003511" or nr_called[:6]=="003517" or nr_called[:6]=="003518") or (nr_called[0]=="6"  or nr_called[0]=="4" or nr_called[0]=="1" or nr_called[0]=="7" or nr_called[0]=="8"))  and (service!="V  O"):
        return "OTHERSERV"    
    elif (service=='Vi F') or (service=='Si'):
        return "INTERNATIONAL"
    elif  (nr_called[0]=="9" or nr_called[:6]=="003519") and (service=="Vp F") and (float(cost)>0) :           
        return "INS"   
    elif (nr_called[:9]=="003519220" or  nr_called[:8]=="00351924" or nr_called[:8]=="00351925" or nr_called[:8]=="00351926" or nr_called[:8]=="00351927" or nr_called[:4]=="9220" or nr_called[:3]=="924" or nr_called[:3]=="925" or nr_called[:3]=="926" or nr_called[:3]=="927") :          
        return "INS"
    elif ((len(nr_called)==9) and nr_called[:2]=="96") or (nr_called[:7]=="0035196"):
        return "INS"
    elif (nr_called[:3]=="921" or  nr_called[:8]=="00351921"):
        return "91"
    elif (nr_called[:3]=="929" or  nr_called[:8]=="00351929"):
        return "93"
    elif (nr_called[:3]=="922" or  nr_called[:8]=="00351922"):
        return "OTHERSERV"
    elif (service!="Vp F") and ((nr_called[:2]=="96" ) or(nr_called[:7]=="0035196")):            
        return "INS"
    elif (len(nr_called)==7) or (service=='V O'):           
        return "ONNET"
    elif ((len(nr_called)==9) and nr_called[:2]=="91") or (nr_called[:7]=="0035191"):
        return "91"
    elif ((len(nr_called)==9) and nr_called[:2]=="93") or (nr_called[:7]=="0035193"):
        return "93"
    elif ((len(nr_called)==9) and nr_called[:1]=="2") or (nr_called[:5]=="00352"):
        return "PT"
    elif float(cost)>0:
        return "OTHERSERV"
    else:
        return "OTHERSERV"

Number regexes:

 OTHERSERV: ['(\+*0*351)?922','(\+*0*351)?[146-8]']
 96:        ['(\+*0*351)?92([4-7]|20)','(\+*0*351)?96']
 93:        ['(\+*0*351)?929','(\+*0*351)?93']
 91:        ['(\+*0*351)?921','(\+*0*351)?91']
 PT:        ['(\+*0*351)?2']
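
Because the first matching rule has to win, one way to keep the regexes in order is a list of (category, pattern) pairs rather than a dict: plain dicts in Python 2 do not preserve insertion order, and a dict cannot hold the same category twice. The following is a minimal sketch, not a verified equivalent of the elif chain. `NUMBER_RULES` and `classify_number` are hypothetical names, the rule order is an assumed reading of the code above, and `92(?:20|[4-7])` is an assumed reading of the intended alternation. The category labels follow the regex list (so "96" appears where the elif chain returns "INS").

```python
import re

# Ordered (category, compiled pattern) pairs; the first match wins.
# Hypothetical table: the order is an assumption based on the elif chain.
NUMBER_RULES = [
    ("OTHERSERV", re.compile(r"(\+*0*351)?[146-8]")),
    ("96", re.compile(r"(\+*0*351)?92(?:20|[4-7])")),
    ("96", re.compile(r"(\+*0*351)?96")),
    ("91", re.compile(r"(\+*0*351)?921")),
    ("93", re.compile(r"(\+*0*351)?929")),
    ("OTHERSERV", re.compile(r"(\+*0*351)?922")),
    ("91", re.compile(r"(\+*0*351)?91")),
    ("93", re.compile(r"(\+*0*351)?93")),
    ("PT", re.compile(r"(\+*0*351)?2")),
]


def classify_number(nr_called, default="OTHERSERV"):
    """Return the category of the first rule whose pattern matches the number."""
    for category, pattern in NUMBER_RULES:
        # re.match anchors at the start of the string, like the slice checks.
        if pattern.match(nr_called):
            return category
    return default
```

Calling `classify_number("212345678")` walks the list top to bottom and stops at the first pattern that matches the start of the number. Reordering the list changes the result for overlapping prefixes such as 922 and 9220.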

I have been struggling with this for quite a while and can't figure out how to structure it in an optimized, maintainable way, so any help is much appreciated.

I'm not sure I understand what you mean about the regexes, but one option for cleaning up the if/elif chain is to use a list of (result, check) tuples:

actions = [
  ('ROAMING'  , lambda service,nr_called,cost: service=='R'),
  ('ONNET'    , lambda service,nr_called,cost: service=='V  O' or (service=='S' and float(cost)==0.0)),
  ('INTERNET' , lambda service,nr_called,cost: service=='' and nr_called==''),
  ('OTHERSERV', lambda service,nr_called,cost: service in ('I','ROAMING - MMS')),
  # ... remaining checks go here, in the same order as the elif chain

  # fallthrough
  ('OTHERSERV', lambda service,nr_called,cost: True)
]

def categorize(service,nr_called,cost):
  # return the value of the first check that passes
  for value,check in actions:
    if check(service,nr_called,cost):
      return value

  # fallthrough
  return 'OTHERSERV'

Only one of the two items labeled "fallthrough" is needed; either one works.
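
If the number checks are written as regexes, the same list-of-tuples idea still applies. The sketch below is one possible arrangement, not the asker's actual code: `number_matches`, `rules`, and `categorize_call` are hypothetical names, and only a few of the rules are shown.

```python
import re


def number_matches(pattern):
    """Build a check that tests nr_called against a regex (hypothetical helper)."""
    compiled = re.compile(pattern)
    return lambda service, nr_called, cost: compiled.match(nr_called) is not None


# Order matters: the first check that returns True decides the result.
rules = [
    ("ROAMING", lambda service, nr_called, cost: service == "R"),
    ("91", number_matches(r"(\+*0*351)?921")),
    ("93", number_matches(r"(\+*0*351)?929")),
    ("PT", number_matches(r"(\+*0*351)?2")),
]


def categorize_call(service, nr_called, cost, default="OTHERSERV"):
    """Return the value of the first rule whose check passes, else the default."""
    for value, check in rules:
        if check(service, nr_called, cost):
            return value
    return default
```

Every check takes the same three arguments, so service-based lambdas and regex-based checks can sit side by side in one list.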

I was thinking of a solution that is more fully data/table-driven than @ryachza's suggestion, for example:

import re

def oldcategorize( service, nr_called, cost):
    if (service=='R'):
        return "ROAMING"
    elif (service=='V  O') or ((service=='S') and (float(cost)==0.0)):
        return "ONNET"
    elif (service=="") and (nr_called==""):
        return "INTERNET"
    elif (service=='I')or (service=='ROAMING - MMS'):          
        return "OTHERSERV"        
    elif (service=='Internet') or (service=='WAP') or service==('BLACKBERRY.NET') or (service=='ROAMING - INTERNET')or (service=='ROAMING - BLACKBERRY'):
        return "INTERNET"
    elif ((nr_called[:6]=="003516" or nr_called[:6]=="003514" or nr_called[:6]=="003511" or nr_called[:6]=="003517" or nr_called[:6]=="003518") or (nr_called[0]=="6"  or nr_called[0]=="4" or nr_called[0]=="1" or nr_called[0]=="7" or nr_called[0]=="8"))  and (service!="V  O"):
        return "OTHERSERV"    
    elif (service=='Vi F') or (service=='Si'):
        return "INTERNATIONAL"
    elif  (nr_called[0]=="9" or nr_called[:6]=="003519") and (service=="Vp F") and (float(cost)>0) :           
        return "INS"   
    elif (nr_called[:9]=="003519220" or  nr_called[:8]=="00351924" or nr_called[:8]=="00351925" or nr_called[:8]=="00351926" or nr_called[:8]=="00351927" or nr_called[:4]=="9220" or nr_called[:3]=="924" or nr_called[:3]=="925" or nr_called[:3]=="926" or nr_called[:3]=="927") :          
        return "INS"
    elif ((len(nr_called)==9) and nr_called[:2]=="96") or (nr_called[:7]=="0035196"):
        return "INS"
    elif (nr_called[:3]=="921" or  nr_called[:8]=="00351921"):
        return "91"
    elif (nr_called[:3]=="929" or  nr_called[:8]=="00351929"):
        return "93"
    elif (nr_called[:3]=="922" or  nr_called[:8]=="00351922"):
        return "OTHERSERV"
    elif (service!="Vp F") and ((nr_called[:2]=="96" ) or(nr_called[:7]=="0035196")):            
        return "INS"
    elif (len(nr_called)==7) or (service=='V O'):           
        return "ONNET"
    elif ((len(nr_called)==9) and nr_called[:2]=="91") or (nr_called[:7]=="0035191"):
        return "91"
    elif ((len(nr_called)==9) and nr_called[:2]=="93") or (nr_called[:7]=="0035193"):
        return "93"
    elif ((len(nr_called)==9) and nr_called[:1]=="2") or (nr_called[:5]=="00352"):
        return "PT"
    elif float(cost)>0:
        return "OTHERSERV"
    else:
        return "OTHERSERV"
    return "FAILED"

# table of categories with regex criteria
# special treatment of first character of a pattern "-" means NOT matching
# note cost is converted to a standardized format using str() for matching
#      service                 number       cost   category
categories = [
    #    if (service=='R'):
    #        return "ROAMING"
     ["^R$"                       ,""         ,""         ,"ROAMING"]
    #    elif (service=='V  O') or ((service=='S') and (float(cost)==0.0)):
    #        return "ONNET"
    ,["^V  O$"                    ,""         ,""         ,"ONNET"]
    ,["^S$"                       ,""         ,"^0.0$"    ,"ONNET"]
    # elif (service=="") and (nr_called==""):
    #   return "INTERNET"
    ,["^$"                      ,"^$"       ,""         ,"INTERNET"]
    #    elif (service=='I')or (service=='ROAMING - MMS'):          
    #        return "OTHERSERV"        
    ,["^I$"                       ,""         ,""         ,"OTHERSERV"]
    ,["^ROAMING - MMS$"           ,""         ,""         ,"OTHERSERV"]
    #   elif (service=='Internet') or (service=='WAP') or service==('BLACKBERRY.NET') or (service=='ROAMING - INTERNET')or (service=='ROAMING - BLACKBERRY'):
    #       return "INTERNET"
    ,["^Internet$"               ,""         ,""         ,"INTERNET"]
    ,["^WAP$"                    ,""         ,""         ,"INTERNET"]
    ,[r"^BLACKBERRY\.NET$"       ,""         ,""         ,"INTERNET"]
    ,["^ROAMING - INTERNET$"     ,""         ,""         ,"INTERNET"]
    ,["^ROAMING - BLACKBERRY$"   ,""         ,""         ,"INTERNET"]
    #   elif ((nr_called[:6]=="003516" or nr_called[:6]=="003514" or nr_called[:6]=="003511" or nr_called[:6]=="003517" or nr_called[:6]=="003518") or (nr_called[0]=="6"  or nr_called[0]=="4" or nr_called[0]=="1" or nr_called[0]=="7" or nr_called[0]=="8"))  and (service!="V  O"):
    #       return "OTHERSERV"
    ,["-^V  O$"                  ,"^00351[14678]"    ,""         ,"OTHERSERV"]
    ,["-^V  O$"                  ,"^[14678]"         ,""         ,"OTHERSERV"]
    #    elif (len(nr_called)==7) or (service=='V O'):           
    #        return "ONNET"
    ,[""                        ,"^.......$"    ,""         ,"ONNET"]
    ,["^V O$"                     ,""             ,""         ,"ONNET"]
    #   ((len(nr_called)==9) and nr_called[:2]=="91") or (nr_called[:7]=="0035191"):
    #       return "91"
    ,[""                        ,"^91.......$"  ,""         ,"91"]
    ,[""                        ,"^0035191"     ,""         ,"91"]
]

def matches(pattern, value):
    # an empty pattern matches anything; a leading "-" means the rest must NOT match
    if pattern.startswith("-"):
        return not re.match(pattern[1:], value)
    return bool(re.match(pattern, value))

def newcategorize(service, nr_called, cost ):
    for servpat,numpat,costpat,res in categories:
        # print("%s %s %s %s" % (servpat,numpat,costpat,res))  # uncomment to trace
        if matches(servpat,service) and matches(numpat,nr_called) and matches(costpat,str(cost)):
            return res
    result = "FAILED to find %s %s %s"%(service,nr_called, cost)
    # print(result)
    return result

testdata = [
    ["x","012345",0.0]
    ,["S","123",0.0]
    ,["NOT V  O","003517",0.0]
]

for s,n,c in testdata:
    oc = oldcategorize(s,n,c)
    nc = newcategorize(s,n,c)
    if oc != nc:
        print("ERROR %s %s %s %s %s" % (s,n,c,oc,nc))
    else:
        print("match %s %s %s %s %s" % (s,n,c,oc,nc))

So the table contains all the data needed to drive the logic, while the logic itself lives in the newcategorize() function. Regexes do most of the hard work; the only addition is that a leading - in a pattern means the pattern must NOT match. With this approach the table stays concise, whereas with @ryachza's solution the table could become hard to read, and probably hard to debug, once the lambda expressions get more complex. With my approach you just add print statements.
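
Since the question is about optimization, one possible refinement is to compile the table once up front instead of handing pattern strings to `re.match` on every call. This is a sketch under assumptions: `compile_pattern`, `compile_table`, `categorize`, and `sample_rows` are hypothetical names, and the rows are a small excerpt modeled on the table above rather than the full rule set.

```python
import re


def compile_pattern(pattern):
    """Return (negate, compiled_regex); a leading "-" marks a negated pattern."""
    negate = pattern.startswith("-")
    return negate, re.compile(pattern[1:] if negate else pattern)


def compile_table(rows):
    """Compile each [service, number, cost, category] row once, up front."""
    return [([compile_pattern(p) for p in row[:3]], row[3]) for row in rows]


def categorize(table, service, nr_called, cost, default=None):
    """Return the category of the first row whose three patterns all pass."""
    values = (service, nr_called, str(cost))
    for patterns, category in table:
        # A plain pattern passes when it matches; a negated one when it does not.
        if all((rx.match(v) is None) == negate
               for (negate, rx), v in zip(patterns, values)):
            return category
    return default


# Hypothetical excerpt in the same row format; an empty pattern matches anything.
sample_rows = [
    ["^R$", "", "", "ROAMING"],
    ["^S$", "", r"^0\.0$", "ONNET"],
    ["-^V  O$", "^00351[14678]", "", "OTHERSERV"],
]
table = compile_table(sample_rows)
```

With this layout, `categorize(table, "x", "003517", 0.0)` reaches the third row, where the negated service pattern passes because the service is not "V  O" and the number pattern matches the 00351 prefix.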

HTH Barny
