Best approach to refactoring a python function

I have a messy function that I am working to refactor to be more efficient and readable. My Python skills are beginner to intermediate at best, and I imagine there is a much cleaner way to accomplish this task.

The function below takes in a string that contains various pieces of business contact information. The fields are separated by colons. The business name is always the first field, so it is easy to extract, but the rest of the "columns" (data between the colons) may or may not be included and are not always in the same order.

The function takes two parameters: 1) rowdata (a string like the examples below) and 2) the data element that I want returned.

import re

# Business Contact Information
def parseBusinessContactInformation(self,rowdata,element):

    ## Process Business Contact Information
    ## example rowdata = "Business Name, LLC : Business DBA : Email- person@email.com : Phone- 1234567890 : Website- www.site.com"
    ## example rowdata = "Business Name, LLC : Email- person@email.com : Phone- 1234567890 : Website- www.site.com"
    ## example rowdata = "Business Name, LLC : Business DBA : Phone- 1234567890 : Website- www.site.com"
    ## example rowdata = "Business Name, LLC : Phone- 1234567890"
  
    businessName = None
    businessDba = None
    businessPhone = None
    businessEmail = None
    businessWebsite = None
    
    # Split rowdata on :
    contactData = rowdata.split(':')

    ## [0] - business name should always be present
    businessName = contactData[0].strip()
    
    ## [1] - doing_business_as or another field if not present
    if 1 < len(contactData) and re.search('email',contactData[1].lower()):
        contactTemp = contactData[1].split('-')
        businessEmail = contactTemp[1].strip()
        businessDba = contactData[0].strip()
    elif 1 < len(contactData) and re.search('phone',contactData[1].lower()):
        contactTemp = contactData[1].split('-')
        businessPhone = contactTemp[1].strip()
        businessDba = contactData[0].strip()
    elif 1 < len(contactData) and re.search('website',contactData[1].lower()):
        contactTemp = contactData[1].split('-')
        businessWebsite = contactTemp[1].strip()
        businessDba = contactData[0].strip()
    elif 1 < len(contactData) and not re.search(r'(phone|email|website)',contactData[1].lower()):
        businessDba = contactData[1].strip()
    else:
        businessDba = self.dataNotAvailableMessage
    
    ## [2] - phone or email or website
    if 2 < len(contactData) and re.search('email',contactData[2].lower()):
        contactTemp = contactData[2].split('-')
        businessEmail = contactTemp[1].strip()
    elif 2 < len(contactData) and re.search('phone',contactData[2].lower()):
        contactTemp = contactData[2].split('-')
        businessPhone = contactTemp[1].strip()
    elif 2 < len(contactData) and re.search('website',contactData[2].lower()):
        contactTemp = contactData[2].split('-')
        businessWebsite = contactTemp[1].strip()
    
    ## [3] - phone or email or website
    if 3 < len(contactData) and re.search('email',contactData[3].lower()):
        contactTemp = contactData[3].split('-')
        businessEmail = contactTemp[1].strip()
    elif 3 < len(contactData) and re.search('phone',contactData[3].lower()):
        contactTemp = contactData[3].split('-')
        businessPhone = contactTemp[1].strip()
    elif 3 < len(contactData) and re.search('website',contactData[3].lower()):
        contactTemp = contactData[3].split('-')
        businessWebsite = contactTemp[1].strip()
    
    if element == "businessName":
        return businessName
    elif element == "businessDba":
        return businessDba
    elif element == "businessPhone":
        return businessPhone
    elif element == "businessEmail":
        return businessEmail
    elif element == "businessWebsite":
        return businessWebsite
    else:
        return self.dataNotAvailableMessage

I am trying to understand a better way to do this.

Refactoring is a cumulative process. You will find a comprehensive description of the method in Refactoring by Martin Fowler and Kent Beck.

Its heart is a series of small behavior preserving transformations. (Martin Fowler, https://refactoring.com/ )

The most important parts are "small" and "behavior preserving". The word "small" is self-explanatory, but "behavior preserving" should be ensured by unit tests.

Preliminary remark: I suggest you stick to the PEP 8 Style Guide.

Behavior preserving

Replace your comment with a docstring ( https://www.python.org/dev/peps/pep-0008/#id33 ). This is very useful because you can write some unit tests inside the docstring (aka doctests).

class MyParser:
    dataNotAvailableMessage = "dataNotAvailableMessage"

    # Business Contact Information
    def parseBusinessContactInformation(self,rowdata,element):
        """Process Business Contact Information
        
        Examples:
        >>> p = MyParser()
        >>> p.parseBusinessContactInformation("Business Name, LLC : Business DBA : Email- person@email.com : Phone- 1234567890 : Website- www.site.com", "businessPhone")
        '1234567890'
        
        >>> p.parseBusinessContactInformation("Business Name, LLC : Email- person@email.com : Phone- 1234567890 : Website- www.site.com", "businessName")
        'Business Name, LLC'
        
        >>> p.parseBusinessContactInformation("Business Name, LLC : Business DBA : Phone- 1234567890 : Website- www.site.com", "businessDba")
        'Business DBA'
        
        >>> p.parseBusinessContactInformation("Business Name, LLC : Phone- 1234567890", "businessEmail") is None
        True
        
        >>> p.parseBusinessContactInformation("Business Name, LLC : Phone- 1234567890", "?") 
        'dataNotAvailableMessage'
        
        """

        ...
        
if __name__ == "__main__":
    # Run the docstring examples only when the file is executed directly.
    import doctest
    doctest.testmod()

You should write more unit tests (use https://docs.python.org/3/library/unittest.html to avoid flooding the docstring) to secure the current behavior, but that's a good start.
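A minimal unittest sketch of such tests follows. It reuses the doctest rows and adds one extra case (with no DBA, businessDba falls back to the business name, as the original code does). So that the block runs on its own, it includes a compact stand-in MyParser modelled on the final version further down; that stand-in is an assumption for illustration, and in your project you would import your real class and keep only the test case.

```python
import unittest


class MyParser:
    """Compact stand-in so this sketch runs on its own (an assumption, not the
    original body); replace it with an import of your real class."""

    dataNotAvailableMessage = "dataNotAvailableMessage"

    def parseBusinessContactInformation(self, rowdata, element):
        name, *others = rowdata.split(':')
        d = dict.fromkeys(("businessDba", "businessPhone",
                           "businessEmail", "businessWebsite"))
        d["businessName"] = name.strip()
        if others and "-" not in others[0]:
            # No dash: the second field is a DBA, so consume it.
            d["businessDba"] = others.pop(0).strip()
        elif others:
            # The second field is already a "Label- value" pair.
            d["businessDba"] = d["businessName"]
        for field in others:
            if "-" in field:
                key, value = field.split('-', 1)
                d["business" + key.strip()] = value.strip()
        return d.get(element, self.dataNotAvailableMessage)


FULL = ("Business Name, LLC : Business DBA : Email- person@email.com"
        " : Phone- 1234567890 : Website- www.site.com")
NO_DBA = ("Business Name, LLC : Email- person@email.com"
          " : Phone- 1234567890 : Website- www.site.com")
NO_EMAIL = "Business Name, LLC : Business DBA : Phone- 1234567890 : Website- www.site.com"
PHONE_ONLY = "Business Name, LLC : Phone- 1234567890"


class ParseBusinessContactInformationTests(unittest.TestCase):
    # (rowdata, element, expected) triples recording the current behavior.
    CASES = [
        (FULL, "businessPhone", "1234567890"),
        (NO_DBA, "businessName", "Business Name, LLC"),
        (NO_EMAIL, "businessDba", "Business DBA"),
        (PHONE_ONLY, "businessEmail", None),
        (PHONE_ONLY, "?", "dataNotAvailableMessage"),
        # Extra case: no DBA, so businessDba falls back to the business name.
        (NO_DBA, "businessDba", "Business Name, LLC"),
    ]

    def setUp(self):
        self.parser = MyParser()

    def test_known_rows(self):
        for rowdata, element, expected in self.CASES:
            # subTest reports each failing row separately instead of stopping at the first.
            with self.subTest(rowdata=rowdata, element=element):
                actual = self.parser.parseBusinessContactInformation(rowdata, element)
                self.assertEqual(actual, expected)
```

Saved in a test module, it can be run with python -m unittest followed by the module name. Keeping the cases in one table makes it easy to add a row each time you discover a new input shape.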

Now, a small transformation: look at those (el)if 1 < len(contactData) and ... lines. You can test the length just once:

if 1 < len(contactData):
    if re.search('email',contactData[1].lower()):
        contactTemp = contactData[1].split('-')
        businessEmail = contactTemp[1].strip()
        businessDba = contactData[0].strip()
    elif re.search('phone',contactData[1].lower()):
        contactTemp = contactData[1].split('-')
        businessPhone = contactTemp[1].strip()
        businessDba = contactData[0].strip()
    elif re.search('website',contactData[1].lower()):
        contactTemp = contactData[1].split('-')
        businessWebsite = contactTemp[1].strip()
        businessDba = contactData[0].strip()
    elif not re.search(r'(phone|email|website)',contactData[1].lower()):
        businessDba = contactData[1].strip()
    else:
        businessDba = self.dataNotAvailableMessage
else:
    businessDba = self.dataNotAvailableMessage

Notice that the penultimate else is unreachable: a field either contains phone, email or website, or it does not:

if 1 < len(contactData):
    if re.search('email',contactData[1].lower()):
        contactTemp = contactData[1].split('-')
        businessEmail = contactTemp[1].strip()
        businessDba = contactData[0].strip()
    elif re.search('phone',contactData[1].lower()):
        contactTemp = contactData[1].split('-')
        businessPhone = contactTemp[1].strip()
        businessDba = contactData[0].strip()
    elif re.search('website',contactData[1].lower()):
        contactTemp = contactData[1].split('-')
        businessWebsite = contactTemp[1].strip()
        businessDba = contactData[0].strip()
    else:
        businessDba = contactData[1].strip()
else:
    businessDba = self.dataNotAvailableMessage

Do the same for [2] and [3]:

if 2 < len(contactData):
    if re.search('email',contactData[2].lower()):
        contactTemp = contactData[2].split('-')
        businessEmail = contactTemp[1].strip()
    elif re.search('phone',contactData[2].lower()):
        contactTemp = contactData[2].split('-')
        businessPhone = contactTemp[1].strip()
    elif re.search('website',contactData[2].lower()):
        contactTemp = contactData[2].split('-')
        businessWebsite = contactTemp[1].strip()
    
if 3 < len(contactData):
    if re.search('email',contactData[3].lower()):
        contactTemp = contactData[3].split('-')
        businessEmail = contactTemp[1].strip()
    elif re.search('phone',contactData[3].lower()):
        contactTemp = contactData[3].split('-')
        businessPhone = contactTemp[1].strip()
    elif re.search('website',contactData[3].lower()):
        contactTemp = contactData[3].split('-')
        businessWebsite = contactTemp[1].strip()

Now you see a clear pattern. Apart from the businessDba assignments in the first part, the same process is clearly repeated three times. First, we isolate the assignment of businessDba in the first part:

if 1 < len(contactData):
    if re.search('(email|phone|website)',contactData[1].lower()):
        businessDba = contactData[0].strip()
    else:
        businessDba = contactData[1].strip()
else:
    businessDba = self.dataNotAvailableMessage

And then:

if 1 < len(contactData):
    if re.search('email',contactData[1].lower()):
        contactTemp = contactData[1].split('-')
        businessEmail = contactTemp[1].strip()
    elif re.search('phone',contactData[1].lower()):
        contactTemp = contactData[1].split('-')
        businessPhone = contactTemp[1].strip()
    elif re.search('website',contactData[1].lower()):
        contactTemp = contactData[1].split('-')
        businessWebsite = contactTemp[1].strip()

Before we go further, we can remove the lines

businessName = None
businessDba = None

since businessName and businessDba always get a value, and replace the new line:

businessDba = contactData[0].strip()

with:

businessDba = businessName

That makes the fallback explicit.

Now the same process appears three times, so a loop is a good idea:

for i in range(1, 4):  # indices 1, 2 and 3, as in the original
    if i >= len(contactData):
        break
        
    if re.search('email',contactData[i].lower()):
        contactTemp = contactData[i].split('-')
        businessEmail = contactTemp[1].strip()
    elif re.search('phone',contactData[i].lower()):
        contactTemp = contactData[i].split('-')
        businessPhone = contactTemp[1].strip()
    elif re.search('website',contactData[i].lower()):
        contactTemp = contactData[i].split('-')
        businessWebsite = contactTemp[1].strip()

We can hoist the contactTemp = ... assignment out of the branches, even if it will not always be used:

for i in range(1, 4):  # indices 1, 2 and 3, as in the original
    if i >= len(contactData):
        break
    contactTemp = contactData[i].split('-')
        
    if re.search('email',contactData[i].lower()):
        businessEmail = contactTemp[1].strip()
    elif re.search('phone',contactData[i].lower()):
        businessPhone = contactTemp[1].strip()
    elif re.search('website',contactData[i].lower()):
        businessWebsite = contactTemp[1].strip()
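As a side note, the index-and-break guard can be avoided by iterating over a slice: slicing past the end of a list just returns fewer items instead of raising IndexError. Below is a sketch as a hypothetical standalone helper (extract_contact_fields is not part of the original code). It uses a plain substring test, which for literal patterns such as 'email' behaves the same as re.search.

```python
def extract_contact_fields(contact_data):
    """Hypothetical helper: scan fields [1], [2] and [3] like the loop above.

    Returns an (email, phone, website) tuple; missing values stay None.
    """
    email = phone = website = None
    # contact_data[1:4] never raises IndexError; a short list yields fewer items.
    for field in contact_data[1:4]:
        parts = field.split('-')
        lowered = field.lower()
        if 'email' in lowered:
            email = parts[1].strip()
        elif 'phone' in lowered:
            phone = parts[1].strip()
        elif 'website' in lowered:
            website = parts[1].strip()
    return email, phone, website


row = ("Business Name, LLC : Business DBA : Email- person@email.com"
       " : Phone- 1234567890 : Website- www.site.com")
print(extract_contact_fields(row.split(':')))
# ('person@email.com', '1234567890', None)
```

Like the original, it only looks at indices 1 to 3, so the website in the fifth field of this example row is not picked up; the loop over range(1, len(contactData)) later in this answer lifts that limit.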

That's better, but I find the last part ( if element == ... ) really cumbersome: you test element against every possibility. A dictionary is what you want here. As a small transformation, we can write:

d = {
    "businessName": businessName,
    "businessDba": businessDba,
    "businessPhone": businessPhone,
    "businessEmail": businessEmail,
    "businessWebsite": businessWebsite
}
return d.get(element, self.dataNotAvailableMessage)
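Because this step must preserve behavior, it is cheap to check that the dictionary lookup returns exactly what the if/elif chain returned, including the fallback for an unknown element. A small sketch: SENTINEL stands in for self.dataNotAvailableMessage, and the sample values are made up.

```python
SENTINEL = "dataNotAvailableMessage"  # stand-in for self.dataNotAvailableMessage


def select_with_if_chain(element, name, dba, phone, email, website):
    # The original "if element == ..." ladder.
    if element == "businessName":
        return name
    elif element == "businessDba":
        return dba
    elif element == "businessPhone":
        return phone
    elif element == "businessEmail":
        return email
    elif element == "businessWebsite":
        return website
    else:
        return SENTINEL


def select_with_dict(element, name, dba, phone, email, website):
    # The dictionary version: d.get returns the stored value even when it is
    # None, and SENTINEL only when the key is absent.
    d = {
        "businessName": name,
        "businessDba": dba,
        "businessPhone": phone,
        "businessEmail": email,
        "businessWebsite": website,
    }
    return d.get(element, SENTINEL)


values = ("Business Name, LLC", "Business DBA", "1234567890", None, "www.site.com")
for element in ("businessName", "businessDba", "businessPhone",
                "businessEmail", "businessWebsite", "?"):
    assert select_with_if_chain(element, *values) == select_with_dict(element, *values)
```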

Now, instead of building the dict at the end, we can create it up front and update it on the fly:

    d = {
        "businessPhone": None,
        "businessEmail": None,
        "businessWebsite": None
    }
    
    # Split rowdata on :
    contactData = rowdata.split(':')

    ## [0] - business name should always be present
    d["businessName"] = contactData[0].strip()

    if 1 < len(contactData):
        if re.search('(email|phone|website)',contactData[1].lower()):
            d["businessDba"] = d["businessName"]
        else:
            d["businessDba"] = contactData[1].strip()
    else:
        d["businessDba"] = self.dataNotAvailableMessage

    for i in range(1, 4):
        if i >= len(contactData):
            break
            
        contactTemp = contactData[i].split('-')
        if re.search('email',contactData[i].lower()):
            d["businessEmail"] = contactTemp[1].strip()
        elif re.search('phone',contactData[i].lower()):
            d["businessPhone"] = contactTemp[1].strip()
        elif re.search('website',contactData[i].lower()):
            d["businessWebsite"] = contactTemp[1].strip()
    
    return d.get(element, self.dataNotAvailableMessage)

I ran the tests after every modification and they still pass, but the code is not that easy to read. We can extract a function that creates the dict:

def parseBusinessContactInformation(self, rowdata, element):
    d = self._parseBusinessContactInformation(rowdata)
    return d.get(element, self.dataNotAvailableMessage)

def _parseBusinessContactInformation(self, rowdata):
    ...

With a small behavior change

That's not bad, but we can improve this with a small behavior change (I hope you will be okay with this new behavior!):

    for i in range(1, 4):
        if i >= len(contactData):
            break
            
        contactTemp = contactData[i].split('-')
        if len(contactTemp) > 1:
            d["business" + contactTemp[0].strip()] = contactTemp[1].strip()
        

What is the behavior change? Simply put, we now accept something like:

>>> p = MyParser()
>>> p.parseBusinessContactInformation("Business Name, LLC : Business DBA : Foo- Bar", "businessFoo")
'Bar'
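One consequence of building keys as "business" + label is that the key is spelled exactly as the label appears in the row, so a row written with "email- ..." would produce businessemail, not businessEmail. If your data is inconsistent in that way, a hypothetical normalizing variant could capitalize the label first; both helper names below are illustrative only.

```python
def field_key(label):
    """Key as built by the loop above: "business" + the stripped label."""
    return "business" + label.strip()


def normalized_field_key(label):
    """Hypothetical variant: str.capitalize() upper-cases the first letter and
    lower-cases the rest, so 'email', 'EMAIL' and 'Email' give the same key."""
    return "business" + label.strip().capitalize()


print(field_key(" Foo"))                # businessFoo
print(field_key(" email"))              # businessemail
print(normalized_field_key(" email"))   # businessEmail
print(normalized_field_key("PHONE "))   # businessPhone
```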

Since we now accept more elements, we should change the loop range:

    for i in range(1, len(contactData)):
        ...

It is time to fix a slight inconsistency: why can businessDba have the value self.dataNotAvailableMessage, which was meant for the case of a nonexistent element? We should use None:

    d = {
        "businessDba": None,
        ...
    }

and remove those two lines:

    else:
        d["businessDba"] = self.dataNotAvailableMessage

Then, since every "Label- value" field contains a dash, this can be simplified:

    if 1 < len(contactData):
        if "-" in contactData[1]:
            d["businessDba"] = d["businessName"]
        else:
            d["businessDba"] = contactData[1].strip()
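A caveat of the "-" in contactData[1] test: a DBA that itself contains a dash (say a hypothetical "Smith-Jones Bakery") would be mistaken for a key/value field, and businessDba would fall back to the business name. If that can happen in your data, a stricter check is possible. The sketch below assumes, as in all the example rows, that a field label is a single word followed by a dash and a space; the pattern and the looks_like_field name are illustrative.

```python
import re

# Assumed field shape: optional spaces, one word, a dash, then whitespace,
# e.g. " Phone- 1234567890". A dash inside a name ("Smith-Jones") is not
# followed by whitespace, so it does not match.
FIELD_RE = re.compile(r"\s*\w+-\s")


def looks_like_field(text):
    """Return True if text looks like a 'Label- value' field."""
    return FIELD_RE.match(text) is not None


for candidate in (" Phone- 1234567890", " Business DBA ", " Smith-Jones Bakery "):
    print(repr(candidate), "-" in candidate, looks_like_field(candidate))
# ' Phone- 1234567890' True True
# ' Business DBA ' False False
# ' Smith-Jones Bakery ' True False
```

The trade-off is that a row written as "Email-person@email.com" (no space after the dash) would no longer be recognized, so pick whichever assumption matches your data.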

Here's the code:

def parseBusinessContactInformation(self,rowdata,element):
    """Process Business Contact Information
    
    Examples:
    >>> p = MyParser()
    >>> p.parseBusinessContactInformation("Business Name, LLC : Business DBA : Email- person@email.com : Phone- 1234567890 : Website- www.site.com", "businessPhone")
    '1234567890'
    
    >>> p.parseBusinessContactInformation("Business Name, LLC : Email- person@email.com : Phone- 1234567890 : Website- www.site.com", "businessName")
    'Business Name, LLC'
    
    >>> p.parseBusinessContactInformation("Business Name, LLC : Business DBA : Phone- 1234567890 : Website- www.site.com", "businessDba")
    'Business DBA'
    
    >>> p.parseBusinessContactInformation("Business Name, LLC : Phone- 1234567890", "businessEmail") is None
    True
    
    >>> p.parseBusinessContactInformation("Business Name, LLC : Phone- 1234567890", "?") 
    'dataNotAvailableMessage'
    
    >>> p.parseBusinessContactInformation("Business Name, LLC : Business DBA : Foo- Bar", "businessFoo")
    'Bar'
    
    """
    d = self._parseBusinessContactInformation(rowdata)
    return d.get(element, self.dataNotAvailableMessage)
  
def _parseBusinessContactInformation(self,rowdata):
    d = {
        "businessDba": None,
        "businessPhone": None,
        "businessEmail": None,
        "businessWebsite": None
    }
    
    # Split rowdata on :
    contactData = rowdata.split(':')

    ## [0] - business name should always be present
    d["businessName"] = contactData[0].strip()

    if 1 < len(contactData):
        if "-" in contactData[1]:
            d["businessDba"] = d["businessName"]
        else:
            d["businessDba"] = contactData[1].strip()

    for i in range(1, len(contactData)):
        contactTemp = contactData[i].split('-')
        if len(contactTemp) > 1:
            d["business" + contactTemp[0].strip()] = contactTemp[1].strip()

    return d

The final touch: switch to snake case, and make a get and a parse function: parse returns a dict while get returns a value:

data_not_available_message = "dataNotAvailableMessage"

def get_business_contact_information(self, rowdata, element):
    """Process Business Contact Information
    
    Examples:
    >>> p = MyParser()
    >>> p.get_business_contact_information("Business Name, LLC : Business DBA : Email- person@email.com : Phone- 1234567890 : Website- www.site.com", "businessPhone")
    '1234567890'
    
    >>> p.get_business_contact_information("Business Name, LLC : Email- person@email.com : Phone- 1234567890 : Website- www.site.com", "businessName")
    'Business Name, LLC'
    
    >>> p.get_business_contact_information("Business Name, LLC : Business DBA : Phone- 1234567890 : Website- www.site.com", "businessDba")
    'Business DBA'
    
    >>> p.get_business_contact_information("Business Name, LLC : Phone- 1234567890", "businessEmail") is None
    True
    
    >>> p.get_business_contact_information("Business Name, LLC : Phone- 1234567890", "?") 
    'dataNotAvailableMessage'
    
    >>> p.get_business_contact_information("Business Name, LLC : Business DBA : Foo- Bar", "businessFoo")
    'Bar'
    
    :param rowdata: ...
    :param element: ...
    :return: ...
    """
    d = self.parse_business_contact_information(rowdata)
    return d.get(element, self.data_not_available_message)

With some cosmetic changes to make it more Pythonic:

def parse_business_contact_information(self, rowdata):
    """Process Business Contact Information
    
    Examples:
    >>> p = MyParser()
    >>> p.parse_business_contact_information("Business Name, LLC : Business DBA : Email- person@email.com : Phone- 1234567890 : Website- www.site.com") == {
    ... 'businessDba': 'Business DBA', 'businessPhone': '1234567890', 'businessEmail': 'person@email.com', 
    ... 'businessWebsite': 'www.site.com', 'businessName': 'Business Name, LLC'}        
    True

    >>> p.parse_business_contact_information("Business Name, LLC : Phone- 1234567890") == {
    ... 'businessDba': 'Business Name, LLC', 'businessPhone': '1234567890', 'businessEmail': None, 
    ... 'businessWebsite': None, 'businessName': 'Business Name, LLC'}
    True
    
    :param rowdata: ...
    :return: ...
    """
    d = dict.fromkeys(("businessDba", "businessPhone", 
                       "businessEmail", "businessWebsite"))
    
    name, *others = rowdata.split(':') # destructuring assignment

    d["businessName"] = name.strip()
    if not others:
        return d
    
    if "-" in others[0]:
        d["businessDba"] = d["businessName"]
    else:
        d["businessDba"] = others[0].strip()
        others.pop(0) # consume others[0]

    for data in others:
        try:
            key, value = data.split('-', 1) # a- b-c => a, b-c
        except ValueError: # no dash: not enough values to unpack
            print("Element {} should have a dash".format(data))
        else:
            d["business" + key.strip()] = value.strip()

    return d
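Two details of this final version are easy to miss. split('-', 1) splits on the first dash only, which matters for values that contain dashes themselves (the dashed phone number below is a made-up example; the earlier contactTemp[1] would keep only its first group). And name, *others = ... simply leaves others empty when the row holds only the name, which is what the if not others: check relies on.

```python
def split_field(field):
    """Split a 'Label- value' field on its first dash, as the final loop does.

    Raises ValueError (not enough values to unpack) if there is no dash.
    """
    key, value = field.split('-', 1)
    return key.strip(), value.strip()


field = " Phone- 123-456-7890"              # hypothetical dashed phone number
print(field.split('-')[1].strip())          # 123  (what contactTemp[1] held)
print(split_field(field))                   # ('Phone', '123-456-7890')

name, *others = "Business Name, LLC".split(':')
print(repr(name), others)                   # 'Business Name, LLC' []
```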

The code is not perfect, but it is clearer than it was, at least to my eyes.

To summarize the method:

  1. write unit tests to secure the behavior;
  2. make small transformations that preserve the behavior and improve readability. Factor out what you can, and don't focus on performance at this stage;
  3. continue until you have something clear, and stop when you start going around in circles making unnecessary modifications;
  4. if necessary, improve performance.
