Best approach to refactoring a Python function
I have a messy function that I am refactoring to be more efficient and readable. My Python skills are beginner to intermediate at best, and I imagine there is a much cleaner way to accomplish this task.
The function below takes a string that contains various pieces of business contact information. The fields are separated by colons. The business name is always the first field, so it is easy to extract, but the remaining "columns" (the data between the colons) may or may not be present and are not always in the same order.
The function takes two parameters: 1) rowdata (a string like the examples below) and 2) the data element that I want returned.
# Business Contact Information
def parseBusinessContactInformation(self,rowdata,element):
    ## Process Business Contact Information
    ## example rowdata = "Business Name, LLC : Business DBA : Email- person@email.com : Phone- 1234567890 : Website- www.site.com"
    ## example rowdata = "Business Name, LLC : Email- person@email.com : Phone- 1234567890 : Website- www.site.com"
    ## example rowdata = "Business Name, LLC : Business DBA : Phone- 1234567890 : Website- www.site.com"
    ## example rowdata = "Business Name, LLC : Phone- 1234567890"
    businessName = None
    businessDba = None
    businessPhone = None
    businessEmail = None
    businessWebsite = None
    # Split rowdata on :
    contactData = rowdata.split(':')
    ## [0] - business name should always be present
    businessName = contactData[0].strip()
    ## [1] - doing_business_as or another field if not present
    if 1 < len(contactData) and re.search('email',contactData[1].lower()):
        contactTemp = contactData[1].split('-')
        businessEmail = contactTemp[1].strip()
        businessDba = contactData[0].strip()
    elif 1 < len(contactData) and re.search('phone',contactData[1].lower()):
        contactTemp = contactData[1].split('-')
        businessPhone = contactTemp[1].strip()
        businessDba = contactData[0].strip()
    elif 1 < len(contactData) and re.search('website',contactData[1].lower()):
        contactTemp = contactData[1].split('-')
        businessWebsite = contactTemp[1].strip()
        businessDba = contactData[0].strip()
    elif 1 < len(contactData) and not re.search(r'(phone|email|website)',contactData[1].lower()):
        businessDba = contactData[1].strip()
    else:
        businessDba = self.dataNotAvailableMessage
    ## [2] - phone or email or website
    if 2 < len(contactData) and re.search('email',contactData[2].lower()):
        contactTemp = contactData[2].split('-')
        businessEmail = contactTemp[1].strip()
    elif 2 < len(contactData) and re.search('phone',contactData[2].lower()):
        contactTemp = contactData[2].split('-')
        businessPhone = contactTemp[1].strip()
    elif 2 < len(contactData) and re.search('website',contactData[2].lower()):
        contactTemp = contactData[2].split('-')
        businessWebsite = contactTemp[1].strip()
    ## [3] - phone or email or website
    if 3 < len(contactData) and re.search('email',contactData[3].lower()):
        contactTemp = contactData[3].split('-')
        businessEmail = contactTemp[1].strip()
    elif 3 < len(contactData) and re.search('phone',contactData[3].lower()):
        contactTemp = contactData[3].split('-')
        businessPhone = contactTemp[1].strip()
    elif 3 < len(contactData) and re.search('website',contactData[3].lower()):
        contactTemp = contactData[3].split('-')
        businessWebsite = contactTemp[1].strip()
    if element == "businessName":
        return businessName
    elif element == "businessDba":
        return businessDba
    elif element == "businessPhone":
        return businessPhone
    elif element == "businessEmail":
        return businessEmail
    elif element == "businessWebsite":
        return businessWebsite
    else:
        return self.dataNotAvailableMessage
I am trying to understand a better way to do this.
Refactoring is a cumulative process. You will find a comprehensive description of the method in Refactoring by Martin Fowler and Kent Beck.
"Its heart is a series of small behavior preserving transformations." (Martin Fowler, https://refactoring.com/ )
The most important words are "small" and "behavior preserving". The word "small" is self-explanatory, but "behavior preserving" should be ensured by unit tests.
Preliminary remark: I suggest you stick with the PEP 8 Style Guide. Replace your comment with a docstring ( https://www.python.org/dev/peps/pep-0008/#id33 ). This is very useful because you can write some unit tests inside the docstring (aka doctests).
class MyParser:
    dataNotAvailableMessage = "dataNotAvailableMessage"

    # Business Contact Information
    def parseBusinessContactInformation(self,rowdata,element):
        """Process Business Contact Information

        Examples:
        >>> p = MyParser()
        >>> p.parseBusinessContactInformation("Business Name, LLC : Business DBA : Email- person@email.com : Phone- 1234567890 : Website- www.site.com", "businessPhone")
        '1234567890'
        >>> p.parseBusinessContactInformation("Business Name, LLC : Email- person@email.com : Phone- 1234567890 : Website- www.site.com", "businessName")
        'Business Name, LLC'
        >>> p.parseBusinessContactInformation("Business Name, LLC : Business DBA : Phone- 1234567890 : Website- www.site.com", "businessDba")
        'Business DBA'
        >>> p.parseBusinessContactInformation("Business Name, LLC : Phone- 1234567890", "businessEmail") is None
        True
        >>> p.parseBusinessContactInformation("Business Name, LLC : Phone- 1234567890", "?")
        'dataNotAvailableMessage'
        """
        ...

import doctest
doctest.testmod()
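To see the doctest mechanism in isolation, here is a minimal sketch. The names first_field and check_docstring are hypothetical helpers invented for this illustration, not part of the code above. check_docstring runs the examples embedded in a single docstring and reports how many failed and how many were attempted.

```python
import doctest


def first_field(rowdata):
    """Return the first colon-separated field, stripped of whitespace.

    >>> first_field("Business Name, LLC : Phone- 1234567890")
    'Business Name, LLC'
    """
    return rowdata.split(":")[0].strip()


def check_docstring(func):
    """Run the examples in func's docstring; return (failed, attempted)."""
    parser = doctest.DocTestParser()
    # The globals dict makes the function visible to its own examples.
    test = parser.get_doctest(
        func.__doc__, {func.__name__: func}, func.__name__, None, 0
    )
    runner = doctest.DocTestRunner(verbose=False)
    results = runner.run(test)
    return results.failed, results.attempted
```

Calling check_docstring(first_field) should report zero failures as long as the function still behaves as its docstring claims, which is exactly the safety net needed before each small transformation.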
You should write more unit tests (use https://docs.python.org/3/library/unittest.html to avoid flooding the docstring) to secure the current behavior, but that's a good start.
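As a rough sketch of what such unittest-based tests could look like: the function get_contact_field below is a simplified, hypothetical stand-in for parseBusinessContactInformation (it is not the original code), and NOT_AVAILABLE is an assumed placeholder value. In practice you would import your real parser and keep only the test class.

```python
import unittest

NOT_AVAILABLE = "dataNotAvailableMessage"  # assumed placeholder text


def get_contact_field(rowdata, element):
    """Hypothetical, simplified stand-in for the function under test."""
    name, *others = [part.strip() for part in rowdata.split(":")]
    fields = {
        "businessName": name,
        "businessDba": name if others else NOT_AVAILABLE,
        "businessPhone": None,
        "businessEmail": None,
        "businessWebsite": None,
    }
    # A second field without a dash is treated as the DBA name.
    if others and "-" not in others[0]:
        fields["businessDba"] = others.pop(0)
    for part in others:
        label, _, value = part.partition("-")
        key = "business" + label.strip().capitalize()
        if key in fields:
            fields[key] = value.strip()
    return fields.get(element, NOT_AVAILABLE)


class ContactFieldTests(unittest.TestCase):
    FULL = "Business Name, LLC : Business DBA : Email- person@email.com : Phone- 1234567890"
    SHORT = "Business Name, LLC : Phone- 1234567890"

    def test_name_is_first_field(self):
        self.assertEqual(get_contact_field(self.FULL, "businessName"), "Business Name, LLC")

    def test_dba_falls_back_to_name(self):
        self.assertEqual(get_contact_field(self.SHORT, "businessDba"), "Business Name, LLC")

    def test_missing_field_is_none(self):
        self.assertIsNone(get_contact_field(self.SHORT, "businessEmail"))

    def test_unknown_element_gets_placeholder(self):
        self.assertEqual(get_contact_field(self.SHORT, "?"), NOT_AVAILABLE)


def run_tests():
    """Run the test case programmatically and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ContactFieldTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Running run_tests() after every small change (or python -m unittest on the test module) tells you immediately whether a transformation altered the behavior.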
Now, a small transformation: look at those (el)if 1 < len(contactData) and ... lines. You can test the length just once:
if 1 < len(contactData):
    if re.search('email',contactData[1].lower()):
        contactTemp = contactData[1].split('-')
        businessEmail = contactTemp[1].strip()
        businessDba = contactData[0].strip()
    elif re.search('phone',contactData[1].lower()):
        contactTemp = contactData[1].split('-')
        businessPhone = contactTemp[1].strip()
        businessDba = contactData[0].strip()
    elif re.search('website',contactData[1].lower()):
        contactTemp = contactData[1].split('-')
        businessWebsite = contactTemp[1].strip()
        businessDba = contactData[0].strip()
    elif not re.search(r'(phone|email|website)',contactData[1].lower()):
        businessDba = contactData[1].strip()
    else:
        businessDba = self.dataNotAvailableMessage
else:
    businessDba = self.dataNotAvailableMessage
Notice that the penultimate else is not reachable: either the field contains phone, email, or website, or it does not. The last elif can therefore become a plain else:
if 1 < len(contactData):
    if re.search('email',contactData[1].lower()):
        contactTemp = contactData[1].split('-')
        businessEmail = contactTemp[1].strip()
        businessDba = contactData[0].strip()
    elif re.search('phone',contactData[1].lower()):
        contactTemp = contactData[1].split('-')
        businessPhone = contactTemp[1].strip()
        businessDba = contactData[0].strip()
    elif re.search('website',contactData[1].lower()):
        contactTemp = contactData[1].split('-')
        businessWebsite = contactTemp[1].strip()
        businessDba = contactData[0].strip()
    else:
        businessDba = contactData[1].strip()
else:
    businessDba = self.dataNotAvailableMessage
Do the same for [2] and [3]:
if 2 < len(contactData):
    if re.search('email',contactData[2].lower()):
        contactTemp = contactData[2].split('-')
        businessEmail = contactTemp[1].strip()
    elif re.search('phone',contactData[2].lower()):
        contactTemp = contactData[2].split('-')
        businessPhone = contactTemp[1].strip()
    elif re.search('website',contactData[2].lower()):
        contactTemp = contactData[2].split('-')
        businessWebsite = contactTemp[1].strip()
if 3 < len(contactData):
    if re.search('email',contactData[3].lower()):
        contactTemp = contactData[3].split('-')
        businessEmail = contactTemp[1].strip()
    elif re.search('phone',contactData[3].lower()):
        contactTemp = contactData[3].split('-')
        businessPhone = contactTemp[1].strip()
    elif re.search('website',contactData[3].lower()):
        contactTemp = contactData[3].split('-')
        businessWebsite = contactTemp[1].strip()
Now you see a clear pattern. Except for the first part, which also assigns businessDba, you have the same process three times. First, we isolate the assignment of businessDba in the first part:
if 1 < len(contactData):
    if re.search('(email|phone|website)',contactData[1].lower()):
        businessDba = contactData[0].strip()
    else:
        businessDba = contactData[1].strip()
else:
    businessDba = self.dataNotAvailableMessage
And then:
if 1 < len(contactData):
    if re.search('email',contactData[1].lower()):
        contactTemp = contactData[1].split('-')
        businessEmail = contactTemp[1].strip()
    elif re.search('phone',contactData[1].lower()):
        contactTemp = contactData[1].split('-')
        businessPhone = contactTemp[1].strip()
    elif re.search('website',contactData[1].lower()):
        contactTemp = contactData[1].split('-')
        businessWebsite = contactTemp[1].strip()
Before we go further, we can remove the lines
businessName = None
businessDba = None
since businessName and businessDba always have a value. We can also replace the new line:
businessDba = contactData[0].strip()
with:
businessDba = businessName
That makes the fallback explicit.
Now we have the same process three times. A loop is a good idea:
for i in range(1, 4):  # indexes 1, 2 and 3
    if i >= len(contactData):
        break
    if re.search('email',contactData[i].lower()):
        contactTemp = contactData[i].split('-')
        businessEmail = contactTemp[1].strip()
    elif re.search('phone',contactData[i].lower()):
        contactTemp = contactData[i].split('-')
        businessPhone = contactTemp[1].strip()
    elif re.search('website',contactData[i].lower()):
        contactTemp = contactData[i].split('-')
        businessWebsite = contactTemp[1].strip()
We can move the contactTemp = assignment out of the branches, even if its result will not always be used:
for i in range(1, 4):  # indexes 1, 2 and 3
    if i >= len(contactData):
        break
    contactTemp = contactData[i].split('-')
    if re.search('email',contactData[i].lower()):
        businessEmail = contactTemp[1].strip()
    elif re.search('phone',contactData[i].lower()):
        businessPhone = contactTemp[1].strip()
    elif re.search('website',contactData[i].lower()):
        businessWebsite = contactTemp[1].strip()
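To try this stage on its own, the loop can be wrapped in a small function. This is only a sketch: the name extract_labeled_fields is hypothetical, and as a further liberty the three branches are folded into an inner loop over the labels, which is not what the code above does but gives the same results for the example rows.

```python
import re


def extract_labeled_fields(contact_data):
    """Collect email, phone and website from positions 1 to 3 of a split row."""
    found = {"email": None, "phone": None, "website": None}
    for i in range(1, 4):  # indexes 1, 2 and 3
        if i >= len(contact_data):
            break
        parts = contact_data[i].split("-")
        for label in found:
            if re.search(label, contact_data[i].lower()):
                # Like the original, this raises IndexError when a field
                # mentions a label but contains no dash.
                found[label] = parts[1].strip()
                break
    return found
```

Note that, like the original function, this only examines positions 1 to 3, so a website in fifth position is never read. That limitation disappears later, once the loop runs over the whole list.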
That's better, but I find the last part (if element == ...) really cumbersome: you test the element against all possibilities. A dictionary would fit well here. For a small transformation, we can write:
d = {
    "businessName": businessName,
    "businessDba": businessDba,
    "businessPhone": businessPhone,
    "businessEmail": businessEmail,
    "businessWebsite": businessWebsite
}
return d.get(element, self.dataNotAvailableMessage)
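One detail of dict.get is worth seeing in isolation, because the doctests rely on it: the default is returned only when the key is absent, not when the key is present with the value None. A small sketch, where pick and NOT_AVAILABLE are hypothetical names used only for illustration:

```python
NOT_AVAILABLE = "dataNotAvailableMessage"  # assumed placeholder text


def pick(fields, element):
    """Return fields[element], or the placeholder for an unknown element."""
    return fields.get(element, NOT_AVAILABLE)


fields = {"businessName": "Business Name, LLC", "businessEmail": None}

known = pick(fields, "businessName")     # value stored under the key
missing = pick(fields, "businessEmail")  # key exists, so None is returned
unknown = pick(fields, "?")              # key absent, so the placeholder is returned
```

This is why a known but missing field such as businessEmail still yields None, while an unknown element yields the placeholder, exactly as the if/elif chain did.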
Now, instead of initializing the dict at the end, we can create it at the start and update it on the fly:
d = {
    "businessPhone": None,
    "businessEmail": None,
    "businessWebsite": None
}
# Split rowdata on :
contactData = rowdata.split(':')
## [0] - business name should always be present
d["businessName"] = contactData[0].strip()
if 1 < len(contactData):
    if re.search('(email|phone|website)',contactData[1].lower()):
        d["businessDba"] = d["businessName"]
    else:
        d["businessDba"] = contactData[1].strip()
else:
    d["businessDba"] = self.dataNotAvailableMessage
for i in range(1, 4):
    if i >= len(contactData):
        break
    contactTemp = contactData[i].split('-')
    if re.search('email',contactData[i].lower()):
        d["businessEmail"] = contactTemp[1].strip()
    elif re.search('phone',contactData[i].lower()):
        d["businessPhone"] = contactTemp[1].strip()
    elif re.search('website',contactData[i].lower()):
        d["businessWebsite"] = contactTemp[1].strip()
return d.get(element, self.dataNotAvailableMessage)
I ran the tests after every modification and they still pass, but the code is not so easy to read. We can extract a function that creates the dict:
def parseBusinessContactInformation(self, rowdata, element):
    d = self._parseBusinessContactInformation(rowdata)
    return d.get(element, self.dataNotAvailableMessage)

def _parseBusinessContactInformation(self, rowdata):
    ...
That's not bad, but we can improve this with a small behavior change (I hope you will be okay with this new behavior!):
for i in range(1, 4):
    if i >= len(contactData):
        break
    contactTemp = contactData[i].split('-')
    if len(contactTemp) > 1:
        d["business" + contactTemp[0].strip()] = contactTemp[1].strip()
What is the behavior change? Simply, we now accept something like
>>> p = MyParser()
>>> p.parseBusinessContactInformation("Business Name, LLC : Business DBA : Foo- Bar", "businessFoo")
'Bar'
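If this more permissive behavior is not wanted, one option is to keep an allow-list of labels. This is an assumption on my part and not something the original code does; parse_labeled and KNOWN_LABELS are hypothetical names for the sketch.

```python
KNOWN_LABELS = {"Email", "Phone", "Website"}  # assumed allow-list


def parse_labeled(fields, known_labels=KNOWN_LABELS):
    """Build 'business<Label>' keys, skipping labels outside the allow-list."""
    result = {}
    for field in fields:
        label, dash, value = field.partition("-")
        # dash is empty when the field contains no '-' at all.
        if dash and label.strip() in known_labels:
            result["business" + label.strip()] = value.strip()
    return result
```

With the default allow-list, "Foo- Bar" is ignored; passing known_labels={"Foo"} accepts it again, so the stricter and the looser behavior are both one argument away.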
Since we accept more elements, we should change the loop range:
for i in range(1, len(contactData)):
    ...
It is time to focus on a slight inconsistency: why can businessDba have the value self.dataNotAvailableMessage, which was created for the case of a nonexistent element? We should use None:
d = {
    "businessDba": None,
    ...
}
and remove those two lines:
else:
    d["businessDba"] = self.dataNotAvailableMessage
Then this can be simplified:
if 1 < len(contactData):
    if "-" in contactData[1]:
        d["businessDba"] = d["businessName"]
    else:
        d["businessDba"] = contactData[1].strip()
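The dash test is shorter than the regular expression, but the two checks are not strictly equivalent. The sketch below compares them on three sample fields; the helper names and the hyphenated DBA name are made up for the illustration.

```python
import re

LABEL_RE = re.compile(r"(email|phone|website)")


def is_labeled_by_regex(field):
    """Earlier check: the field mentions one of the known labels."""
    return LABEL_RE.search(field.lower()) is not None


def is_labeled_by_dash(field):
    """Simplified check: the field contains a dash."""
    return "-" in field


samples = [" Phone- 1234567890", " Business DBA ", " Smith-Jones Consulting "]
comparison = {
    s.strip(): (is_labeled_by_regex(s), is_labeled_by_dash(s)) for s in samples
}
```

The two checks agree on the first two samples and disagree on the third: a DBA name that itself contains a hyphen is treated as a labeled field by the dash test. Whether that matters depends on your data, so it is worth adding such a row to the tests before accepting the simplification.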
Here's the code:
def parseBusinessContactInformation(self,rowdata,element):
    """Process Business Contact Information

    Examples:
    >>> p = MyParser()
    >>> p.parseBusinessContactInformation("Business Name, LLC : Business DBA : Email- person@email.com : Phone- 1234567890 : Website- www.site.com", "businessPhone")
    '1234567890'
    >>> p.parseBusinessContactInformation("Business Name, LLC : Email- person@email.com : Phone- 1234567890 : Website- www.site.com", "businessName")
    'Business Name, LLC'
    >>> p.parseBusinessContactInformation("Business Name, LLC : Business DBA : Phone- 1234567890 : Website- www.site.com", "businessDba")
    'Business DBA'
    >>> p.parseBusinessContactInformation("Business Name, LLC : Phone- 1234567890", "businessEmail") is None
    True
    >>> p.parseBusinessContactInformation("Business Name, LLC : Phone- 1234567890", "?")
    'dataNotAvailableMessage'
    >>> p.parseBusinessContactInformation("Business Name, LLC : Business DBA : Foo- Bar", "businessFoo")
    'Bar'
    """
    d = self._parseBusinessContactInformation(rowdata)
    return d.get(element, self.dataNotAvailableMessage)

def _parseBusinessContactInformation(self,rowdata):
    d = {
        "businessDba": None,
        "businessPhone": None,
        "businessEmail": None,
        "businessWebsite": None
    }
    # Split rowdata on :
    contactData = rowdata.split(':')
    ## [0] - business name should always be present
    d["businessName"] = contactData[0].strip()
    if 1 < len(contactData):
        if "-" in contactData[1]:
            d["businessDba"] = d["businessName"]
        else:
            d["businessDba"] = contactData[1].strip()
    for i in range(1, len(contactData)):
        contactTemp = contactData[i].split('-')
        if len(contactTemp) > 1:
            d["business" + contactTemp[0].strip()] = contactTemp[1].strip()
    return d
The final touch: switch to snake case and make a get and a parse function, where parse returns a dict while get returns a value:
data_not_available_message = "dataNotAvailableMessage"

def get_business_contact_information(self, rowdata, element):
    """Process Business Contact Information

    Examples:
    >>> p = MyParser()
    >>> p.get_business_contact_information("Business Name, LLC : Business DBA : Email- person@email.com : Phone- 1234567890 : Website- www.site.com", "businessPhone")
    '1234567890'
    >>> p.get_business_contact_information("Business Name, LLC : Email- person@email.com : Phone- 1234567890 : Website- www.site.com", "businessName")
    'Business Name, LLC'
    >>> p.get_business_contact_information("Business Name, LLC : Business DBA : Phone- 1234567890 : Website- www.site.com", "businessDba")
    'Business DBA'
    >>> p.get_business_contact_information("Business Name, LLC : Phone- 1234567890", "businessEmail") is None
    True
    >>> p.get_business_contact_information("Business Name, LLC : Phone- 1234567890", "?")
    'dataNotAvailableMessage'
    >>> p.get_business_contact_information("Business Name, LLC : Business DBA : Foo- Bar", "businessFoo")
    'Bar'

    :param rowdata: ...
    :param element: ...
    :return: ...
    """
    # parse returns the whole dict; get picks one value out of it
    d = self.parse_business_contact_information(rowdata)
    return d.get(element, self.data_not_available_message)
With some cosmetic changes to make it more Pythonic:
def parse_business_contact_information(self, rowdata):
    """Process Business Contact Information

    Examples:
    >>> p = MyParser()
    >>> p.parse_business_contact_information("Business Name, LLC : Business DBA : Email- person@email.com : Phone- 1234567890 : Website- www.site.com") == {
    ...     'businessDba': 'Business DBA', 'businessPhone': '1234567890', 'businessEmail': 'person@email.com',
    ...     'businessWebsite': 'www.site.com', 'businessName': 'Business Name, LLC'}
    True
    >>> p.parse_business_contact_information("Business Name, LLC : Phone- 1234567890") == {
    ...     'businessDba': 'Business Name, LLC', 'businessPhone': '1234567890', 'businessEmail': None,
    ...     'businessWebsite': None, 'businessName': 'Business Name, LLC'}
    True

    :param rowdata: ...
    :return: ...
    """
    d = dict.fromkeys(("businessDba", "businessPhone",
                       "businessEmail", "businessWebsite"))
    name, *others = rowdata.split(':')  # destructuring assignment
    d["businessName"] = name.strip()
    if not others:
        return d

    if "-" in others[0]:
        d["businessDba"] = d["businessName"]
    else:
        d["businessDba"] = others[0].strip()
        others.pop(0)  # consume others[0]

    for data in others:
        try:
            key, value = data.split('-', 1)  # a- b-c => a, b-c
        except ValueError:  # no dash: not enough values to unpack
            print("Element {} should have a dash".format(data))
        else:
            d["business" + key.strip()] = value.strip()

    return d
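Two idioms in this last version deserve a closer look: the starred assignment and the maxsplit argument of str.split. The sketch below shows both in isolation; split_field is a hypothetical helper and the hyphenated website is an invented sample value.

```python
def split_field(field):
    """Split 'Label- value' on the first dash only; return (label, value)."""
    key, value = field.split("-", 1)  # raises ValueError if there is no dash
    return key.strip(), value.strip()


# Starred assignment: the first item goes to name, the rest to a list.
name, *others = "Business Name, LLC : Website- www.my-site.com".split(":")

# Without maxsplit, a hyphen inside the value produces a third piece.
naive_parts = "Website- www.my-site.com".split("-")
label, value = split_field(others[0])
```

Here naive_parts has three items, so reading only contactTemp[1] as in the earlier versions would silently truncate the value, whereas split_field keeps everything after the first dash together.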
The code is not perfect, but it is clearer than it was, at least to my eyes.
To summarize the method:总结方法: