I am developing a script that creates abbreviations for a list of names that are too long for me to use. I need to split each name into parts separated by dots and then take each capital letter that begins a word. Just like this:
InternetGatewayDevice.DeviceInfo.Description -> IGD.DI.D
However, if there are several consecutive capital letters (as in the following example), I only want to take the first one and then the one that is not followed by a capital letter. So, from "WANDevice" I want to get "WD". Like this:
InternetGatewayDevice.WANDevice.1.WANConnectionDevice.1.WANIPConnection.1.PortMapping.7.ExternalPort -> IGD.WD1.WCD1.WC1.PM7.EP
So far I have written this script:
import json

data = json.load(open('./cwmp/tr069/test.json'))

def shorten(i):
    x = i.split(".")
    abbreviations = []
    for each in x:
        abbrev = ''
        for each_letter in each:
            if each_letter.isupper():
                abbrev = abbrev + each_letter
        abbreviations.append(abbrev)
    short_string = ".".join(abbreviations)
    return short_string

for i in data["mappings"]["cwmp_genieacs"]["properties"]:
    if "." in i:
        shorten(i)
    else:
        pass
It correctly "translates" the first example, but I am not sure how to do the rest. If I had to, I could probably think of some way to do it (maybe by splitting the strings into single characters), but I am looking for an efficient and smart way to do it. I will be grateful for any advice.
I am using Python 3.6.
EDIT:
I decided to try a different approach and iterate over single characters and I pretty easily achieved what I wanted. Nevertheless, thank you for your answers and suggestions, I will most certainly go through them.
def char_by_char(i):
    abbrev = ""
    for index, each_char in enumerate(i):
        # Define previous and next characters
        if index == 0:
            previous_char = None
        else:
            previous_char = i[index - 1]
        if index == len(i) - 1:
            next_char = None
        else:
            next_char = i[index + 1]
        # Character is uppercase
        if each_char.isupper():
            if next_char is not None:
                if next_char.isupper():
                    # Compare with ==, not "is": identity checks on string literals are unreliable
                    if (previous_char == ".") or (previous_char is None):
                        abbrev = abbrev + each_char
                    else:
                        pass
                else:
                    abbrev = abbrev + each_char
            else:
                pass
        # Character is "."
        elif each_char == ".":
            # Guard against a trailing dot, where next_char is None
            if next_char is not None and next_char.isdigit():
                pass
            else:
                abbrev = abbrev + each_char
        # Character is a digit
        elif each_char.isdigit():
            abbrev = abbrev + each_char
        # Character is lowercase
        else:
            pass
    print(abbrev)

for i in data["mappings"]["cwmp_genieacs"]["properties"]:
    if "." in i:
        char_by_char(i)
    else:
        pass
You could use a regular expression for that. For instance, you could use capture groups for the characters that you want to keep, and perform a substitution where you only keep those captured characters:
import re

def shorten(s):
    return re.sub(r'([A-Z])(?:[A-Z]*(?=[A-Z])|[^A-Z.]*)|\.(\d+)[^A-Z.]*', r'\1\2', s)
Explanation:
([A-Z]): capture a capital letter
(?: ): a grouping that makes clear the scope of the | operation inside it. This is not a capture group like the one above (so what it matches will be deleted)
[A-Z]*: zero or more capital letters (greedy)
(?=[A-Z]): one more capital letter should follow, but don't process it; leave it for the next match
|: logical OR
[^A-Z.]*: zero or more characters that are neither capitals nor dots (following the captured capital letter): these will be deleted
\.(\d+): a literal dot followed by one or more digits: capture the digits (in order to throw away the dot)
In the replacement argument, the captured groups are injected again:
\1: first capture group (the capital letter)
\2: second capture group (the digit(s) that followed a dot)
In one match, only one of the capture groups will have something; the other will just be the empty string. But the regular expression matching is repeated throughout the whole input string.
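For readability, the same pattern can also be spelled out with re.VERBOSE so that each piece carries its own comment. This is only a sketch: the names PATTERN and shorten_verbose are made up for illustration, and it assumes Python 3.5 or later, where an unmatched group in the replacement expands to an empty string.

```python
import re

# Hypothetical names (PATTERN, shorten_verbose); the regex itself is the one above.
PATTERN = re.compile(r"""
    ([A-Z])               # group 1: a capital letter to keep
    (?:
        [A-Z]* (?=[A-Z])  # drop further capitals, leaving the last one of the run
      |
        [^A-Z.]*          # or drop the non-capital, non-dot characters that follow
    )
  |
    \. (\d+)              # a dot followed by digits: group 2 keeps only the digits
    [^A-Z.]*              # drop anything up to the next capital or dot
""", re.VERBOSE)


def shorten_verbose(s):
    # Keep only what the two capture groups matched.
    return PATTERN.sub(r"\1\2", s)


print(shorten_verbose("InternetGatewayDevice.DeviceInfo.Description"))
# IGD.DI.D
print(shorten_verbose(
    "InternetGatewayDevice.WANDevice.1.WANConnectionDevice.1."
    "WANIPConnection.1.PortMapping.7.ExternalPort"))
# IGD.WD1.WCD1.WC1.PM7.EP
```

Compiling the pattern once also avoids re-parsing it for every name in a long list.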
Here is a non-regex solution.
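def shorten(i):
    abr_list = []
    abrev = ''
    parts = i.split('.')
    for word in parts:
        for x in range(len(word)):
            # Keep a leading capital, a capital followed by a non-capital, or a digit.
            # The bounds check avoids an IndexError when a word ends in a capital.
            if (x == 0 and word[x].isupper()
                    or word[x].isupper() and x + 1 < len(word) and not word[x + 1].isupper()
                    or word[x].isnumeric()):
                abrev += word[x]
        abr_list.append(abrev)
        abrev = ''
    return join_parts(abr_list)

def join_parts(part_list):
    # Numeric parts are glued to the previous part; all others are joined with a dot
    ret = part_list[0]
    for part in part_list[1:]:
        if not part.isnumeric():
            ret += '.%s' % part
        else:
            ret += part
    return ret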
import re

def foo(s):
    print(''.join(list(map(
        lambda matchobj: matchobj[0], re.finditer(
            r'(?<![A-Z])[A-Z]|[A-Z](?![A-Z])|\.', s)))))

foo('InternetGatewayDevice.DeviceInfo.Description')
foo('WANDevice')
# output:
# IGD.DI.D
# WD
There are three major parts to the regex: (?<![A-Z])[A-Z] or [A-Z](?![A-Z]) or \.
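To see what each alternative contributes, the pattern can be tried piece by piece with re.findall. This is a small sketch; the helper name abbreviate and the sample string are made up for illustration. Note that this pattern has no branch for digits, so a numeric segment such as ".1" loses its digit but keeps its dot.

```python
import re

text = "WANDevice.DeviceInfo"

# Capitals that start a run (not preceded by another capital).
print(re.findall(r"(?<![A-Z])[A-Z]", text))  # ['W', 'D', 'I']

# Capitals that end a run (not followed by another capital).
print(re.findall(r"[A-Z](?![A-Z])", text))   # ['D', 'D', 'I']


def abbreviate(s):
    # Hypothetical helper: join every match of the combined pattern.
    return "".join(re.findall(r"(?<![A-Z])[A-Z]|[A-Z](?![A-Z])|\.", s))


print(abbreviate(text))                       # WD.DI
print(abbreviate("WANDevice.1.PortMapping"))  # WD..PM
```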