I'm writing this function which needs to return an abbreviated version of a str
. The return str
must contain the first letter, number of characters removed and the, last letter;it must be abbreviated per word and not by sentence, then after that I need to join every word again with the same format including the special-characters. I tried using the re.findall()
method but it automatically removes the special-characters so I can't use " ".join()
because it will leave out the special-characters.
Here's my code:
import re
def abbreviate(wrd):
return " ".join([i if len(i) < 4 else i[0] + str(len(i[1:-1])) + i[-1] for i in re.findall(r"[\w']+", wrd)])
print(abbreviate("elephant-rides are really fun!"))
The output would be:
e6t r3s are r4y fun
But the output should be:
e6t-r3s are r4y fun!
No need for str.join
. Might as well take full advantage of what the re
module has to offer.
re.sub
accepts a string or a callable object (like a function or lambda), which takes the current match as an input and must return a string with which to replace the current match.
import re
pattern = "\\b[a-z]([a-z]{2,})[a-z]\\b"
string = "elephant-rides are really fun!"
def replace(match):
return f"{match.group(0)[0]}{len(match.group(1))}{match.group(0)[-1]}"
abbreviated = re.sub(pattern, replace, string)
print(abbreviated)
Output:
e6t-r3s are r4y fun!
>>>
Maybe someone else can improve upon this answer with a cuter pattern, or any other suggestions. The way the pattern is written now, it assumes that you're only dealing with lowercase letters, so that's something to keep in mind - but it should be pretty straightforward to modify it to suit your needs. I'm not really a fan of the repetition of [az]
, but that's just the quickest way I could think of for capturing the "inner" characters of a word in a separate capturing group. You may also want to consider what should happen with words/contractions like "don't"
or "shouldn't"
.
Thank you for viewing my question. After a few more searches, trial, and error I finally found a way to execute my code properly without changing it too much. I simply substituted re.findall(r"[\w']+", wrd)
with re.split(r'([\W\d\_])', wrd)
and also removed the whitespace
in "".join()
for they were simply not needed anymore.
import re
def abbreviate(wrd):
return "".join([i if len(i) < 4 else i[0] + str(len(i[1:-1])) + i[-1] for i in re.split(r'([\W\d\_])', wrd)])
print(abbreviate("elephant-rides are not fun!"))
Output:
e6t-r3s are not fun!
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.