How do I split a string with special characters without removing those characters?

Question

I'm writing a function that needs to return an abbreviated version of a str. Each word must be reduced to its first letter, the number of characters removed, and its last letter. The abbreviation is applied per word, not per sentence, and the words then need to be joined back together in the same format, including the special characters. I tried re.findall(), but it drops the special characters, so " ".join() cannot put them back.

Here's my code:

import re
def abbreviate(wrd):
    return " ".join([i if len(i) < 4 else i[0] + str(len(i[1:-1])) + i[-1] for i in re.findall(r"[\w']+", wrd)]) 

print(abbreviate("elephant-rides are really fun!"))

The output would be:

e6t r3s are r4y fun

But the output should be:

e6t-r3s are r4y fun!

Answer 1

No need for str.join. You might as well take full advantage of what the re module has to offer.

re.sub accepts either a string or a callable (such as a function or lambda) as the replacement. The callable receives the current match object and must return the string that replaces that match.

import re

pattern = r"\b[a-z]([a-z]{2,})[a-z]\b"
string = "elephant-rides are really fun!"

def replace(match):
    return f"{match.group(0)[0]}{len(match.group(1))}{match.group(0)[-1]}"

abbreviated = re.sub(pattern, replace, string)

print(abbreviated)

Output:

e6t-r3s are r4y fun!

Maybe someone else can improve on this answer with a neater pattern or other suggestions. As written, the pattern assumes you're only dealing with lowercase letters, so keep that in mind, but it should be straightforward to modify it to suit your needs. I'm not really a fan of repeating [a-z], but it was the quickest way I could think of to capture the "inner" characters of a word in a separate capturing group. You may also want to consider what should happen with contractions like "don't" or "shouldn't".
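One way those two caveats might be handled is sketched below. It is not the answerer's code: the WORD pattern and the abbreviate_word helper are hypothetical names, and the sketch assumes that an apostrophe inside a word belongs to that word and counts as a removed character, which mirrors the [\w']+ pattern in the question.

```python
import re

# Assumption: a "word" is a run of ASCII letters (either case) that may
# contain apostrophes in the middle, e.g. "don't". Hyphens, spaces and
# other punctuation are never part of a word.
WORD = re.compile(r"[A-Za-z](?:[A-Za-z']*[A-Za-z])?")


def abbreviate_word(match):
    word = match.group(0)
    # Words shorter than four characters are left untouched.
    if len(word) < 4:
        return word
    # First character, count of removed characters, last character.
    return f"{word[0]}{len(word) - 2}{word[-1]}"


def abbreviate(text):
    # re.sub only rewrites the matched words; everything between them
    # (spaces, hyphens, punctuation) is copied through unchanged.
    return WORD.sub(abbreviate_word, text)


print(abbreviate("Elephant-rides are really fun!"))  # E6t-r3s are r4y fun!
print(abbreviate("You shouldn't do that"))           # You s7t do t2t
```

Moving the length check into the callback keeps the pattern free of the repeated character class, at the cost of calling the function for short words as well.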

Answer 2

Thank you for viewing my question. After some more searching and trial and error, I found a way to make my code work without changing it much. I replaced re.findall(r"[\w']+", wrd) with re.split(r'([\W\d\_])', wrd) and changed " ".join() to "".join(), since the separating whitespace is no longer needed. Because the pattern is wrapped in a capturing group, re.split keeps each separator in the resulting list instead of discarding it.

import re
def abbreviate(wrd):
    return "".join([i if len(i) < 4 else i[0] + str(len(i[1:-1])) + i[-1] for i in re.split(r'([\W\d\_])', wrd)])

print(abbreviate("elephant-rides are not fun!"))

Output:

e6t-r3s are not fun!
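To see why the capturing group matters, the short sketch below compares re.split with and without the parentheses on the same input. The variable names are only illustrative.

```python
import re

text = "elephant-rides are not fun!"

# Without a capturing group the separators are discarded.
without_group = re.split(r"[\W\d_]", text)
print(without_group)
# ['elephant', 'rides', 'are', 'not', 'fun', '']

# With a capturing group each separator comes back as its own list item,
# so "".join() can rebuild the original string exactly.
with_group = re.split(r"([\W\d_])", text)
print(with_group)
# ['elephant', '-', 'rides', ' ', 'are', ' ', 'not', ' ', 'fun', '!', '']

print("".join(with_group) == text)  # True
```

The trailing empty string appears because the input ends with a separator. It is harmless here, since joining an empty string adds nothing.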


Answer 1
2 2020-07-25 09:59:38

Answer 2
0 2020-07-26 03:28:13
