How to get the first capital letter and then each one that isn't followed by another capital letter in Python?

Question

I am developing a script that creates abbreviations for a list of names that are too long for me to use. I need to split each name into parts divided by dots and then take each capital letter that is at the beginning of a word. Just like this:

InternetGatewayDevice.DeviceInfo.Description -> IGD.DI.D

However, if there are several consecutive capital letters (as in the following example), I only want to take the first one and then the one that is not followed by a capital letter. So, from "WANDevice" I want to get "WD". Like this:

InternetGatewayDevice.WANDevice.1.WANConnectionDevice.1.WANIPConnection.1.PortMapping.7.ExternalPort -> IGD.WD1.WCD1.WC1.PM7.EP

So far I have written this script:

import json

with open('./cwmp/tr069/test.json') as f:
    data = json.load(f)

def shorten(i):
    x = i.split(".")
    abbreviations = []
    for each in x:
        abbrev = ''
        for each_letter in each:
            if each_letter.isupper():
                abbrev = abbrev + each_letter
        abbreviations.append(abbrev)
    short_string = ".".join(abbreviations)
    return short_string

for i in data["mappings"]["cwmp_genieacs"]["properties"]:
    if "." in i:
        # Print the result; the return value was previously discarded
        print(shorten(i))

It correctly "translates" the first example, but I am not sure how to do the rest. If I had to, I could probably think of some way to do it (like maybe splitting the strings into single characters), but I am looking for an efficient and smart way to do it. I will be grateful for any advice.
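To make the gap concrete, here is a compact restatement of the first attempt (the name shorten_all_caps is only for illustration). It keeps every uppercase letter, so runs such as "WAN" survive in full and the purely numeric parts collapse to empty strings:

```python
def shorten_all_caps(name):
    """Keep every uppercase letter of each dot-separated part."""
    parts = name.split(".")
    return ".".join("".join(c for c in part if c.isupper()) for part in parts)


print(shorten_all_caps("InternetGatewayDevice.DeviceInfo.Description"))
# IGD.DI.D
print(shorten_all_caps("InternetGatewayDevice.WANDevice.1.WANConnectionDevice.1"))
# IGD.WAND..WANCD.
```

The second output shows the two things still missing: dropping the inner capitals of a run, and attaching the digits to the preceding part.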

I am using Python 3.6.

EDIT:

I decided to try a different approach and iterate over single characters, and I pretty easily achieved what I wanted. Nevertheless, thank you for your answers and suggestions; I will most certainly go through them.

def char_by_char(i):
    abbrev= ""
    for index, each_char in enumerate(i):
        # Define previous and next characters 
        if index == 0:
            previous_char = None
        else:
            previous_char = i[index - 1]

        if index == len(i) - 1:
            next_char = None
        else:
            next_char = i[index + 1]
        # Character is uppercase
        if each_char.isupper():
            if next_char is not None:
                if next_char.isupper():
                    if (previous_char == ".") or (previous_char is None):
                        abbrev = abbrev + each_char
                    else:
                        pass
                else:
                    abbrev = abbrev + each_char
            else:
                pass
        # Character is "."
        elif each_char == ".":
            if next_char is not None and next_char.isdigit():
                pass
            else:
                abbrev = abbrev + each_char

        # Character is a digit              
        elif each_char.isdigit():
            abbrev = abbrev + each_char

        # Character is lowercase            
        else:
            pass
    print(abbrev)


for i in data["mappings"]["cwmp_genieacs"]["properties"]:
    if "." in i:
        char_by_char(i)
    else:
        pass

Answer 1

You could use a regular expression for that. For instance, you could use capture groups for the characters that you want to keep, and perform a substitution where you only keep those captured characters:

import re

def shorten(s):
    return re.sub(r'([A-Z])(?:[A-Z]*(?=[A-Z])|[^A-Z.]*)|\.(\d+)[^A-Z.]*', r'\1\2', s)

Explanation:

([A-Z]) : capture a capital letter
(?: ) : a non-capturing group that makes clear what the scope of the | operation inside it is. Unlike the group above, it does not capture, so whatever it matches is deleted
[A-Z]* : zero or more capital letters (greedy)
(?=[A-Z]) : one more capital letter must follow, but don't process it; leave it for the next match
| : logical OR
[^A-Z.]* : zero or more characters that are neither capitals nor dots (following the captured capital letter): these are deleted
\.(\d+) : a literal dot followed by one or more digits: capture the digits (in order to throw away the dot)
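The same pattern can also be written with re.VERBOSE so that each piece carries its own comment. This is only a sketch of an equivalent layout; the names ABBREV_RE and shorten_verbose are made up for the example:

```python
import re

# The answer's pattern, spread over several lines. With re.VERBOSE, whitespace
# outside character classes and everything after '#' is ignored by the engine.
ABBREV_RE = re.compile(r"""
    ([A-Z])                 # group 1: a capital letter to keep
    (?:
        [A-Z]* (?=[A-Z])    # the rest of a run of capitals, minus its last one
      |
        [^A-Z.]*            # or the remaining non-capital, non-dot characters
    )
  |
    \. (\d+)                # a dot plus digits; group 2 keeps only the digits
    [^A-Z.]*                # anything after them up to the next capital or dot
""", re.VERBOSE)


def shorten_verbose(s):
    return ABBREV_RE.sub(r"\1\2", s)


print(shorten_verbose("InternetGatewayDevice.WANDevice.1.WANIPConnection.1"))
# IGD.WD1.WC1
```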

In the replacement argument, the captured groups are injected again:

\1 : first capture group (this is the capital letter)
\2 : second capture group (these are the digit(s) that followed a dot)

In one match, only one of the capture groups will have something, the other will just be the empty string. But the regular expression matching is repeated throughout the whole input string.
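To watch this happen match by match, one can list what each match consumed and what each group captured. A small sketch (the helper name trace is hypothetical); note that a match object reports the unused group as None, while re.sub substitutes it as an empty string on Python 3.5 and later:

```python
import re

PATTERN = r'([A-Z])(?:[A-Z]*(?=[A-Z])|[^A-Z.]*)|\.(\d+)[^A-Z.]*'


def trace(s):
    """Return (matched_text, group_1, group_2) for every match, left to right."""
    return [(m.group(0), m.group(1), m.group(2)) for m in re.finditer(PATTERN, s)]


for row in trace("WANDevice.1"):
    print(row)
# ('WAN', 'W', None)
# ('Device', 'D', None)
# ('.1', None, '1')
```

Each matched span is replaced by whichever group is filled, which is how "WANDevice.1" ends up as "WD1".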

Answer 2

Here is a non-regex solution.

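def shorten(i):
    abr_list = []
    parts = i.split('.')
    for word in parts:
        abrev = ''
        for x in range(len(word)):
            # Guard the look-ahead so a word ending in a capital does not
            # raise IndexError; the last character counts as "not followed".
            is_last = x + 1 == len(word)
            if (x == 0 and word[x].isupper()
                    or word[x].isupper() and (is_last or not word[x + 1].isupper())
                    or word[x].isnumeric()):
                abrev += word[x]
        abr_list.append(abrev)
    return join_parts(abr_list)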


def join_parts(part_list):
    ret = part_list[0]
    for part in part_list[1:]:
        if not part.isnumeric():
            ret += '.%s' % part
        else:
            ret += part
    return ret

Answer 3

import re
def foo(s):
    print(''.join(list(map(
        lambda matchobj: matchobj[0], re.finditer(
            r'(?<![A-Z])[A-Z]|[A-Z](?![A-Z])|\.', s)))))
foo('InternetGatewayDevice.DeviceInfo.Description')
foo('WANDevice')
# output: 
# IGD.DI.D
# WD

There are three major parts to the regex:

match if it's a capital letter with no capital letter in front of it, (?<![A-Z])[A-Z], or
match if it's a capital letter with no capital letter after it, [A-Z](?![A-Z]), or
if it's a literal period
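The pattern as posted keeps every dot and drops digits, so it covers the first example but not the numbered one from the question. One possible extension, offered as a sketch rather than as part of the original answer, drops a dot that sits directly before a digit and keeps runs of digits (TOKEN_RE and abbreviate are illustrative names):

```python
import re

# Hypothetical variant: same two capital-letter alternatives, plus
#   \.(?!\d)  a dot that is NOT followed by a digit (dots before digits are skipped)
#   \d+       a run of digits
TOKEN_RE = re.compile(r'(?<![A-Z])[A-Z]|[A-Z](?![A-Z])|\.(?!\d)|\d+')


def abbreviate(s):
    # The pattern has no capture groups, so findall returns the matched pieces.
    return ''.join(TOKEN_RE.findall(s))


print(abbreviate('InternetGatewayDevice.WANDevice.1.WANIPConnection.1.PortMapping.7.ExternalPort'))
# IGD.WD1.WC1.PM7.EP
```

This version returns the string instead of printing it. It also keeps any digit that appears inside a word, which may or may not be wanted for other inputs.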

https://docs.python.org/3.6/library/re.html

Answer 1
3 2018-03-01 13:21:02

Answer 2
2 accepted 2018-03-01 13:40:13

Answer 3
1 2018-03-01 13:34:15
