How to get the first capital letter and then each one that isn't followed by another capital letter in Python?
I am developing a script that creates abbreviations for a list of names that are too long for me to use. I need to split each name into parts at the dots and then take each capital letter that is at the beginning of a word. Just like this:
InternetGatewayDevice.DeviceInfo.Description -> IGD.DI.D
However, if there are several consecutive capital letters (as in the following example), I only want to take the first one and then the one that is not followed by a capital letter. So, from "WANDevice" I want to get "WD". Like this:
InternetGatewayDevice.WANDevice.1.WANConnectionDevice.1.WANIPConnection.1.PortMapping.7.ExternalPort -> IGD.WD1.WCD1.WC1.PM7.EP
So far I have written this script:
    import json

    data = json.load(open('./cwmp/tr069/test.json'))

    def shorten(i):
        x = i.split(".")
        abbreviations = []
        for each in x:
            abbrev = ''
            # Keeps every capital letter, so "WANDevice" becomes "WAND"
            for each_letter in each:
                if each_letter.isupper():
                    abbrev = abbrev + each_letter
            abbreviations.append(abbrev)
        short_string = ".".join(abbreviations)
        return short_string

    for i in data["mappings"]["cwmp_genieacs"]["properties"]:
        if "." in i:
            shorten(i)
        else:
            pass
It correctly "translates" the first example, but I am not sure how to do the rest. I think if I had to, I could probably come up with some way to do it (maybe by splitting the strings into single characters), but I am looking for an efficient and smart way to do it. I will be grateful for any advice.
I am using Python 3.6.
EDIT:
I decided to try a different approach and iterate over single characters, and I achieved what I wanted pretty easily. Nevertheless, thank you for your answers and suggestions; I will most certainly go through them.
    def char_by_char(i):
        abbrev = ""
        for index, each_char in enumerate(i):
            # Define previous and next characters
            if index == 0:
                previous_char = None
            else:
                previous_char = i[index - 1]
            if index == len(i) - 1:
                next_char = None
            else:
                next_char = i[index + 1]
            # Character is uppercase
            if each_char.isupper():
                if next_char is not None:
                    if next_char.isupper():
                        # Compare strings with ==, not "is"
                        if (previous_char == ".") or (previous_char is None):
                            abbrev = abbrev + each_char
                        else:
                            pass
                    else:
                        abbrev = abbrev + each_char
                else:
                    pass
            # Character is "."
            elif each_char == ".":
                # Guard against a trailing dot, where next_char is None
                if next_char is not None and next_char.isdigit():
                    pass
                else:
                    abbrev = abbrev + each_char
            # Character is a digit
            elif each_char.isdigit():
                abbrev = abbrev + each_char
            # Character is lowercase
            else:
                pass
        print(abbrev)

    for i in data["mappings"]["cwmp_genieacs"]["properties"]:
        if "." in i:
            char_by_char(i)
        else:
            pass
You could use a regular expression for that. For instance, you could use capture groups for the characters that you want to keep, and perform a substitution in which you keep only those captured characters:
    import re

    def shorten(s):
        return re.sub(r'([A-Z])(?:[A-Z]*(?=[A-Z])|[^A-Z.]*)|\.(\d+)[^A-Z.]*', r'\1\2', s)
Explanation:
([A-Z]): capture a capital letter.
(?: ): a non-capturing group that makes clear what the scope of the | operation inside it is.
[A-Z]*: zero or more capital letters (greedy).
(?=[A-Z]): one more capital letter should follow, but don't process it; leave it for the next match.
|: logical OR.
[^A-Z.]*: zero or more characters that are neither capitals nor dots (following the captured capital letter); these will be deleted.
\.(\d+): a literal dot followed by one or more digits; capture the digits (in order to throw away the dot).
In the replacement argument, the captured groups are injected again:
\1: first capture group (this is the capital letter).
\2: second capture group (these are the digits that followed a dot).
In one match, only one of the capture groups will have something; the other will just be the empty string. The regular expression matching is repeated throughout the whole input string.
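To make the pieces of the pattern easier to follow, here is a minimal sketch of the same expression written with re.VERBOSE, so that each part carries its own comment. The names PATTERN and shorten_verbose are made up for this illustration and are not part of the answer above. It assumes Python 3.5 or later, where a group that did not take part in a match is substituted as an empty string.

```python
import re

# Same expression as the one-liner, spread out with re.VERBOSE.
# PATTERN and shorten_verbose are hypothetical names used only here.
PATTERN = re.compile(r'''
    ([A-Z])               # group 1: a capital letter to keep
    (?:
        [A-Z]* (?=[A-Z])  # further capitals, stopping before the last one of the run
      |
        [^A-Z.]*          # or the non-capital, non-dot characters that follow
    )
  |
    \. (\d+)              # a dot followed by digits: keep the digits (group 2), drop the dot
    [^A-Z.]*              # and discard anything after them up to the next capital or dot
''', re.VERBOSE)


def shorten_verbose(s):
    # Put back only what the two capture groups matched.
    return PATTERN.sub(r'\1\2', s)


print(shorten_verbose('InternetGatewayDevice.DeviceInfo.Description'))  # IGD.DI.D
print(shorten_verbose('WANDevice'))  # WD
```

Dots that are not followed by digits are never matched by either alternative, so they pass through the substitution unchanged.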
Here is a non-regex solution.

    def shorten(i):
        abr_list = []
        abrev = ''
        parts = i.split('.')
        for word in parts:
            for x in range(len(word)):
                # Bounds check avoids an IndexError on a trailing capital, which is
                # treated as "not followed by a capital" (an assumption)
                last = x + 1 == len(word)
                if (x == 0 and word[x].isupper()
                        or word[x].isupper() and (last or not word[x + 1].isupper())
                        or word[x].isnumeric()):
                    abrev += word[x]
            abr_list.append(abrev)
            abrev = ''
        return join_parts(abr_list)

    def join_parts(part_list):
        # Numeric parts are appended without a dot: "WD" + "1" -> "WD1"
        ret = part_list[0]
        for part in part_list[1:]:
            if not part.isnumeric():
                ret += '.%s' % part
            else:
                ret += part
        return ret
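As a way to check a non-regex approach against the examples from the question, here is a small self-contained sketch of the same idea. The names abbreviate_part and shorten_no_regex are hypothetical, and the sketch assumes that a capital letter at the very end of a part counts as "not followed by a capital letter" and is therefore kept.

```python
def abbreviate_part(word):
    """Keep digits, a leading capital, and capitals not followed by a capital."""
    kept = []
    for idx, ch in enumerate(word):
        # An empty string stands in for "no next character"; ''.isupper() is False.
        nxt = word[idx + 1] if idx + 1 < len(word) else ''
        if ch.isdigit() or (ch.isupper() and (idx == 0 or not nxt.isupper())):
            kept.append(ch)
    return ''.join(kept)


def shorten_no_regex(name):
    out = ''
    for part in name.split('.'):
        # Purely numeric parts are glued to the previous abbreviation without a dot.
        if out and not part.isdigit():
            out += '.'
        out += abbreviate_part(part)
    return out


print(shorten_no_regex('InternetGatewayDevice.DeviceInfo.Description'))  # IGD.DI.D
print(shorten_no_regex('WANDevice'))  # WD
```

Looking one character ahead with a fallback value keeps the condition free of index arithmetic that could run past the end of the word.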
    import re

    def foo(s):
        print(''.join(list(map(
            lambda matchobj: matchobj[0], re.finditer(
                r'(?<![A-Z])[A-Z]|[A-Z](?![A-Z])|\.', s)))))

    foo('InternetGatewayDevice.DeviceInfo.Description')
    foo('WANDevice')
    # output:
    # IGD.DI.D
    # WD
There are three major parts to the regex:
(?<![A-Z])[A-Z]: match a capital letter that has no capital letter before it, or
[A-Z](?![A-Z]): match a capital letter that has no capital letter after it, or
\.: match a literal dot.
See https://docs.python.org/3.6/library/re.html
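The pattern above keeps every dot and drops digits, so it does not yet cover the second example from the question. One possible extension, offered only as a sketch, adds two alternatives: a dot is kept only when no digit follows it, and runs of digits are kept as they are. The function name abbreviate is hypothetical, and the result is returned rather than printed.

```python
import re


def abbreviate(s):
    # Hypothetical variant of foo(): the two lookaround alternatives are unchanged;
    # r'\.(?!\d)' keeps a dot only if no digit follows, and r'\d+' keeps digit runs.
    tokens = re.findall(r'(?<![A-Z])[A-Z]|[A-Z](?![A-Z])|\.(?!\d)|\d+', s)
    return ''.join(tokens)


print(abbreviate('InternetGatewayDevice.DeviceInfo.Description'))  # IGD.DI.D
print(abbreviate('WANDevice'))  # WD
```

Because the pattern has no capture groups, re.findall returns the full text of each match, which makes the map over match objects unnecessary.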