Python: regex condition to find lower case/digit before capital letter

I would like to split a string in Python and turn it into a dictionary in which each key is a chunk of characters that starts at a capital letter and runs up to the next capital letter, and each value is the number of occurrences of that chunk in the string.

As an example: string = 'ABbACc1Dd2E' should return this: {'A': 2, 'Bb': 1, 'Cc1': 1, 'Dd2': 1, 'E': 1}

I have found two working solutions so far (see below), but I am looking for a more general/elegant solution to this, possibly a one-line regex condition.

Thank you

Solution 1

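import re

string = 'ABbACc1Dd2E'
# Put a space between every character: 'A B b A C c 1 D d 2 E'
string = ' '.join(string)

# Remove the space again wherever a chunk continues:
# capital + lowercase, capital + digit, or lowercase + digit.
for ii in re.findall("([A-Z] [a-z])", string) + \
          re.findall("([A-Z] [0-9])", string) + \
          re.findall("([a-z] [0-9])", string):
    new_ii = ii.replace(' ', '')
    string = string.replace(ii, new_ii)

# The pieces still separated by spaces are the chunks; count them.
string = string.split()
all_dict = {}
for elem in string:
    all_dict[elem] = all_dict.get(elem, 0) + 1

print(all_dict)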

{'A': 2, 'Bb': 1, 'Cc1': 1, 'Dd2': 1, 'E': 1}

Solution 2

string = 'ABbACc1Dd2E'

all_dict = {}
new_elem = ''
for pos, char in enumerate(string):
    # A capital letter starts a new chunk; any other character extends the current one.
    if char.isupper():
        new_elem = char
    else:
        new_elem += char

    # The chunk is complete at the end of the string or right before the next capital.
    if pos == len(string) - 1 or string[pos + 1].isupper():
        all_dict[new_elem] = all_dict.get(new_elem, 0) + 1

print(all_dict)

{'A': 2, 'Bb': 1, 'Cc1': 1, 'Dd2': 1, 'E': 1}

Thanks to usr2564301 for this suggestion:

The right regex is '[A-Z][a-z]*\d*'

import re

string = 'ABbACc1Dd2E'
print(re.findall(r'[A-Z][a-z]*\d*', string))
['A', 'Bb', 'A', 'Cc1', 'Dd2', 'E']
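
Reading the pattern piece by piece: [A-Z] matches the capital that starts a chunk, [a-z]* matches any lowercase letters after it, and \d* matches any trailing digits. It therefore stops at the first lowercase letter that follows a digit. A minimal sketch, assuming chunks may mix lowercase letters and digits in any order (an input such as 'Aa1bB' is hypothetical and not part of the question), would use [A-Z][^A-Z]* instead; the helper name split_chunks is made up for illustration:

```python
import re

def split_chunks(text):
    """Return each capital letter together with everything up to the next capital."""
    # [A-Z] starts a chunk; [^A-Z]* consumes any run of non-capital characters.
    return re.findall(r'[A-Z][^A-Z]*', text)

print(split_chunks('ABbACc1Dd2E'))  # ['A', 'Bb', 'A', 'Cc1', 'Dd2', 'E']
print(split_chunks('Aa1bB'))        # ['Aa1b', 'B']

# For comparison, the stricter pattern drops the 'b' that follows the digit.
print(re.findall(r'[A-Z][a-z]*\d*', 'Aa1bB'))  # ['Aa1', 'B']
```

Which of the two patterns is appropriate depends on whether a chunk such as 'Aa1b' can actually occur in the data.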

One can then use itertools.groupby, which returns an iterator over consecutive keys and their groups from the iterable.

from itertools import groupby

all_dict = {}
for i,j in groupby(re.findall(r'[A-Z][a-z]*\d*', string)):
    # groupby only merges adjacent equal items, so add up the length of each run
    all_dict[i] = all_dict.get(i, 0) + len(list(j))
print(all_dict)
{'A': 2, 'Bb': 1, 'Cc1': 1, 'Dd2': 1, 'E': 1}
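
Note that groupby only merges adjacent equal items, so on unsorted input the same key can show up in several separate runs, and the run lengths have to be added up to get true occurrence counts. A small sketch of that behaviour, using the hypothetical input 'AABbA' (not from the question):

```python
import re
from itertools import groupby

# Hypothetical input with two adjacent 'A' chunks and a later, separate 'A'.
matches = re.findall(r'[A-Z][a-z]*\d*', 'AABbA')
print(matches)  # ['A', 'A', 'Bb', 'A']

# Each (key, run length) pair describes one run of adjacent equal chunks.
runs = [(key, len(list(group))) for key, group in groupby(matches)]
print(runs)  # [('A', 2), ('Bb', 1), ('A', 1)]

# Summing the run lengths per key gives the number of occurrences.
counts = {}
for key, length in runs:
    counts[key] = counts.get(key, 0) + length
print(counts)  # {'A': 3, 'Bb': 1}
```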

Ultimately, one could use sorted() so that equal chunks become adjacent, which gives the correct counts in one line:

print({i:len(list(j)) for i,j in groupby(sorted(re.findall(r'[A-Z][a-z]*\d*', string))) })
{'A': 2, 'Bb': 1, 'Cc1': 1, 'Dd2': 1, 'E': 1}
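
As an alternative that is not part of the original answer, the standard library's collections.Counter can do the counting directly, without sorting or grouping. A minimal sketch, reusing the same pattern and example string:

```python
import re
from collections import Counter

string = 'ABbACc1Dd2E'

# Counter tallies every chunk returned by findall; dict() gives a plain dictionary.
all_dict = dict(Counter(re.findall(r'[A-Z][a-z]*\d*', string)))
print(all_dict)  # {'A': 2, 'Bb': 1, 'Cc1': 1, 'Dd2': 1, 'E': 1}
```

Unlike the sorted() variant, this keeps the keys in order of first appearance rather than in sorted order.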
