
Concatenate and simplify a list containing number and letter pairs

I have a list of strings representing numbers. I can't use int because some of the numbers have letters attached, like '33a' or '33b'.

['21', '23a', '23b', '23k', '23l', '23x', '25', '33a', '33b', '33c', '33d', '33e', '33f', '34', '34', '35a', '35a' ]

My goal is to concatenate the numbers into one string, separated by forward slashes.

If a number is repeated and its attached letters continue in alphabetical order, the representation should be simplified as follows:

['23a'/'23b'] --> '23a-b'

If a number is repeated without attached letters, it should be listed only once. The same applies to repeated identical pairs of number and letter.

For the complete example, the desired output looks like this:

'21/23a-b/23k-l/23x/25/33a-f/34/35a'

Using the following code I am able to concatenate the numbers and exclude duplicates, but I have not managed to simplify the numbers with letters as in the example above.

numbers = ['21', '23a', '23b', '23k', '23l', '23x', '25', '33a', '33b', '33c', '33d', '33e', '33f', '34', '34', '35a', '35a' ]

concat_numbers = ""
numbers_set = list(set(numbers))

numbers_set.sort()
for number in numbers_set: 
    concat_numbers += number + "/"
    
print(concat_numbers)
>>> '21/23a/23b/23k/23l/23x/25/33a/33b/33c/33d/33e/33f/34/35a/'

Any hints on how to achieve this in the most pythonic way?

This can be done by collecting the letters of each number in a defaultdict(list) and then rebuilding your output from it, like so:

data = ['21', '23a', '23b', '23k', '23l', '23x', '25', '33a', '33b', 
        '33c', '33d', '33e', '33f', '34', '34', '35a', '35a']
data.sort() # easier if letters are sorted - so sort it

from collections import defaultdict
from itertools import takewhile
d = defaultdict(list)
for n in data:
    # split in number/letters
    number = ''.join(takewhile(str.isdigit, n))
    letter = n[len(number):]
    # add to dict
    d[number].append(letter)

print(d)
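
The split step can also be tried on its own. The following is a minimal sketch using a regular expression instead of takewhile; `split_item` is a hypothetical helper name, and the pattern assumes every item is a run of digits followed by zero or more ASCII letters. Anything else is rejected rather than silently split.

```python
import re

def split_item(item):
    """Split e.g. '23a' into ('23', 'a'); a bare number yields an empty suffix."""
    # assumption: digits first, then optional ASCII letters, nothing else
    match = re.fullmatch(r"(\d+)([A-Za-z]*)", item)
    if match is None:
        raise ValueError(f"unexpected item: {item!r}")
    return match.group(1), match.group(2)

print(split_item("23a"))  # ('23', 'a')
print(split_item("21"))   # ('21', '')
```

Raising on unexpected input is a design choice for this sketch; the takewhile version accepts any string and treats everything after the leading digits as the suffix.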

We now have a dict with the "numbers" as keys and a list of all their letters as values, which needs to be cleaned up further:

# concat letters that follow each other
def unify(l):
    u = [""]
    # remember start/end values
    first = l[0]
    last = l[0]
    # iterate the list of letters given
    for letter in l:
        # for same letters or a->b letters, move last forward
        if last == letter or ord(last) == ord(letter)-1:
            last = letter
        else:
            # letter range stopped: add a single letter as-is, otherwise as a range
            u.append(first if first == last else f"{first}-{last}")
            # start over with new values
            first = letter
            last = letter
    # add the final letter or range if it is not part of the result already
    if not u[-1].startswith(first):
        # single letter as-is, otherwise as a range
        u.append(first if last == first else f"{first}-{last}")

    # ignore empty results in u
    return ",".join(w for w in u if w)

# unify letters
for key,value in d.items():
    d[key] = unify(value)

print(d)
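
The run-collapsing step can also be expressed with itertools.groupby. This is an alternative sketch, not the function above: `collapse_runs` is a hypothetical name, it assumes every suffix is either empty or a single character, and it returns a list of suffixes instead of a comma-joined string. An empty suffix is kept as its own entry, so a bare number that appears next to lettered variants is not lost.

```python
from itertools import groupby

def collapse_runs(letters):
    """Collapse single-letter suffixes into runs such as 'a-b'.

    Assumption: each entry is '' (bare number) or exactly one character.
    """
    unique = sorted(set(letters))
    parts = [""] if "" in unique else []
    singles = [c for c in unique if c]
    # consecutive letters share the same value of ord(letter) - index
    for _, group in groupby(enumerate(singles), key=lambda p: ord(p[1]) - p[0]):
        run = [c for _, c in group]
        parts.append(run[0] if len(run) == 1 else f"{run[0]}-{run[-1]}")
    return parts

print(collapse_runs(["a", "b", "k", "l", "x"]))  # ['a-b', 'k-l', 'x']
print(collapse_runs(["", "a", "c"]))             # ['', 'a', 'c']
```

Each returned suffix can then be prefixed with its number and the pieces joined with "/".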

and then construct the final output:

r = "/".join(f"{ky}{v}" for ky,vl in d.items() for v in vl.split(","))
print(r)

Output:

# collected split key/values
defaultdict(<class 'list'>, 
{'21': [''], '23': ['a', 'b', 'k', 'l', 'x'], 
 '25': [''], '33': ['a', 'b', 'c', 'd', 'e', 'f'], 
 '34': ['', ''], '35': ['a', 'a']})

# unified values
defaultdict(<class 'list'>, 
{'21': '', '23': 'a-b,k-l,x', '25': '', 
 '33': 'a-f', '34': '', '35': 'a'})

# as string
21/23a-b/23k-l/23x/25/33a-f/34/35a       
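
One thing to keep in mind: the order of the result follows data.sort(), which compares the items as strings, so '100' would sort before '21'. If numeric order matters, a sort key can be used. The sketch below is an assumption about what is wanted, not part of the code above; `number_key` is a hypothetical name, and it assumes every item starts with at least one digit.

```python
from itertools import takewhile

def number_key(item):
    """Sort key: numeric value of the leading digits, then the letter suffix."""
    digits = "".join(takewhile(str.isdigit, item))
    # assumption: digits is non-empty, otherwise int() raises ValueError
    return int(digits), item[len(digits):]

data = ["100", "21", "23b", "23a", "9"]
print(sorted(data))                  # ['100', '21', '23a', '23b', '9']
print(sorted(data, key=number_key))  # ['9', '21', '23a', '23b', '100']
```

With this key, data.sort(key=number_key) could replace the plain data.sort() call.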
