Extracting multiple substrings from a string

Question

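I have a complex string and would like to extract multiple substrings from it.

The string consists of a set of items separated by commas. Each item has an identifier (id-n) followed by a pair of words enclosed in parentheses. I only want the word inside the parentheses that has a number appended to its end (e.g. "This-1"). The number indicates the position the word should take when the extracted words are arranged.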

#Example of what the individual items look like
id1(attr1, is-2) #The number 2 here indicates word 'is' should be in position 2
id2(attr2, This-1) #The number 1 here indicates word 'This' should be in position 1
id3(attr3, an-3) #The number 3 here indicates word 'an' should be in position 3
id4(attr4, example-4) #The number 4 here indicates word 'example' should be in position 4
id5(attr5, example-4) #This is a duplicate of the word 'example'

#Example of string - this is what the string with the items looks like
string = "id1(attr1, is-1), id2(attr2, This-2), id3(attr3, an-3), id4(attr4, example-4), id5(atttr5, example-4)"

#This is how the result should look after extraction
result = 'This is an example'

Is there an easier way to do this? Regular expressions aren't working for me.

Answer 1

Why not use a regular expression? This works.

In [43]: import re

In [44]: s = "id1(attr1, is-2), id2(attr2, This-1), id3(attr3, an-3), id4(attr4, example-4), id5(atttr5, example-4)"

In [45]: z = [(m.group(2), m.group(1)) for m in re.finditer(r'(\w+)-(\d+)\)', s)]

In [46]: [x for y, x in sorted(set(z))]
Out[46]: ['This', 'is', 'an', 'example']
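
For use outside an interactive session, the same idea could be wrapped in a function. This is only a minimal sketch: the name extract_sentence is hypothetical, and it assumes every word of interest has the form word-N immediately before a closing parenthesis. Converting the position to int is an addition here, so that position 10 would sort after position 9 rather than after position 1.

```python
import re

def extract_sentence(s):
    # Collect unique (position, word) pairs; the set drops exact duplicates.
    # int() makes the sort numeric instead of lexicographic.
    pairs = {(int(m.group(2)), m.group(1))
             for m in re.finditer(r'(\w+)-(\d+)\)', s)}
    return ' '.join(word for _, word in sorted(pairs))

s = "id1(attr1, is-2), id2(attr2, This-1), id3(attr3, an-3), id4(attr4, example-4), id5(atttr5, example-4)"
print(extract_sentence(s))  # This is an example
```

If two different words share a position, both are kept and ordered alphabetically within that position.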

Answer 2

A simple/naive approach:

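>>> from collections import defaultdict
>>> s = "id1(attr1, is-1), id2(attr2, This-2), id3(attr3, an-3), id4(attr4, example-4), id5(atttr5, example-4)"  # the string from the question
>>> z = [x.split(',')[1].strip().strip(')') for x in s.split('),')]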
>>> d = defaultdict(list)
>>> for i in z:
...    b = i.split('-')
...    d[b[1]].append(b[0])
...
>>> ' '.join(' '.join(d[t]) for t in sorted(d.keys(), key=int))
'is This an example example'

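The sample string contains a duplicate position, which is why example is repeated in the output.

Note that your sample string doesn't match your stated requirement either, but this is the result according to your description: the words are arranged by their position indicators.

Now, if you want to eliminate the duplicates: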

>>> ' '.join(e for t in sorted(d.keys(), key=int) for e in set(d[t]))
'is This an example'
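
One caveat with set(): it does not keep the order in which words were appended, so if two different words share a position their relative order is arbitrary. A possible sketch that keeps first-seen order, assuming Python 3.7+ (where dict preserves insertion order); the helper name join_unique is hypothetical:

```python
from collections import defaultdict

def join_unique(s):
    # Take the second field of each item, e.g. "is-1"
    fields = [x.split(',')[1].strip().strip(')') for x in s.split('),')]
    d = defaultdict(list)
    for field in fields:
        word, pos = field.split('-')
        d[int(pos)].append(word)  # numeric keys sort correctly past 9
    # dict.fromkeys drops duplicates while preserving first-seen order
    return ' '.join(w for pos in sorted(d) for w in dict.fromkeys(d[pos]))

s = "id1(attr1, is-1), id2(attr2, This-2), id3(attr3, an-3), id4(attr4, example-4), id5(atttr5, example-4)"
print(join_unique(s))  # is This an example
```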

Answer 3

Well, how about this:

sample = ("id1(attr1, is-2), id2(attr2, This-1), "
          "id3(attr3, an-3), id4(attr4, example-4), id5(atttr5, example-4)")


def make_cryssie_happy(s):
    words = {} # we will use this dict later
    ll = s.split(',')[1::2]
    # we only want items like This-1, an-3, etc.

    for item in ll:
        tt = item.replace(')','').lstrip()
        (word, pos) = tt.split('-')
        words[int(pos)] = word
        # there can only be one word at a particular position;
        # using a dict keyed by the numeric position
        # is an alternative to using sets

    res = [words[i] for i in sorted(words)]
    # sort the keys, dicts are unsorted!
    # create a list of the values of the dict in sorted order

    return ' '.join(res)
    # return a nice string


print(make_cryssie_happy(sample))

3 answers

Answer 1
2 2013-06-12 04:29:52

Answer 2
2 accepted 2013-06-12 04:35:06

Answer 3
1 2013-06-12 07:01:29
