
Split a string on multiple separators and return the separators and separated strings

I want to split strings on separators that consist of more than one character. The separators are stored in the variable sep_list.

My goal is to get the last separated string s1 together with the last separator, i.e. the one that has s1 on its right-hand side.

sep_list = ['→E', '¬E', '↓I']

string1 = "peter →E tom ¬E luis ↓I ed"
string2 = "sigrid →E jose l. ¬E jose t."

Applied to string1, the algorithm should return the string:

"↓I, ed"

and applied to string2, it should return the string:

"¬E, jose t."

How can I do that in Python?

Assuming the separators may exist in any order (or not at all), you could do this:

sep_list = ['→E', '¬E', '↓I']

string1 = "peter →E tom ¬E luis ↓I ed"
string2 = "sigrid →E jose l. ¬E jose t."

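def process(s):
    indexes = []
    for sep in sep_list:
        # rfind returns the position of the LAST occurrence (find would return the first)
        if (index := s.rfind(sep)) >= 0:  # := requires Python 3.8+
            indexes.append((index, sep))
    if indexes:
        # The separator with the largest position is the last one in the string
        index, sep = max(indexes)
        return f"{sep},{s[index + len(sep):]}"
    return None  # no separator found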

print(process(string1))
print(process(string2))

Output:

↓I, ed
¬E, jose t.
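The same "find the last position of each separator" idea can also be sketched with str.rpartition, which splits a string at the last occurrence of a separator and returns an empty middle element when the separator is absent. The function name last_separator_split is hypothetical. This is a minimal sketch that assumes no separator is a prefix of another. It also handles a separator that appears more than once:

```python
sep_list = ['→E', '¬E', '↓I']

def last_separator_split(s, seps):
    """Return 'separator,rest' for the last separator in s, or None (hypothetical helper)."""
    best = None  # (length of text before the separator, separator, text after it)
    for sep in seps:
        head, found, tail = s.rpartition(sep)  # splits at the LAST occurrence of sep
        # found is '' when sep does not occur; a longer head means a later position
        if found and (best is None or len(head) > best[0]):
            best = (len(head), sep, tail)
    if best is None:
        return None  # no separator in the string
    return f"{best[1]},{best[2]}"

print(last_separator_split("peter →E tom ¬E luis ↓I ed", sep_list))  # ↓I, ed
print(last_separator_split("a →E b ¬E c →E d", sep_list))            # →E, d
print(last_separator_split("no separators here", sep_list))          # None
```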

Another way, using a regular expression:

import re

sep_list = ['→E', '¬E', '↓I']

string1 = "peter →E tom ¬E luis ↓I ed"
string2 = "sigrid →E jose l. ¬E jose t."

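def separate_string(data, seps):
    # Alternation of all separators; re.escape makes special characters match literally
    pattern = "|".join(re.escape(sep) for sep in seps)
    matches = list(re.finditer(pattern, data))
    if not matches:
        return None  # no separator found (indexing [-1] would raise IndexError)
    start, end = matches[-1].span()  # span of the last separator

    return f"{data[start:end]},{data[end:]}"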

print(separate_string(string1, sep_list))  # ↓I, ed
print(separate_string(string2, sep_list))  # ¬E, jose t.

  • We build a regex pattern by joining the escaped separators with |, so the pattern matches any one of them.
  • For each match in the string, we use m.span() to retrieve the start and end of the match. We only keep the last match.
  • data[start:end] is the separator, while data[end:] is everything after.
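If you need all the pieces and not only the last one, re.split can do the work. When the pattern is wrapped in a capturing group, re.split keeps the matched separators in its result list. The sketch below uses a hypothetical helper named split_keep_separators. Because there is a single capturing group, separators land at the odd indices, so the last two items are the last separator and the text after it:

```python
import re

sep_list = ['→E', '¬E', '↓I']

def split_keep_separators(data, seps):
    """Split data on any separator, keeping the separators (hypothetical helper)."""
    # The capturing group (...) makes re.split include each matched separator
    pattern = "(" + "|".join(re.escape(sep) for sep in seps) + ")"
    return re.split(pattern, data)

parts = split_keep_separators("peter →E tom ¬E luis ↓I ed", sep_list)
print(parts)  # ['peter ', '→E', ' tom ', '¬E', ' luis ', '↓I', ' ed']
if len(parts) >= 3:  # a list of length 1 means no separator was found
    print(f"{parts[-2]},{parts[-1]}")  # ↓I, ed
```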

Update: this solution does not need the re module. Update #2: shorter solution.

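def run(string):
    sep_lst = ['→E', '¬E', '↓I']
    # Assumes every separator is surrounded by whitespace, so it becomes its own token
    tokens = string.split()
    result = None
    for i, token in enumerate(tokens):
        if token in sep_lst:
            # Overwritten on every hit, so the last separator wins
            result = f'{token}, {" ".join(tokens[i+1:])}'
    return result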

print(run("peter →E tom ¬E luis ↓I ed"))
print(run("sigrid →E jose l. ¬E jose t."))

Output:

↓I, ed
¬E, jose t.
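The token-based approach can also be extended, still without the re module, to return every separator paired with the text that follows it. The sketch below rests on the same assumption as run(): each separator must be its own whitespace-delimited token, so something like "luis↓I ed" would not be split. The names SEPARATORS and token_pairs are hypothetical:

```python
SEPARATORS = {'→E', '¬E', '↓I'}  # hypothetical constant; same separators as sep_lst

def token_pairs(string):
    """Return (text before the first separator, [(separator, following text), ...])."""
    head, pairs = [], []
    for token in string.split():  # assumes separators are whitespace-delimited
        if token in SEPARATORS:
            pairs.append((token, []))   # start a new segment for this separator
        elif pairs:
            pairs[-1][1].append(token)  # extend the current segment
        else:
            head.append(token)          # text before the first separator
    return " ".join(head), [(sep, " ".join(words)) for sep, words in pairs]

head, pairs = token_pairs("sigrid →E jose l. ¬E jose t.")
print(head)                  # sigrid
print(pairs)                 # [('→E', 'jose l.'), ('¬E', 'jose t.')]
print(", ".join(pairs[-1]))  # ¬E, jose t.
```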
