
Best way to split a string in Python with multiple separators, while keeping the separators

Suppose I have the string:

string = "this is a test string <LW> I want to <NL>split this string<NL> by each tag I have inserted.<AB>"

I want to split the string at each of the custom tags that an earlier function inserted into it:

tags = ["<LW>", "<NL>", "<AB>"]

This is the desired result:

splitString = splitByTags(string, tags)

for s in splitString:
    print(s)

which should print:

"this is a test string <LW>"
" I want to <NL>"
"split this string<NL>"
" by each tag I have inserted.<AB>"

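So basically I want to split the string on several substrings while keeping each substring at the end of its piece. What is the quickest and most efficient way to do this? I know I could use string.split and simply append the separator back onto each piece, but I'm not sure how to do that with several different separators.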
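For reference, the string.split idea mentioned in the question can be extended to several tags by splitting on one tag at a time and re-attaching it. This is a minimal sketch, not either answerer's method; split_keep and split_by_tags are hypothetical names (a snake_case take on the question's splitByTags). It assumes tags are non-empty and that no tag occurs inside another tag, because a later pass could otherwise cut through a tag that an earlier pass re-attached.

```python
def split_keep(text, sep):
    """Split text on sep, re-attaching sep to every piece that preceded it."""
    parts = text.split(sep)
    # Every part except the last was followed by sep in the original text.
    pieces = [part + sep for part in parts[:-1]]
    # The last part has no sep after it; keep it only if it is non-empty.
    if parts[-1]:
        pieces.append(parts[-1])
    return pieces


def split_by_tags(text, tags):
    """Hypothetical helper: split text after each tag, one tag at a time.

    Assumes no tag is a substring of another tag.
    """
    pieces = [text]
    for tag in tags:
        # Split every current piece on this tag and flatten the results.
        pieces = [sub for piece in pieces for sub in split_keep(piece, tag)]
    return pieces


text = "this is a test string <LW> I want to <NL>split this string<NL> by each tag I have inserted.<AB>"
tags = ["<LW>", "<NL>", "<AB>"]
for s in split_by_tags(text, tags):
    print(repr(s))
```

This rescans the pieces once per tag; a single regular expression that matches any tag, as in the answers, does the work in one pass.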

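Use re.split with a capturing group (parentheses around the pattern). When the separator pattern contains a capturing group, re.split also returns the matched separators, so text pieces and tags alternate in the result list.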

For example:

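import re

string = "this is a test string <LW> I want to <NL>split this string<NL> by each tag I have inserted.<AB>"
tags = ["<LW>", "<NL>", "<AB>"]

# The capturing group makes re.split keep each matched tag in the result list;
# re.escape protects tags that contain regex metacharacters.
splt_str = re.split("(" + "|".join(map(re.escape, tags)) + ")", string)

# Text sits at even indices and tags at odd ones: join each text with its tag.
for i in range(0, len(splt_str), 2):
    piece = "".join(splt_str[i:i+2])
    if piece:  # the last item is '' when the string ends with a tag
        print(piece)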

Output:

this is a test string <LW>
 I want to <NL>
split this string<NL>
 by each tag I have inserted.<AB>
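To make the step-by-2 loop easier to follow, here is a sketch that prints the raw list re.split returns and then pairs text and tags with zip instead of index arithmetic. The variable names are illustrative; the behaviour relied on is documented for re.split: with a capturing group the separators appear in the result, and a separator at the very end of the string produces a trailing empty string.

```python
import re

text = "this is a test string <LW> I want to <NL>split this string<NL> by each tag I have inserted.<AB>"
tags = ["<LW>", "<NL>", "<AB>"]

# The outer parentheses form a capturing group, so re.split keeps the tags.
pattern = "(" + "|".join(map(re.escape, tags)) + ")"
parts = re.split(pattern, text)
print(parts)
# ['this is a test string ', '<LW>', ' I want to ', '<NL>', 'split this string',
#  '<NL>', ' by each tag I have inserted.', '<AB>', '']

# Even indices hold text, odd indices hold tags: zip each text with its tag.
pieces = [body + tag for body, tag in zip(parts[0::2], parts[1::2])]
# zip stops at the shorter slice, so add back any text after the last tag.
if parts[-1]:
    pieces.append(parts[-1])
print(pieces)
```

The result list always has odd length (text, tag, text, ..., text), which is why the final element needs separate handling.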

Here is another way to do it, stepping through the matches with re.finditer:

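import re

def split_string(string, tags):
    """Split string after every occurrence of any tag, keeping the tags."""
    # One alternation pattern finds the tags in the order they appear in the
    # string; looping over tags one by one only works if they occur in list order.
    pattern = "|".join(map(re.escape, tags))
    string_list = []
    start = 0
    for item in re.finditer(pattern, string):
        end_tag = item.end()  # index just past the matched tag
        string_list.append(string[start:end_tag])
        start = end_tag
    # Keep any text after the last tag instead of silently dropping it.
    if start < len(string):
        string_list.append(string[start:])
    return string_list


data = split_string(string, tags)
print(data)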

Output:

['this is a test string <LW>', ' I want to <NL>', 'split this string<NL>', ' by each tag I have inserted.<AB>']
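A third option, not taken from either answer, is to split on the empty position right after each tag using lookbehind assertions, so nothing has to be re-joined. This is a minimal sketch assuming Python 3.7 or later (earlier versions of re.split do not split on zero-width matches); split_after_tags is a hypothetical name. Each tag gets its own lookbehind because Python's re requires a fixed-width lookbehind, and separate alternatives let the tags differ in length.

```python
import re


def split_after_tags(text, tags):
    """Hypothetical helper: split text at the position just after each tag."""
    # One zero-width lookbehind per tag; each matches the empty string that
    # immediately follows that tag, so the tag stays in the preceding piece.
    pattern = "|".join(f"(?<={re.escape(tag)})" for tag in tags)
    # A tag at the very end yields a trailing '' piece, so drop empty pieces.
    return [piece for piece in re.split(pattern, text) if piece]


text = "this is a test string <LW> I want to <NL>split this string<NL> by each tag I have inserted.<AB>"
tags = ["<LW>", "<NL>", "<AB>"]
for s in split_after_tags(text, tags):
    print(repr(s))
```

Filtering out empty strings also means two adjacent tags never produce an empty piece between them, which may or may not be what you want.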


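Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original URL. For any questions, contact: yoyou2525@163.com.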

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM