
Python, RegEx, Replace a certain part of a match

I am trying to replace a certain part of a match that a regex found. The relevant strings have the following format:

"<Random text>[Text1;Text2;....;TextN]<Random text>"

So basically there can be N texts separated by a ";" inside the brackets. My goal is to change the ";" into a "," (but only for the strings which are in this format) so that I can keep the ";" as a separator for a CSV file. So the result should be:

"<Random text>[Text1,Text2,...,TextN]<Random text>"

I can match the relevant strings with something like

re.compile(r'\[".*?((;).*?){1,4}"\]')

but if I try to use the sub method it replaces the whole string.

I have searched Stack Overflow and I am pretty sure that "capture groups" might be the solution, but I am not really getting there. Can anyone help me?

I ONLY want to change the ";" in the ["Text1;...;TextN"]-parts of my text file.

Try this regex:

;(?=(?:(?!\[).)*])

Replace each match with a ,


Explanation:

  • ; - matches a ;
  • (?=(?:(?!\[).)*]) - makes sure that the above ; is followed by a closing ] somewhere later in the string, with no opening bracket [ in between
    • (?=....) - positive lookahead
    • (?:(?!\[).)* - 0+ occurrences of any character that is not [
    • ] - matches a ]
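
In Python, the pattern above can be passed to re.sub directly. The following is a minimal sketch; the sample row in `line` is made up for illustration and is not taken from the question's data.

```python
import re

# Pattern from the answer above: a ";" that is followed by a "]"
# with no "[" between the two.
pattern = re.compile(r';(?=(?:(?!\[).)*])')

# Hypothetical CSV-like row, used only for illustration.
line = 'id;name;["Text1;Text2;Text3"];comment'

# Only the semicolons inside the brackets are replaced.
result = pattern.sub(',', line)
print(result)
```

Because the lookahead consumes nothing, each match is a single ";" and the rest of the string is left untouched.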

If you want to match a ; that is followed by a closing ] with no [ in between, you could use:

;(?=[^[]*])
  • ; Match literally
  • (?= Positive lookahead, assert what is on the right is
    • [^[]* Negated character class, match 0+ times any char except [
  • ] Match literally
  • ) Close lookahead


Note that this will also match if there is no leading [
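
A short sketch of both behaviours with the standard re module follows. The two input strings are invented for illustration; the second one shows the caveat, since it has a closing ] but no opening [.

```python
import re

# Pattern from the answer above: ";" followed by "]" with no "[" in between.
pattern = re.compile(r';(?=[^[]*])')

# Hypothetical input with a proper [...] group: only the inner ";" changes.
with_brackets = pattern.sub(',', 'x;y["Text1;Text2"]z;w')
print(with_brackets)

# Hypothetical input with a stray "]" and no "[": the first ";" is still
# replaced, because the lookahead only checks what comes after the ";".
without_opening = pattern.sub(',', 'a;b]c;d')
print(without_opening)
```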


If you also want to make sure that there is a leading [, you could use the regex module from PyPI, which supports \G and \K, to match a single ;

(?:\[(?=[^[\]]*])|\G(?!^))[^;[\]]*\K;


import regex

pattern = r"(?:\[(?=[^[\]]*])|\G(?!^))[^;[\]]*\K;"
test_str = ("[\"Text1;Text2;....;TextN\"];asjkdjksd;ajksdjksad[\"Text1;Text2;....;TextN\"]\n\n"
    ".[\"Text1;Text2\"]...long text...[\"Text1;Text2;Text3\"]....long text...[\"Text1;...;TextN\"]...long text...\n\n"
    "I ONLY want to change the \";\" in the [\"Text1;...;TextN\"]")

result = regex.sub(pattern, ",", test_str)
print(result)

Output

["Text1,Text2,....,TextN"];asjkdjksd;ajksdjksad["Text1,Text2,....,TextN"]

.["Text1,Text2"]...long text...["Text1,Text2,Text3"]....long text...["Text1,...,TextN"]...long text...

I ONLY want to change the ";" in the ["Text1,...,TextN"]
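
If installing the third-party regex module is not an option, a similar result can probably be reached with the standard re module by matching each whole [...] group and rewriting it in a replacement function. This is a different technique from the \G and \K pattern above, not a translation of it; the function name `commas_inside_brackets` and the sample string are made up for this sketch, and it assumes the brackets are not nested.

```python
import re

# Matches "[", then any run of characters that are not brackets, then "]".
BRACKET_GROUP = re.compile(r'\[[^\[\]]*\]')

def commas_inside_brackets(text):
    """Replace ';' with ',' only inside non-nested [...] groups."""
    # The callable receives each match object and returns its replacement.
    return BRACKET_GROUP.sub(lambda m: m.group(0).replace(';', ','), text)

# Hypothetical sample: semicolons outside the brackets, and next to a stray
# "]" that has no opening "[", are left alone.
sample = 'a;b["Text1;Text2;Text3"];c]d;e'
converted = commas_inside_brackets(sample)
print(converted)
```

Since the pattern requires both the opening [ and the closing ], a lone ] elsewhere in the line does not trigger any replacement.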

You can try this code sample:

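x = 'anbhb["Text1;Text2;...;TextN"]nbgbyhuyg["Text1;Text2;...;TextN"][]nhj,kji,'
chars = list(x)
inside = False  # True while the scan is between an opening [" and a closing "]
for i, ch in enumerate(chars):
    if x.startswith('["', i):
        inside = True
    elif x.startswith('"]', i):
        inside = False
    elif inside and ch == ';':
        chars[i] = ','  # replace only separators found inside ["..."]
x = ''.join(chars)

print(x)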
