
Split string if separator is not in-between two characters

I want to write a script that reads from a CSV file and splits each line on commas, except for any commas between two specific characters.

In the code snippet below, I would like to split line on commas, except for the commas between two $ signs.

line = "$abc,def$,$ghi$,$jkl,mno$"

output = line.split(',')

for o in output:
   print(o)
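For reference, this is what the plain str.split(',') call produces on the sample line: every comma counts as a separator, so the $-wrapped fields are torn apart.

```python
line = "$abc,def$,$ghi$,$jkl,mno$"

# str.split(',') splits on every comma, including the ones inside $...$
naive = line.split(',')
print(naive)  # ['$abc', 'def$', '$ghi$', '$jkl', 'mno$']
```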

How do I write output = line.split(',') so that I get the following terminal output?

~$ python script.py
$abc,def$
$ghi$
$jkl,mno$

One solution (maybe not the most elegant, but it works) is to replace the string $,$ with something like $,,$ and then split on ,, instead. Something like this:

output = line.replace('$,$','$,,$').split(',,')
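A minimal sketch of the same replace-then-split idea wrapped in a function (the name split_dollar_fields is hypothetical). It also shows the assumption the trick relies on: if a field already contains ",,", the split happens there too.

```python
def split_dollar_fields(line):
    # Mark the real separators ("$,$") with a double comma, then split on it.
    # Assumption: ",," never occurs inside a field.
    return line.replace('$,$', '$,,$').split(',,')


print(split_dollar_fields("$abc,def$,$ghi$,$jkl,mno$"))
# ['$abc,def$', '$ghi$', '$jkl,mno$']

# A field that already contains ",," is split incorrectly:
print(split_dollar_fields("$a,,b$,$c$"))
# ['$a', 'b$', '$c$']
```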

Using a regex, as mousetail suggested, is the more elegant and robust solution, but it requires knowing regex (not that anyone really KNOWS regex).

You can do this with a regular expression:

In re, the lookbehind (?<=\$) matches a position immediately after a $, without consuming any characters.

Similarly, the lookahead (?=\$) matches a position immediately before a $.

Putting a comma between the two matches only a comma that has a $ on both sides, which is exactly the separator between fields:

expression = r"(?<=\$),(?=\$)"

Full program:

import re
expression = r"(?<=\$),(?=\$)"
print(re.split(expression, "$abc,def$,$ghi$,$jkl,mno$"))
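Lookbehind and lookahead come in positive and negative forms, and mixing them up inverts which commas get matched. This sketch, with illustrative variable names, runs re.split with both the positive pair (?<=\$),(?=\$) and the negative alternation (?<!\$),|,(?!\$) on the sample line:

```python
import re

line = "$abc,def$,$ghi$,$jkl,mno$"

# Positive lookbehind + lookahead: a comma with "$" on BOTH sides.
separators = r"(?<=\$),(?=\$)"
# Negative lookbehind OR negative lookahead: a comma that is not preceded
# by "$", or not followed by "$" (here, the commas inside the fields).
inner_commas = r"(?<!\$),|,(?!\$)"

by_separators = re.split(separators, line)
by_inner_commas = re.split(inner_commas, line)

print(by_separators)    # ['$abc,def$', '$ghi$', '$jkl,mno$']
print(by_inner_commas)  # ['$abc', 'def$,$ghi$,$jkl', 'mno$']
```

Only the positive pair splits at the field separators; the negative alternation matches the commas inside the $...$ fields instead.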

Try a regular expression:

import re

line = "$abc,def$,$ghi$,$jkl,mno$"

output = re.findall(r"\$(.*?)\$", line)

for o in output:
    print('$'+o+'$')

Output:

$abc,def$
$ghi$
$jkl,mno$
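A small variant of the same findall approach, sketched here as an alternative: matching the whole $...$ token without a capture group returns the delimiters too, so they do not have to be added back. Using [^$]* instead of .*? keeps each match from running past a closing $. As with the original, any text outside a $...$ pair (for example, an unwrapped field) is silently skipped.

```python
import re

line = "$abc,def$,$ghi$,$jkl,mno$"

# No capture group, so findall returns each full match, "$" signs included.
fields = re.findall(r"\$[^$]*\$", line)
for field in fields:
    print(field)
```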

First, you can identify a character that is not used in that line:

c = chr(max(map(ord, line)) + 1)

Then, you can proceed as follows:

line.replace('$,$', f'${c}$').split(c)

Here is your example:

>>> line = '$abc,def$,$ghi$,$jkl,mno$'
>>> c = chr(max(map(ord, line)) + 1)
>>> result = line.replace('$,$', f'${c}$').split(c)
>>> print(*result, sep='\n')
$abc,def$
$ghi$
$jkl,mno$
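The same sentinel trick as a reusable function (the name split_with_sentinel is hypothetical). Because the sentinel's code point is one higher than any character in the line, it cannot already occur there, so unlike the ",," marker it is safe even when a field contains consecutive commas. The empty-line guard is an added assumption, since max() raises ValueError on an empty sequence.

```python
def split_with_sentinel(line):
    # max() fails on an empty string; mirror ''.split(...) and return [''].
    if not line:
        return [line]
    # One code point above the largest in the line, so it is not in the line.
    sentinel = chr(max(map(ord, line)) + 1)
    # Turn each "$,$" separator into "$<sentinel>$", then split on the sentinel.
    return line.replace('$,$', f'${sentinel}$').split(sentinel)


print(split_with_sentinel("$abc,def$,$ghi$,$jkl,mno$"))
# ['$abc,def$', '$ghi$', '$jkl,mno$']
print(split_with_sentinel("$a,,b$,$c$"))
# ['$a,,b$', '$c$']
```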

Solution using regex:

import re

line = "$abc,def$,$ghi$,$jkl,mno$"

output = re.split(r'(?<=\$),(?=\$)', line)

for o in output:
    print(o)

Explanation: the regex (?<=\$),(?=\$) splits the string at commas that sit between two dollar ($) signs. Because the lookbehind and lookahead do not consume any characters, the $ signs stay in the resulting parts. See also Regex Lookahead and Lookbehind.
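Since the question reads the lines from a CSV file, Python's standard csv module can also handle this if $ is declared as the quote character; this is a different technique from the regex answers. In this sketch io.StringIO stands in for a real file (with a file on disk you would pass open(path, newline='') instead). The reader strips the quote characters, so they are added back when printing, and by default a doubled $$ inside a field is read as a literal $.

```python
import csv
import io

# Stand-in for the CSV file.
data = io.StringIO("$abc,def$,$ghi$,$jkl,mno$\n")

# With quotechar='$', commas inside $...$ belong to the field.
rows = list(csv.reader(data, quotechar='$'))

for row in rows:
    for field in row:
        # Re-wrap each field in "$" to match the desired output.
        print(f"${field}$")
```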
