How to split a string while keeping its special characters

Question

string_1 = "\tVH VH VH VL N N N N N N N\n"

I'm trying to split a string that contains \t and \n. When I split it with the split function as below:

sep_setring = string_1.split()

Output:

['VH', 'VH', 'VH', 'VL', 'N', 'N', 'N', 'N', 'N', 'N', 'N']

But I need the output to be:

['\t', 'VH', 'VH', 'VH', 'VL', 'N', 'N', 'N', 'N', 'N', 'N', 'N', '\n']

Answer 1

Using re.findall:

import re

string_1 = "\tVH VH VH VL N N N N N N N\n"
matches = re.findall(r'\S+|[^\S ]+', string_1)
print(matches)

This prints:

['\t', 'VH', 'VH', 'VH', 'VL', 'N', 'N', 'N', 'N', 'N', 'N', 'N', '\n']

The regex pattern matches either a run of non-whitespace characters or a run of whitespace characters other than the space:

\S+      match one or more non-whitespace characters
|        OR
[^\S ]+  match one or more whitespace characters excluding space itself
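To see how the two alternatives behave on other inputs, the pattern can be wrapped in a small helper. This is only a sketch: the name tokenize_keep_whitespace and the sample strings are made up for illustration and are not part of the original answer.

```python
import re

# Same pattern as above: a run of non-whitespace characters,
# or a run of whitespace characters other than the space.
TOKEN_PATTERN = re.compile(r'\S+|[^\S ]+')


def tokenize_keep_whitespace(text):
    """Return words and non-space whitespace runs; plain spaces are skipped."""
    return TOKEN_PATTERN.findall(text)


print(tokenize_keep_whitespace("\tVH VL N\n"))
print(tokenize_keep_whitespace("\t\tA  B\r\n"))
```

Because + is greedy, adjacent tabs or newlines come back as one token ('\t\t', '\r\n'), while any number of spaces between words is simply skipped.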

Answer 2

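You can split using lookarounds (splitting on zero-width matches requires Python 3.7 or later):

(?<=\t)|(?=\n)| 

(?<=\t)  assert a tab to the left
|        OR
(?=\n)   assert a newline to the right
|        OR
(space)  match a single space; the last character of the pattern is a literal space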

Example

import re
string_1 = "\tVH VH VH VL N N N N N N N\n"
sep_setring = re.split(r"(?<=\t)|(?=\n)| ", string_1)
print(sep_setring)

Output

['\t', 'VH', 'VH', 'VH', 'VL', 'N', 'N', 'N', 'N', 'N', 'N', 'N', '\n']
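One difference from the re.findall pattern in Answer 1 shows up when the input contains consecutive spaces: every single space is its own split point, so re.split returns an empty string between them. A minimal comparison, assuming Python 3.7+ and a made-up sample string:

```python
import re

sample = "\tA  B\n"  # hypothetical input with two spaces between A and B

split_result = re.split(r"(?<=\t)|(?=\n)| ", sample)
findall_result = re.findall(r'\S+|[^\S ]+', sample)

print(split_result)
print(findall_result)

# Dropping the empty strings makes the two results agree for this input.
cleaned = [part for part in split_result if part]
print(cleaned)
```

Here re.split gives ['\t', 'A', '', 'B', '\n'], while re.findall gives ['\t', 'A', 'B', '\n']. If the input may contain repeated spaces, either filter out the empty strings as above or prefer the re.findall approach.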

2 answers

solution1: 3, ACCEPTED, 2021-04-13 06:42:09
solution2: 1, 2021-04-13 06:46:01
