How to split a string while keeping the special characters

string_1 = "\tVH VH VH VL N N N N N N N\n"

I'm trying to split a string that contains a \t and a \n. When I split it with the split function as below:

sep_setring = string_1.split()

Output:

['VH', 'VH', 'VH', 'VL', 'N', 'N', 'N', 'N', 'N', 'N', 'N']

But I need the output to be:

['\t', 'VH', 'VH', 'VH', 'VL', 'N', 'N', 'N', 'N', 'N', 'N', 'N', '\n']
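For context, here is a minimal sketch contrasting the two ways of calling the built-in str.split on this string (the variable names by_whitespace and by_space are illustrative only). Neither form produces the desired list, which is why a regular expression is needed.

```python
string_1 = "\tVH VH VH VL N N N N N N N\n"

# No argument: split on runs of any whitespace and discard
# leading/trailing whitespace, so '\t' and '\n' disappear entirely.
by_whitespace = string_1.split()

# Explicit " " separator: split on single spaces only, so '\t' and '\n'
# survive but stay glued to their neighbouring tokens.
by_space = string_1.split(" ")

print(by_whitespace)  # ['VH', 'VH', 'VH', 'VL', 'N', 'N', 'N', 'N', 'N', 'N', 'N']
print(by_space)       # ['\tVH', 'VH', 'VH', 'VL', 'N', 'N', 'N', 'N', 'N', 'N', 'N\n']
```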

Using re.findall:

import re

string_1 = "\tVH VH VH VL N N N N N N N\n"
# Match either a run of non-whitespace, or a run of whitespace other than the space
matches = re.findall(r'\S+|[^\S ]+', string_1)
print(matches)

This prints:

['\t', 'VH', 'VH', 'VH', 'VL', 'N', 'N', 'N', 'N', 'N', 'N', 'N', '\n']

Here is an explanation of the regex pattern, which matches either a run of non-whitespace characters or a run of whitespace characters other than the space:

\S+      match one or more non-whitespace characters
|        OR
[^\S ]+  match one or more whitespace characters, excluding the space itself
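To show that the pattern groups whole runs of whitespace, here is a small sketch wrapping it in a reusable function. The name split_keep_control and the sample input "\t\tA  B\r\n" are hypothetical, chosen only for illustration.

```python
import re

# Same pattern as above: a run of non-whitespace characters, OR a run of
# whitespace characters that are not the plain space.
TOKEN_RE = re.compile(r"\S+|[^\S ]+")

def split_keep_control(text):
    """Hypothetical helper: drop spaces but keep other whitespace runs as tokens."""
    return TOKEN_RE.findall(text)

# Consecutive tabs, or a '\r\n' pair, come back as a single token,
# and runs of spaces are skipped without producing empty strings.
tokens = split_keep_control("\t\tA  B\r\n")
print(tokens)  # ['\t\t', 'A', 'B', '\r\n']
```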

You can split with re.split using lookarounds (splitting on zero-width matches requires Python 3.7 or later):

(?<=\t)|(?=\n)| 
  • (?<=\t) Assert a tab to the left
  • | Or
  • (?=\n) Assert a newline to the right
  • | Or
  • Match a space
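To see what each alternative contributes on its own, here is a sketch that runs them separately on a shorter, made-up string "\tVH VL\n". The two lookarounds are zero-width, so they split the string without consuming any characters; the literal space is an ordinary separator and is consumed. Combining the three with | gives the full pattern.

```python
import re

s = "\tVH VL\n"

# (?<=\t) alone: zero-width split immediately after the tab
after_tab = re.split(r"(?<=\t)", s)
# (?=\n) alone: zero-width split immediately before the newline
before_newline = re.split(r"(?=\n)", s)
# " " alone: an ordinary split that consumes each space
on_space = re.split(r" ", s)

print(after_tab)       # ['\t', 'VH VL\n']
print(before_newline)  # ['\tVH VL', '\n']
print(on_space)        # ['\tVH', 'VL\n']
```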

Example

import re
string_1 = "\tVH VH VH VL N N N N N N N\n"
sep_setring = re.split(r"(?<=\t)|(?=\n)| ", string_1)
print(sep_setring)

Output

['\t', 'VH', 'VH', 'VH', 'VL', 'N', 'N', 'N', 'N', 'N', 'N', 'N', '\n']
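The two answers agree on the question's input, but they can diverge on other strings. A quick sketch on a made-up input with two consecutive spaces: the re.split pattern splits on each individual space, so it yields an empty string between them, while the re.findall pattern simply skips the run. Replacing the single space with " +" in the split pattern is one possible adjustment, shown here as an assumption rather than part of either original answer.

```python
import re

s = "\tA  B\n"  # note the two consecutive spaces

findall_result = re.findall(r"\S+|[^\S ]+", s)
split_result = re.split(r"(?<=\t)|(?=\n)| ", s)
# Assumed variant: " +" treats a run of spaces as one separator
split_runs_result = re.split(r"(?<=\t)|(?=\n)| +", s)

print(findall_result)     # ['\t', 'A', 'B', '\n']
print(split_result)       # ['\t', 'A', '', 'B', '\n']
print(split_runs_result)  # ['\t', 'A', 'B', '\n']
```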

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM