string_1 = "\tVH VH VH VL N N N N N N N\n"
I'm trying to split a string that contains a \t and a \n. When I split it using the split function as below:
sep_setring = string_1.split()
Output:
['VH', 'VH', 'VH', 'VL', 'N', 'N', 'N', 'N', 'N', 'N', 'N']
But I need the output to be:
['\t', 'VH', 'VH', 'VH', 'VL', 'N', 'N', 'N', 'N', 'N', 'N', 'N', '\n']
Using re.findall:
import re
string_1 = "\tVH VH VH VL N N N N N N N\n"
matches = re.findall(r'\S+|[^\S ]+', string_1)
print(matches)
This prints:
['\t', 'VH', 'VH', 'VH', 'VL', 'N', 'N', 'N', 'N', 'N', 'N', 'N', '\n']
Here is an explanation of the regex pattern, which matches either a run of non-whitespace characters or a run of whitespace characters other than the space:
\S+ match one or more non whitespace characters
| OR
[^\S ]+ match one or more whitespace characters excluding space itself
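To see how the two alternatives behave on slightly messier input, here is a minimal sketch that wraps the same pattern in a helper. The names TOKEN_PATTERN and tokenize, and the second sample string, are made up for illustration and are not part of the original question.

```python
import re

# Same pattern as above: a run of non-whitespace characters,
# or a run of whitespace characters other than the space.
TOKEN_PATTERN = re.compile(r'\S+|[^\S ]+')

def tokenize(text):
    """Return words and non-space whitespace runs in order of appearance."""
    return TOKEN_PATTERN.findall(text)

# The shape of input from the question (shortened).
print(tokenize("\tVH VL\n"))          # ['\t', 'VH', 'VL', '\n']

# Hypothetical input: adjacent tabs and "\r\n" are each kept as one token,
# and repeated spaces are simply skipped.
print(tokenize("\t\tVH  VL\r\n"))     # ['\t\t', 'VH', 'VL', '\r\n']
```

Because spaces are never matched by either alternative, they act purely as separators and never show up in the result.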
You can split using lookarounds:
(?<=\t)|(?=\n)| (note the literal space after the last |)
(?<=\t) Assert a tab to the left
| Or
(?=\n) Assert a newline to the right
| Or
(a literal space) Match a space
Example
import re
string_1 = "\tVH VH VH VL N N N N N N N\n"
sep_setring = re.split(r"(?<=\t)|(?=\n)| ", string_1)
print(sep_setring)
Output
['\t', 'VH', 'VH', 'VH', 'VL', 'N', 'N', 'N', 'N', 'N', 'N', 'N', '\n']
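The two approaches give the same result on this input, but they can differ on others. The sketch below compares them on a made-up string with two consecutive spaces. The helper names split_keep_edges and find_tokens are hypothetical. It also assumes Python 3.7 or later, since earlier versions of re.split do not split on empty (zero-width) matches such as these lookarounds.

```python
import re

def split_keep_edges(text):
    # Lookaround-based split: zero-width cut after a tab and before a newline,
    # plus a normal split on each single space. Requires Python 3.7+.
    return re.split(r"(?<=\t)|(?=\n)| ", text)

def find_tokens(text):
    # findall-based approach: collect words and non-space whitespace runs.
    return re.findall(r'\S+|[^\S ]+', text)

sample = "\tVH  VL\n"  # hypothetical input with two spaces between the words

print(split_keep_edges(sample))  # ['\t', 'VH', '', 'VL', '\n']
print(find_tokens(sample))       # ['\t', 'VH', 'VL', '\n']
```

Splitting on a single space yields an empty string for each extra space, whereas findall skips runs of spaces entirely, so findall may be the safer choice if the spacing in the input is not guaranteed to be uniform.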