I have a long string of data which looks like:
dstgfsda12345.123gsrsvrvsdfcsd23456.234tsrsd
Notice that the '12345.123' pattern (five digits, a dot, three digits) repeats. I want to split the string on it using Python (something like s.split(<regex>)).
What would be the appropriate regex?
'[0-9]{5}.[0-9]{3}'
does not work; I presume it expects whitespace around it(?).
Just escape the dot ('.') and you are done:
\d{5}\.\d{3}
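You can use the regex token \d as a shorthand for [0-9]. (In Python 3, \d on str patterns also matches other Unicode decimal digits unless the re.ASCII flag is set.)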
Example:
>>> import re
>>> re.split(r'\d{5}\.\d{3}', 'dstgfsda12345.123gsrsvrvsdfcsd23456.234tsrsd')
['dstgfsda', 'gsrsvrvsdfcsd', 'tsrsd']
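A side note on the s.split(<regex>) idea from the question: str.split treats its argument as a literal separator rather than a pattern, which may be part of why the original attempt appeared not to work. A minimal sketch contrasting the two calls (the variable names are only illustrative):

```python
import re

s = 'dstgfsda12345.123gsrsvrvsdfcsd23456.234tsrsd'
pattern = r'\d{5}\.\d{3}'

# str.split searches for the literal text of `pattern`, which never occurs
# in s, so the string comes back unsplit as a one-element list.
literal_parts = s.split(pattern)

# re.split interprets `pattern` as a regular expression.
regex_parts = re.split(pattern, s)

print(literal_parts)
print(regex_parts)
```

So the split has to go through the re module (re.split, or a compiled pattern's split method) for the regex to have any effect.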
I don't understand exactly what your actual need is, but it seems that you want your regex to isolate each occurrence of 5 digits, a dot, and 3 digits.
So instead of '[0-9]{5}.[0-9]{3}' you must use '[0-9]{5}\\.[0-9]{3}' (or, as a raw string, r'[0-9]{5}\.[0-9]{3}'), because an unescaped . matches any character except a newline, while \. matches only a literal dot.
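To make the difference concrete, here is a small sketch. The sample string is hypothetical (it is not from the question) and deliberately contains 'x' where a dot would be expected:

```python
import re

unescaped = r'[0-9]{5}.[0-9]{3}'   # '.' matches any character except a newline
escaped = r'[0-9]{5}\.[0-9]{3}'    # '\.' matches only a literal dot

# Hypothetical input: the first number has 'x' in place of the dot.
sample = 'abc12345x123def67890.123ghi'

loose = re.split(unescaped, sample)   # also splits on '12345x123'
strict = re.split(escaped, sample)    # splits only on '67890.123'

print(loose)
print(strict)
```

On the string from the question both patterns happen to give the same result, because the only character between the digit groups is a real dot. The escaped form matters as soon as the data may contain other characters in that position.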
Your regex should be '\\d{5}\\.\\d{3}' (equivalently, the raw string r'\d{5}\.\d{3}'). Note the use of \. instead of a bare dot: in the default mode, '.' matches any character except a newline (see the Python re documentation), whereas \. matches only a literal dot in your string.
For example:
import re
my_string = 'dstgfsda12345.123gsrsvrvsdfcsd23456.234tsrsd'
my_regex = r'\d{5}\.\d{3}'  # raw string, so the backslashes reach the regex engine unchanged
re.split(my_regex, my_string)
# returns: ['dstgfsda', 'gsrsvrvsdfcsd', 'tsrsd']
Explanation of how '\\d{5}\\.\\d{3}' (i.e. the regex \d{5}\.\d{3}) works:
\d matches any single digit 0-9.
\d{5} matches exactly 5 consecutive digits.
\. matches a single literal dot following those digits.
Finally, \d{3} matches exactly 3 digits after the dot.
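The same breakdown can be written directly into the pattern with the re.VERBOSE flag, which ignores whitespace and allows comments inside the regex. This is only a sketch of one possible style; the names number, parts and numbers are made up for the example:

```python
import re

# Compiled once; each piece of the pattern carries its own comment.
number = re.compile(r'''
    \d{5}   # exactly five digits
    \.      # a literal dot
    \d{3}   # exactly three digits
''', re.VERBOSE)

s = 'dstgfsda12345.123gsrsvrvsdfcsd23456.234tsrsd'

parts = number.split(s)       # the text between the numbers
numbers = number.findall(s)   # the numbers themselves

print(parts)
print(numbers)
```

Using findall alongside split is handy if the numeric separators are needed as well as the text between them.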