
Regex pattern within text

I have a long string of data which looks like:

dstgfsda12345.123gsrsvrvsdfcsd23456.234tsrsd

Notice that the '12345.123' pattern (five digits, a dot, three digits) repeats. I want to split the string on it using Python (so s.split(<regex>) ).

What would be the appropriate regex?

'[0-9]{5}.[0-9]{3}'

does not work; I presume it expects whitespace around it(?).

Just escape the . and you are done:

\d{5}\.\d{3}

You can use the regex token \d as a shorthand for [0-9].

Example:

>>> re.split(r'\d{5}\.\d{3}', 'dstgfsda12345.123gsrsvrvsdfcsd23456.234tsrsd')
['dstgfsda', 'gsrsvrvsdfcsd', 'tsrsd']
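One possible reason the original attempt appeared not to work, assuming it really was written as s.split(...): str.split treats its argument as a literal separator, not as a regular expression, so the pattern text is never found in the data. A minimal sketch of the difference (the variable names are only for illustration):

```python
import re

data = 'dstgfsda12345.123gsrsvrvsdfcsd23456.234tsrsd'
pattern = r'[0-9]{5}\.[0-9]{3}'

# str.split searches for the literal text of `pattern`,
# which never occurs in `data`, so nothing is split.
literal_parts = data.split(pattern)

# re.split interprets `pattern` as a regular expression.
regex_parts = re.split(pattern, data)

print(literal_parts)  # ['dstgfsda12345.123gsrsvrvsdfcsd23456.234tsrsd']
print(regex_parts)    # ['dstgfsda', 'gsrsvrvsdfcsd', 'tsrsd']
```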

I don't understand exactly what you need, but it seems you want your regex to isolate each occurrence of five digits, a dot, and three digits.

So instead of '[0-9]{5}.[0-9]{3}' you should use '[0-9]{5}\.[0-9]{3}', because . matches any character, while \. matches only a literal dot.
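To see the difference between . and \. in practice, here is a small sketch using a made-up input (not the asker's data) that contains '12345x123' as well as '12345.123':

```python
import re

# Hypothetical input: one number-like run with 'x' and one with a real dot.
data = 'ab12345x123cd12345.123ef'

# Unescaped dot: any character is accepted between the digit groups.
loose = re.split(r'[0-9]{5}.[0-9]{3}', data)

# Escaped dot: only a literal '.' is accepted.
strict = re.split(r'[0-9]{5}\.[0-9]{3}', data)

print(loose)   # ['ab', 'cd', 'ef']
print(strict)  # ['ab12345x123cd', 'ef']
```

On the string from the question both patterns happen to give the same result, since the separator there is always a real dot; the escaped form just avoids accidental matches.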

Your regex should be '\d{5}\.\d{3}'.

Note the use of \. instead of . here. That is because '.' in the default mode matches any character except a newline (see the regex documentation), whereas \. matches a literal dot in your string.

For example:

import re
my_string = 'dstgfsda12345.123gsrsvrvsdfcsd23456.234tsrsd'
my_regex = r'\d{5}\.\d{3}'  # raw string, so Python leaves the backslashes for the regex engine
re.split(my_regex, my_string)
# returns: ['dstgfsda', 'gsrsvrvsdfcsd', 'tsrsd']

Explanation of how '\d{5}\.\d{3}' works:

\d matches any digit 0-9, so \d{5} matches exactly 5 consecutive digits. \. matches a single literal dot following those digits. Finally, \d{3} matches the 3 digits after the dot.
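The same breakdown can be written into the pattern itself with re.VERBOSE, which ignores whitespace in the pattern and allows comments. This is just one way to lay it out; the name NUMBER is made up for this sketch:

```python
import re

# NUMBER is an illustrative name, not something from the question.
NUMBER = re.compile(r"""
    \d{5}   # exactly five digits
    \.      # a literal dot
    \d{3}   # exactly three digits
""", re.VERBOSE)

my_string = 'dstgfsda12345.123gsrsvrvsdfcsd23456.234tsrsd'

matches = NUMBER.findall(my_string)  # the separators themselves
pieces = NUMBER.split(my_string)     # the text between them

print(matches)  # ['12345.123', '23456.234']
print(pieces)   # ['dstgfsda', 'gsrsvrvsdfcsd', 'tsrsd']
```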
