
Regex to allow any combination of numbers up to a specific number (Python)

I need to extract three patterns from the given string of numbers separated by a comma.

data = "2,2,2,2,4,3,2,4,3, 2,2,2,2,4,3,4,3,2,4,3,4,3, 2,2,2,2,2,4,3,4,3,3,2,4,3,4,3,3 ,2,2,2,3,4,4, 2,2,2,2,4,3,2,4,3 and so on"

The pattern to find in the string:

2,2,2,2,4,3,2,4,3

So to extract the given pattern, I wrote a regex.

pattern one:

re.findall(r'2,2,2,2,4,3,2,4,3', data)

pattern two to build:

In reality, though, the 4s and 3s can appear in any combination and at any length until the next 2 is reached (for example 3,4,3,4,4, or any other mix, up to the first 2). For pattern two, the 2s must stay the same as in the given pattern: four 2s in the first group and a single 2 in the second group.

pattern three to build:

Pattern three follows the same rule for the 4s and 3s as pattern two. In addition, it should accept up to two extra 2s in each group of 2s: that means 4, 5, or 6 2s in the first group (4 + up to 2) and 1, 2, or 3 2s in the second group (1 + up to 2).

ex:

2,2,2,2,2,3,4,3,4,4,2,4,3,4,3

2,2,2,2,2,3,4,3,4,4,2,2,4,3,4,3

2,2,2,2,3,4,3,4,4,2,2,4,3,4,3
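One way to sanity-check the three examples above is a small sketch like the following. It assumes the commas are stripped before matching, and the name `PATTERN_THREE` is only illustrative, not something defined elsewhere.

```python
import re

# Hypothetical name: a candidate regex for pattern three on comma-stripped input.
# 4 to 6 twos, a run of 3s/4s, 1 to 3 twos, then another run of 3s/4s.
PATTERN_THREE = re.compile(r"2{4,6}[43]+2{1,3}[43]+")

examples = [
    "2,2,2,2,2,3,4,3,4,4,2,4,3,4,3",
    "2,2,2,2,2,3,4,3,4,4,2,2,4,3,4,3",
    "2,2,2,2,3,4,3,4,4,2,2,4,3,4,3",
]

# fullmatch requires the whole example to fit the pattern, not just a part of it.
results = [bool(PATTERN_THREE.fullmatch(e.replace(",", ""))) for e in examples]
print(results)
```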

So in the end, if regex 1 finds 20 matches in the data string, regex 2 should find 50 (30 more than regex 1) and regex 3 should find 70 (20 more than regex 2).

Edit:

import re

data = '2,2,2,2,4,3,2,4,3,2,2,2,2,4,3,4,3,2,4,3,4,3,2,2,2,2,2,4,3,4,3,3,2,4,3,4,3,3,2,2,2,3,4,4,2,2,2,2,4,3,2,4,3,'

data2 = re.sub(",", "", data)

# pattern 1
re.findall("2{4}43243", data2)
#['222243243', '222243243']

# pattern2
re.findall("2{4}[43]+2[43]+", data2)
#['222243243', '2222434324343', '222243433243433', '222243243']

#pattern3
re.findall("2{4,6}[43]+2{1,3}[43]+", data2)
#['222243243', '2222434324343', '2222243433243433', '222243243']

But pattern 3 missed 222243433243433, which pattern 2 found. How is that possible?

Based on my understanding of the question, could you check whether the patterns below are what you expect?

Note: I removed commas for simplicity

import re

data = '2,2,2,2,4,3,2,4,3,2,2,2,2,4,3,4,3,2,4,3,4,3,2,2,2,2,2,4,3,4,3,3,2,4,3,4,3,3,2,2,2,3,4,4,2,2,2,2,4,3,2,4,3'

# strip the commas so the patterns only deal with digits
data2 = re.sub(",", "", data)

# pattern 1
re.findall(r"2{4}43243", data2)
# ['222243243', '222243243']

# pattern 2 and 3
re.findall(r"2{4,6}[43]+2{1,3}[43]+", data2)
# ['222243243', '2222434324343', '2222243433243433', '222243243']
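On the question of why pattern 3 seems to miss 222243433243433: as far as I can tell it is not missed, it is contained in a longer match. `re.findall` scans left to right and returns non-overlapping matches, and `2{4,6}` is greedy, so at the run of five 2s pattern 3 starts one character earlier than pattern 2 and consumes all five. A minimal sketch that compares the match positions (the variable names are only illustrative):

```python
import re

data = '2,2,2,2,4,3,2,4,3,2,2,2,2,4,3,4,3,2,4,3,4,3,2,2,2,2,2,4,3,4,3,3,2,4,3,4,3,3,2,2,2,3,4,4,2,2,2,2,4,3,2,4,3'
data2 = data.replace(",", "")

pattern2 = re.compile(r"2{4}[43]+2[43]+")
pattern3 = re.compile(r"2{4,6}[43]+2{1,3}[43]+")

# Record (start, end, text) for every match so the two results can be compared.
spans2 = [(m.start(), m.end(), m.group()) for m in pattern2.finditer(data2)]
spans3 = [(m.start(), m.end(), m.group()) for m in pattern3.finditer(data2)]

for (s2, e2, g2), (s3, e3, g3) in zip(spans2, spans3):
    print(f"pattern2 {g2} [{s2}:{e2}]  |  pattern3 {g3} [{s3}:{e3}]")
```

For the third match, both patterns end at the same position, but pattern 3 begins one 2 earlier, so its result is the pattern 2 result with one extra leading 2.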
