I know there are a great many Python regular expression questions here, but I cannot figure out my specific case, even with examples.
I have tried using regex101, but it's just not clicking.
I have these sentences:
[Hi]-THISISALOADOFTEXT-[text]
I-X-(blah[THIS2CAN2Have-SymbolsAndNumbers0])-ABCD-{x}A-AB
A-[This can 4 have any X1 rubbish in it]-ABCDDS-OH
A-F{a}R-(textnumber1)-AB-[ThisIsText123]-P-{d}C-(ThisCanHaveNumbers1)-W-[ThisIsSymbolsText123]
I just want to pull out what is between the square brackets, EXCEPT when the square brackets are enclosed by parentheses (rounded brackets).
So in the above example, it would return:
[Hi], [text]
...nothing returned for line 2...
[This can 4 have any X1 rubbish in it]
[ThisIsText123], [ThisIsSymbolsText123]
It almost works with this code:
import re
pattern = re.compile(r'(\[.*?\])')
regex = re.findall(pattern,text)
I was trying to incorporate the 'not' like this: ?!A-Za-z0-9(\[.*?\])
that I got from the Python manual, but my various attempts at this have not worked.
The only problem is that the above code also returns [THIS2CAN2Have-SymbolsAndNumbers0], which I do not want because it is enclosed by parentheses.
Importantly, and where I am getting stuck, is that there can be text and numbers in between the square brackets and the rounded brackets, as in this example: (blah[THIS2CAN2Have-SymbolsAndNumbers0])
Can someone help?
As a side note, just FYI, the ultimate goal once I figure out the regex is to incorporate into a loop that says:
Edit 1: How could I extend this so that, for sequences that have square brackets inside parentheses, the full phrase in the parentheses is returned? For example, the input sequences:
[Hi]-THISISALOADOFTEXT-[text]
I-X-(blah[THIS2CAN2Have-SymbolsAndNumbers0])-ABCD-{x}A-AB
A-[This can 4 have any X1 rubbish in it]-ABCDDS-OH
A-F{a}R-(textnumber1)-AB-[ThisIsText123]-P-{d}C-(ThisCanHaveNumbers1)-W-[ThisIsSymbolsText123]
Would produce the output:
[Hi], [text]
(blah[THIS2CAN2Have-SymbolsAndNumbers0])
[This can 4 have any X1 rubbish in it]
[ThisIsText123], [ThisIsSymbolsText123]
in a way that lets me run a different subroutine on the rounded-bracket output '(blah[THIS2CAN2Have-SymbolsAndNumbers0])'
than on the other outputs, which are not in rounded brackets.
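You may use the following two patterns. The first matches a square-bracket group that is not immediately followed by ), the second matches one that is:
\[[^]]+\](?!\))
\[[^]]+\](?=\))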
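As a quick illustration of how the two lookaheads differ, here is a minimal sketch. The sample line is a made-up variant of the second input line (the `[free]` group is added for illustration and is not in the question), so that both patterns have something to match.

```python
import re

# Hypothetical sample: line 2 of the question with an extra bare "[free]" group.
line = "I-X-(blah[THIS2CAN2Have-SymbolsAndNumbers0])-ABCD-[free]-AB"

# Square-bracket groups NOT immediately followed by ")"
outside = re.findall(r'\[[^]]+\](?!\))', line)

# Square-bracket groups immediately followed by ")"
inside = re.findall(r'\[[^]]+\](?=\))', line)

print(outside)  # ['[free]']
print(inside)   # ['[THIS2CAN2Have-SymbolsAndNumbers0]']
```

Both patterns only look at the single character after the closing ], so they rely on ) coming directly after it.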
As per your new requirement, you may use:
\([^[]+\[[^]]+\]\)
My answer assumes the brackets are balanced and that the closing ) immediately follows the ].
In Python:
import re
mytext='''
[Hi]-THISISALOADOFTEXT-[text]
I-X-(blah[THIS2CAN2Have-SymbolsAndNumbers0])-ABCD-{x}A-AB
A-[This can 4 have any X1 rubbish in it]-ABCDDS-OH
A-F{a}R-(textnumber1)-AB-[ThisIsText123]-P-{d}C-(ThisCanHaveNumbers1)-W-[ThisIsSymbolsText123]
'''
print('no ():')
for i in re.findall(r'\[[^]]+\](?!\))',mytext):
    print(i)
    #do one routine
print('with ():')
for i in re.findall(r'\([^[]+\[[^]]+\]\)',mytext):
    print(i)
    #do second routine
Prints:
no ():
[Hi]
[text]
[This can 4 have any X1 rubbish in it]
[ThisIsText123]
[ThisIsSymbolsText123]
with ():
(blah[THIS2CAN2Have-SymbolsAndNumbers0])
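If the closing ) might not come directly after the ], one possible alternative (not part of the answer above) is to match whole parenthesised groups first and bare square-bracket groups second, then sort the matches per line. This is only a sketch: the names `TOKEN` and `split_groups` are made up for illustration, and it assumes parentheses are never nested.

```python
import re

# Try a whole "(...)" group first, otherwise a bare "[...]" group.
# Assumption: parentheses are not nested inside each other.
TOKEN = re.compile(r'\([^()]*\)|\[[^\]]*\]')

def split_groups(line):
    """Return (bare, wrapped) for one line.

    bare    -- square-bracket groups that are not inside parentheses
    wrapped -- parenthesised groups that contain a square bracket
    """
    bare, wrapped = [], []
    for match in TOKEN.findall(line):
        if match.startswith('['):
            bare.append(match)
        elif '[' in match:
            wrapped.append(match)
        # parenthesised groups without brackets are ignored
    return bare, wrapped

lines = [
    "[Hi]-THISISALOADOFTEXT-[text]",
    "I-X-(blah[THIS2CAN2Have-SymbolsAndNumbers0])-ABCD-{x}A-AB",
    "A-[This can 4 have any X1 rubbish in it]-ABCDDS-OH",
    "A-F{a}R-(textnumber1)-AB-[ThisIsText123]-P-{d}C-(ThisCanHaveNumbers1)-W-[ThisIsSymbolsText123]",
]

for line in lines:
    bare, wrapped = split_groups(line)
    print(bare, wrapped)
```

Because the parenthesised alternative is tried first, a bracket inside parentheses is consumed as part of the larger group and never reported on its own, and processing line by line keeps the results grouped the way the question lists them.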