I have a string which is:
str2s = 'orange,juices,apple,apple[-2]'
I'm trying to extract all of those words, including the bracketed part, using a regex rather than str.split(). So I want:
'orange', 'juices', 'apple', 'apple[-2]'
I tried using:
re.findall(
'[A-Za-z][A-Za-z0-9_%\\.]{0,}\[?[a-zA-Z0-9_]*\]?',
str2s,
flags=re.IGNORECASE
)
But it only returned:
'orange', 'juices', 'apple', 'apple['
How can I get the -2] as well?
You can start the match with a char a-zA-Z, then match optional word characters, and then optionally match from an opening square bracket to the closing one.
\b[A-Z]\w*(?:\[[^][]*\])?
Explanation
\b            A word boundary to prevent a partial word match
[A-Z]\w*      Match a char a-zA-Z (the pattern is used case-insensitively) followed by optional word characters
(?:           Non-capturing group
\[[^][]*\]    Match [...] using a negated character class
)?            Close the non-capturing group and make it optional
import re
str2s = 'orange,juices,apple,apple[-2]'
print(re.findall(r'\b[A-Z]\w*(?:\[[^][]*\])?', str2s, flags=re.I))
Output
['orange', 'juices', 'apple', 'apple[-2]']
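For comparison, here is a small sketch of why the pattern from the question stops at apple[. The character class between the optional brackets, [a-zA-Z0-9_], does not include -, so the match ends right after the opening bracket. The patched pattern below is only an illustrative tweak, not part of the original answer; it adds - to that class.

```python
import re

str2s = 'orange,juices,apple,apple[-2]'

# The pattern from the question, written as a raw string.
original = r'[A-Za-z][A-Za-z0-9_%\.]{0,}\[?[a-zA-Z0-9_]*\]?'
# Illustrative tweak: allow '-' inside the bracketed part.
patched = r'[A-Za-z][A-Za-z0-9_%\.]{0,}\[?[a-zA-Z0-9_-]*\]?'

original_result = re.findall(original, str2s)
patched_result = re.findall(patched, str2s)

print(original_result)  # ['orange', 'juices', 'apple', 'apple[']
print(patched_result)   # ['orange', 'juices', 'apple', 'apple[-2]']
```

Note that in the patched pattern the two brackets are still independently optional, so it would also accept an unbalanced apple[. The non-capturing group in the answer above is stricter, because it only matches the brackets as a pair.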
In order to split a string into a list, I think you have to know the exact separator, or at least be able to identify the separators, whether that is , or [,.] or something else.
If you can't distinguish separators from items, I think it will be very hard to achieve your goal with common methods.
That being said, for your case of orange,juices,apple,apple[-2], you may use r'([\w\[\]\-]+)'
https://regex101.com/r/2Xj7AR/1
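As a minimal sketch, this is how that pattern could be applied with re.findall to the string from the question. Anything that is not a word character, a square bracket or a hyphen (here, the commas) is treated as a separator.

```python
import re

str2s = 'orange,juices,apple,apple[-2]'

# One or more word characters, square brackets or hyphens.
# With a single capture group, findall returns the group's text.
items = re.findall(r'([\w\[\]\-]+)', str2s)

print(items)  # ['orange', 'juices', 'apple', 'apple[-2]']
```

This pattern does not check that brackets are balanced or that they come at the end of a word, so it is best suited to input that is already known to be well formed.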
The following code will extract the words the way you want:
import re
words = re.compile(r'\w+\[?-?\d*\]?', re.IGNORECASE)
s = 'orange,juices,apple,apple[-2],pineapple[20]'
print(words.findall(s))
Which will result in the following:
['orange', 'juices', 'apple', 'apple[-2]', 'pineapple[20]']
Bear in mind that the snippet above was written with the example string you provided as the base. If you need to match other types of words (007%abc, for example), you will need to adjust the regular expression to match more characters.
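One possible adjustment, as a rough sketch: widen the leading class so that it also accepts % and . and keep the bracketed index as an optional group. The pattern and the sample string below are assumptions made for illustration; which extra characters to allow depends on your real data.

```python
import re

# Hypothetical extended pattern: word characters plus '%' and '.',
# optionally followed by a bracketed (possibly negative) integer.
extended = re.compile(r'[\w%.]+(?:\[-?\d*\])?')

sample = 'orange,007%abc,apple[-2],pineapple[20]'
extended_result = extended.findall(sample)

print(extended_result)  # ['orange', '007%abc', 'apple[-2]', 'pineapple[20]']
```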
SELECT RESULT, C1 ||','|| C2 ||','|| C3 || ','|| C4 AS R_W_APOSTROPHE
FROM (
SELECT RESULT,
REGEXP_REPLACE(REGEXP_SUBSTR(REGEXP_SUBSTR(RESULT,'[^,]+',1,1),'[^09+,].*'),'^|$','''') AS C1,
REGEXP_REPLACE(REGEXP_SUBSTR(REGEXP_SUBSTR(RESULT,'[^,]+',1,2),'[^09+,].*'),'^|$','''') AS C2,
REGEXP_REPLACE(REGEXP_SUBSTR(REGEXP_SUBSTR(RESULT,'[^,]+',1,3),'[^09+,].*'),'^|$','''') AS C3,
REGEXP_REPLACE(REGEXP_SUBSTR(REGEXP_SUBSTR(RESULT,'[^,]+',1,4),'[^09+,].*'),'^|$','''') AS C4
from (
select 'orange, juices, apple, apple[-2]' as RESULT from dual))
This code is written in PL/SQL, but you can reuse the algorithm. Maybe it will give you some ideas.
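A rough Python sketch of the same idea, assuming the intent of the query is to take each comma-separated field, trim the surrounding whitespace, wrap it in single quotes and join the fields back together. The variable names are made up for illustration, and the fields are extracted with a regex rather than str.split().

```python
import re

result = 'orange, juices, apple, apple[-2]'

# Each run of non-comma characters is one field.
fields = re.findall(r'[^,]+', result)

# Trim whitespace and wrap every field in single quotes.
quoted = ["'" + field.strip() + "'" for field in fields]
joined = ','.join(quoted)

print(joined)  # 'orange','juices','apple','apple[-2]'
```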