I have a quick question about Regex in Java (though other languages are probably similar).
What I'm trying to do is to transform a String like this:
How are you "Doing well" How well 10 "That's great"
I want the regex to match all of the words, numbers, and things inside quotation marks. Ideally, I'd get something like this:
How
Are
You
"Doing Well"
How
Well
10
"That's Great!"
The Regex I'm trying to use is the following:
String RegexPattern = "[^"+ // START_OR: start of line OR
"\\s" + // empty space OR
"(\\s*?<=\")]" + // END_OR: preceded by 0 or more spaces and a quotation mark
"(\\w+)" + // the actual word or number
"[\\s" + // START_OR: followed by a space OR
"(?=\")" + // followed by a quotation mark OR
"$]"; // END_OR: end of line
This won't work for me, though, even for much simpler strings! I've spent a lot of time looking for similar problems on here. If I didn't need the quotations, I could just use a split. Eventually, though, this pattern will get much more complicated, so I will need to use a regex (this is just the first iteration).
I'd appreciate any help; thanks in advance!
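I don't think [ ] means what you think it means. Square brackets define a character class, which matches a single character out of a set; they do not group alternatives. Inside square brackets, a leading ^ is actually a negation operator for the character class, not a start-of-line anchor. You should practice with smaller regexes before embarking on this task. The pattern you're looking for is more like: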
\s*([^"\s]+|"[^"]*")
You can see this in action here: http://rubular.com/r/enq7eXg9Zm .
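Since the question is about Java, here is a minimal sketch of how that pattern might be used with java.util.regex. The class name QuoteTokenizer and the method tokenize are made up for illustration; the only real change to the pattern is the extra escaping a Java string literal needs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
public class QuoteTokenizer {
    // \s*         skip any leading whitespace
    // ([^"\s]+    capture a run of characters that are neither quotes nor whitespace
    // |"[^"]*")   or capture a double-quoted section, quotes included
    private static final Pattern TOKEN =
            Pattern.compile("\\s*([^\"\\s]+|\"[^\"]*\")");

    public static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        // find() walks through the input one match at a time
        while (m.find()) {
            tokens.add(m.group(1));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "How are you \"Doing well\" How well 10 \"That's great\"";
        for (String token : tokenize(input)) {
            System.out.println(token);
        }
    }
}
```

For the sample string this yields eight tokens, with "Doing well" and "That's great" kept whole (quotes included). The original capitalization is preserved, since the regex does not change the text.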
If you don't want symbols in words, then it's probably best to use a second regex that removes them, e.g.:
\W
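A rough sketch of that second pass in Java, assuming quoted tokens should be left untouched (that choice, and the names SymbolStripper and stripSymbols, are my own and not from the answer):

```java
// Hypothetical helper class for illustration only.
public class SymbolStripper {
    // Removes every non-word character from an unquoted token.
    // By default, \W in Java means anything other than [a-zA-Z0-9_].
    public static String stripSymbols(String token) {
        if (token.startsWith("\"")) {
            return token; // assumption: keep quoted sections exactly as written
        }
        return token.replaceAll("\\W", "");
    }

    public static void main(String[] args) {
        System.out.println(stripSymbols("well,"));
        System.out.println(stripSymbols("\"That's great!\""));
    }
}
```

Keep in mind that \W also removes apostrophes, so an unquoted token such as don't would come out as dont.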
You can do it in multiple steps (the code is in Python, but the logic and the pattern should be the same in Java):
1 - Get all the strings within double quotes:
import re
r = re.findall(r'\"([^"]*)\"','How are you "Doing well" How well 10 "That\'s great"')
Result: ['Doing well', "That's great"]
2 - Remove those strings from the text:
r = re.sub(r'\"([^"]*)\"', "", 'How are you "Doing well" How well 10 "That\'s great"')
Result: 'How are you  How well 10 '
3 - Now you can split the remaining text on whitespace and combine the result with the quoted strings from step 1.
Definitely not a clean solution, but it should work.
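The three steps above could be sketched in Java roughly as follows. The class and method names (MultiStepSplit, quotedParts, bareWords) are hypothetical; the pattern is the same one used in the Python snippets.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class mirroring the Python steps.
public class MultiStepSplit {
    // A double quote, any run of non-quote characters (captured), a double quote
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]*)\"");

    // Step 1: collect the text inside each pair of double quotes
    public static List<String> quotedParts(String input) {
        List<String> parts = new ArrayList<>();
        Matcher m = QUOTED.matcher(input);
        while (m.find()) {
            parts.add(m.group(1));
        }
        return parts;
    }

    // Steps 2 and 3: remove the quoted sections, then split what is left on whitespace
    public static List<String> bareWords(String input) {
        String remainder = QUOTED.matcher(input).replaceAll("");
        List<String> words = new ArrayList<>();
        for (String word : remainder.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        String input = "How are you \"Doing well\" How well 10 \"That's great\"";
        System.out.println(quotedParts(input));
        System.out.println(bareWords(input));
    }
}
```

One consequence of this approach is that the quoted strings and the bare words end up in two separate lists, so the original order of the tokens is lost.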
This should work for you. (\\"[^\\"]+\\")|([^\\s]+)
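A small sketch of how this alternation might be used, assuming the pattern is meant to be written as the Java string literal "(\"[^\"]+\")|([^\\s]+)". The two capture groups make it possible to tell quoted tokens from bare ones; the class name GroupedTokens and the method describe are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration only.
public class GroupedTokens {
    // Group 1: a quoted section with at least one character inside
    // Group 2: any run of non-whitespace characters
    private static final Pattern TOKEN =
            Pattern.compile("(\"[^\"]+\")|([^\\s]+)");

    public static List<String> describe(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            // Only one of the two groups participates in any given match
            if (m.group(1) != null) {
                out.add("quoted: " + m.group(1));
            } else {
                out.add("bare: " + m.group(2));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : describe("10 \"That's great\"")) {
            System.out.println(line);
        }
    }
}
```

The quoted alternative has to come first: the regex engine tries alternatives from left to right, so with the order reversed the opening quote would be consumed as part of a bare token.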