
Regex-ing Words, Numbers, And Quotations from a string in Java

I have a quick question about Regex in Java (though other languages are probably similar).

What I'm trying to do is to transform a String like this:

 How are you "Doing well" How well 10 "That's great"

I want the regex to match all of the words, numbers, and things inside quotation marks. Ideally, I'd get something like this:

How
are
you
"Doing well"
How
well
10
"That's great"

The Regex I'm trying to use is the following:

String RegexPattern =   "[^"+           //  START_OR: start of line OR" 
                        "\\s" +         //  empty space OR
                        "(\\s*?<=\")]" + // ENDOR: preceded by 0 or more spaces and a quotation mark
                        "(\\w+)" +      // the actual word or number
                        "[\\s" +        // START_OR: followed by a space OR
                        "(?=\")" +      // followed by a quotation mark OR
                        "$]";           // ENDOF:  end of line

This won't work for me, though, even for much simpler strings! I've spent a lot of time looking for similar problems on here. If I didn't need the quotations, I could just use a split; eventually, though, this pattern will get much more complicated, so I will need a regex (this is just the first iteration).

I'd appreciate any help; thanks in advance!

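I don't think [ ] means what you think it means. Square brackets define a character class, which matches a single character; they do not group alternatives. A ^ at the start of a class negates it, and constructs such as (?=") or $ are treated as literal characters inside one. You should practice with smaller regexes before embarking on this task. The pattern you're looking for is more like: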

    \s*([^"\s]+|"[^"]*")

You can see this in action here: http://rubular.com/r/enq7eXg9Zm .
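For Java specifically, here is a minimal sketch of how the pattern above could be applied with `Matcher.find()`. The class name `QuoteTokenizer` and the method `tokenize` are made up for illustration; the only adjustment to the pattern is the escaping needed inside a Java string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
public class QuoteTokenizer {
    // Regex: \s*([^"\s]+|"[^"]*")
    // Skip leading whitespace, then capture either a run of characters that
    // are neither quotes nor whitespace, or a complete quoted section.
    private static final Pattern TOKEN =
            Pattern.compile("\\s*([^\"\\s]+|\"[^\"]*\")");

    public static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group(1)); // group 1 excludes the leading whitespace
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "How are you \"Doing well\" How well 10 \"That's great\"";
        for (String token : tokenize(input)) {
            System.out.println(token);
        }
    }
}
```

Quoted sections keep their surrounding quotation marks, which matches the output asked for in the question.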

If you don't want symbols in words, then it's probably best to use a second regex that removes them, e.g. (written as "\\W" in a Java string literal):

    \W
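As a rough sketch of that second pass in Java: the class `SymbolStripper` and its method are hypothetical names, and leaving quoted tokens untouched is an assumption, since the answer does not say how they should be treated. Note that by default Java's `\W` matches anything outside `[a-zA-Z0-9_]`, so non-ASCII letters would be stripped as well.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of any library.
public class SymbolStripper {
    public static List<String> stripSymbols(List<String> tokens) {
        List<String> cleaned = new ArrayList<>();
        for (String token : tokens) {
            if (token.startsWith("\"")) {
                // Assumption: quoted tokens are kept exactly as they are.
                cleaned.add(token);
                continue;
            }
            // Remove every non-word character from an unquoted token.
            String stripped = token.replaceAll("\\W", "");
            if (!stripped.isEmpty()) {
                cleaned.add(stripped); // drop tokens that were only symbols
            }
        }
        return cleaned;
    }
}
```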

You can do it in multiple steps. The code is in Python, but the logic and the pattern should be the same in Java.

1 - Get all the strings within double quotes:

import re
r = re.findall(r'\"([^"]*)\"','How are you "Doing well" How well 10 "That\'s great"')

Result: ['Doing well', "That's great"]

2 - Remove those strings from the text:

r = re.sub(r'\"([^"]*)\"', "", 'How are you "Doing well" How well 10 "That\'s great"')

Result: 'How are you  How well 10 '

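3 - Split the remaining text on whitespace and combine the result with the quoted strings from step 1. Note that this loses the original order of the tokens, and the quoted strings come back without their quotation marks.

Definitely not a good/clean solution, but it should work.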
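The same three steps might look roughly like this in Java. This is only a sketch: `MultiStepSplit`, `quotedParts`, and `plainWords` are illustrative names, and the two results are returned separately because this approach does not preserve the original token order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
public class MultiStepSplit {
    // Same pattern as the Python version: a quoted section, contents in group 1.
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]*)\"");

    // Step 1: collect the text inside each pair of double quotes.
    public static List<String> quotedParts(String text) {
        List<String> parts = new ArrayList<>();
        Matcher m = QUOTED.matcher(text);
        while (m.find()) {
            parts.add(m.group(1));
        }
        return parts;
    }

    // Steps 2 and 3: remove the quoted sections, then split on whitespace.
    public static List<String> plainWords(String text) {
        String remainder = QUOTED.matcher(text).replaceAll("").trim();
        List<String> words = new ArrayList<>();
        if (!remainder.isEmpty()) {
            for (String word : remainder.split("\\s+")) {
                words.add(word);
            }
        }
        return words;
    }
}
```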

This should work for you:

    ("[^"]+")|([^\s]+)

As a Java string literal, that is "(\"[^\"]+\")|([^\\s]+)".
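One thing the two capture groups allow is telling quoted tokens apart from plain ones. The sketch below assumes the regex was meant as `("[^"]+")|([^\s]+)`; the class name `GroupedTokens` and the "quoted:"/"plain:" labels are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
public class GroupedTokens {
    // Group 1: a quoted section (quotes included). Group 2: any other
    // run of non-whitespace characters.
    private static final Pattern TOKEN =
            Pattern.compile("(\"[^\"]+\")|([^\\s]+)");

    public static List<String> describe(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            if (m.group(1) != null) {
                out.add("quoted: " + m.group(1));
            } else {
                out.add("plain: " + m.group(2));
            }
        }
        return out;
    }
}
```

Because the second alternative accepts any non-whitespace character, a quotation mark embedded in a word (or one that is never closed) ends up inside a plain token rather than starting a quoted one.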
