
Extract specific data from string with regex

I want to capture multiple substrings that match a specific pattern. For example, my string looks like this:

String textData = "#1_Label for UK#2_Label for US#4_Label for FR#";

I want to get every string between two # characters that contains a given keyword, such as UK.

If the keyword is UK, the output should be 1_Label for UK.

If the keyword is label, the output should be 1_Label for UK, 2_Label for US and 4_Label for FR.

If the keyword is 1_, the output should be 1_Label for UK.

I don't want to extract the data via an ArrayList, and the matching should be case insensitive.

Can you please help me solve this problem?

Regards, Ashish Mishra

You can use this regex for search:

#([^#]*?Label[^#]*)(?=#)

Replace Label with your search keyword.

RegEx Demo

Java Pattern:

Pattern p = Pattern.compile( "#([^#]*?" + Pattern.quote(keyword) + "[^#]*)(?=#)" );
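As a minimal sketch of how that pattern could be used with Matcher.find(): the class name KeywordExtract and the method extract are hypothetical names for illustration, and the Pattern.CASE_INSENSITIVE flag is added here because the question asks for case-insensitive matching (the pattern above does not set it). The results are joined into a single String rather than collected in a list.

```java
import java.util.StringJoiner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeywordExtract {
    // Returns every '#'-delimited segment that contains the keyword, joined by ", ".
    static String extract(String text, String keyword) {
        Pattern p = Pattern.compile(
                "#([^#]*?" + Pattern.quote(keyword) + "[^#]*)(?=#)",
                Pattern.CASE_INSENSITIVE); // assumption: added for case-insensitive search
        Matcher m = p.matcher(text);
        StringJoiner out = new StringJoiner(", ");
        while (m.find()) {
            out.add(m.group(1)); // group 1 leaves out the leading '#'
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String textData = "#1_Label for UK#2_Label for US#4_Label for FR#";
        System.out.println(extract(textData, "UK"));    // 1_Label for UK
        System.out.println(extract(textData, "label")); // 1_Label for UK, 2_Label for US, 4_Label for FR
        System.out.println(extract(textData, "1_"));    // 1_Label for UK
    }
}
```

Because the trailing # is only checked by the look-ahead and not consumed, it is still available as the opening # of the next segment.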

If the data always is between two hashes, try a regex like this: (?i)#.*your_match.*# where your_match would be UK , label , 1_ etc.

Then use this expression in conjunction with the Pattern and Matcher classes.
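A small sketch of that simple expression with Pattern and Matcher, assuming the sample string from the question (the class name GreedyDemo and the method firstMatch are made up for this example). Since .* is greedy, the single match spans from the first hash to the last one, so this form is only suitable when the input holds one entry.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyDemo {
    // Returns the first match of (?i)#.*keyword.*# or null if there is none.
    static String firstMatch(String text, String keyword) {
        Pattern p = Pattern.compile("(?i)#.*" + Pattern.quote(keyword) + ".*#");
        Matcher m = p.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String textData = "#1_Label for UK#2_Label for US#4_Label for FR#";
        // The greedy .* swallows the neighbouring entries as well.
        System.out.println(firstMatch(textData, "UK"));
        // With a single entry the result is just that entry, hashes included.
        System.out.println(firstMatch("#1_Label for UK#", "uk"));
    }
}
```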

If you want to match multiple strings, you need to exclude the hashes from the match by using look-around assertions as well as reluctant quantifiers, e.g. (?i)(?<=#).*?label.*?(?=#) .

Short breakdown:

  • (?i) will make the expression case insensitive
  • (?<=#) is a positive look-behind, i.e. the match must be preceded by a hash (but doesn't include the hash)
  • .*? matches any sequence of characters but is reluctant, ie it tries to match as few characters as possible
  • (?=#) is a positive look-ahead, which means the match must be followed by a hash (also not included in the match)

Without the look-arounds, the hashes would be included in the match, so when using Matcher.find() you'd skip every other label in your test string, i.e. you'd get the matches #1_Label for UK# and #4_Label for FR# but not #2_Label for US# .
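The skipping can be reproduced with a short sketch. The helper findAll and the class name HashSkipDemo are illustrative only; the two expressions are the ones discussed above, with and without the look-arounds.

```java
import java.util.StringJoiner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HashSkipDemo {
    // Joins all matches of regex in text with " | ".
    static String findAll(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        StringJoiner out = new StringJoiner(" | ");
        while (m.find()) {
            out.add(m.group());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String textData = "#1_Label for UK#2_Label for US#4_Label for FR#";

        // Hashes are consumed, so the hash that closes one entry cannot open the next.
        System.out.println(findAll("(?i)#.*?label.*?#", textData));
        // prints: #1_Label for UK# | #4_Label for FR#

        // Look-arounds only check for the hashes, so every entry is found.
        System.out.println(findAll("(?i)(?<=#).*?label.*?(?=#)", textData));
        // prints: 1_Label for UK | 2_Label for US | 4_Label for FR
    }
}
```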

Without the reluctant quantifiers, the expression would match everything between the first and the last hash.

A better alternative is to replace .*? with [^#]* , which means the match cannot contain any hash. This removes the need for reluctant quantifiers and also fixes the problem that looking for US would match 1_Label for UK#2_Label for US .

So most probably the final regex you're after looks like this: (?i)(?<=#)[^#]*your_match[^#]*(?=#) .
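To illustrate the difference, here is a hedged sketch that runs both variants against the sample string with the keyword US. The names FinalRegexDemo, reluctant and noHash are hypothetical, and Pattern.quote is used on the assumption that the keyword should be treated as literal text.

```java
import java.util.StringJoiner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FinalRegexDemo {
    // Variant with reluctant .*? which may run across a hash.
    static String reluctant(String text, String keyword) {
        return findAll("(?i)(?<=#).*?" + Pattern.quote(keyword) + ".*?(?=#)", text);
    }

    // Variant with [^#]* which can never cross a hash.
    static String noHash(String text, String keyword) {
        return findAll("(?i)(?<=#)[^#]*" + Pattern.quote(keyword) + "[^#]*(?=#)", text);
    }

    // Joins all matches of regex in text with ", ".
    private static String findAll(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        StringJoiner out = new StringJoiner(", ");
        while (m.find()) {
            out.add(m.group());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String textData = "#1_Label for UK#2_Label for US#4_Label for FR#";
        System.out.println(reluctant(textData, "US")); // 1_Label for UK#2_Label for US
        System.out.println(noHash(textData, "US"));    // 2_Label for US
        System.out.println(noHash(textData, "label")); // 1_Label for UK, 2_Label for US, 4_Label for FR
    }
}
```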

([^#]*UK[^#]*)   for UK

([^#]*Label[^#]*) for Label

([^#]*1_[^#]*)    for 1_

Try these and grab the captures. See the demos:

http://regex101.com/r/kQ0zR5/3

http://regex101.com/r/kQ0zR5/4

http://regex101.com/r/kQ0zR5/5

I solved this problem with the pattern below:

(?i)([^#]*?us[^#]*)(?=#)

Thank you so much Anubhava, VKS and Thomas for your replies.

Regards,
Ashish Mishra
