
Java - Extract text between special character and word

I have a String that looks like this:

String name = "Förpackning Flaska (375 ml) Förslutning Skruvkapsyl Kr/lit (104,00) Pris 39,00 kr Antal i butik 30 st Hyllplats 04-11-01 Förpackning Flaska (750 ml) Förslutning Plastkork/syntetkork Kr/lit (100,00) Pris 75,00 kr Antal i butik 16 st Hyllplats 02-03-01";

I want to extract some of the text inside this string. The end result I want is this:

"Förpackning Flaska (375 ml) Antal i butik 30 st Förpackning Flaska (750 ml) Antal i butik 16 st".

I can use the following code:

    name = name.replace(name.substring(name.indexOf(") ") + 2, name.indexOf("Antal")), "");
    name = name.replace(name.substring(name.indexOf("st ") + 2, name.lastIndexOf("")), "");

That will give me this result:

"Förpackning Flaska (375 ml) Antal i butik 30 st"

It basically does what I want it to do, but it stops after the first occurrence of the pattern.

I have tried to use a regex pattern but I can't get it to work. From observing the string, I have concluded that I need a regex pattern that matches everything between ") " and "Antal". I will also need to remove the other clutter, but that is easy. I know that I have to escape the parenthesis to make it a literal character in my regex, but I still can't get it to match, even though a regex would probably be the best way to do something like this.

This is the regex I've come up with and tried:

    Pattern p = Pattern.compile("\b\\) (.+?)\bAntal");
    Matcher m = p.matcher(name);
    m.find();
    System.out.println(m.group(1));

Any help and ideas are welcome!

You are probably looking for the replaceAll method of String in Java. It has the following signature:

public String replaceAll(String regex, String replacement);

As the name suggests, this replaces every occurrence of the regular expression with the replacement text.
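As a minimal sketch of that behaviour, the example below contrasts replaceFirst with replaceAll on a shortened stand-in for the string from the question. The class name, method names, sample text and the simple pattern are assumptions made for illustration only.

```java
public class ReplaceAllDemo {
    // Illustrative pattern: a closing parenthesis, any text, then "Antal".
    private static final String CLUTTER = "\\) .*?Antal";

    // Rewrites only the first match.
    static String cleanFirst(String text) {
        return text.replaceFirst(CLUTTER, ") Antal");
    }

    // Rewrites every match.
    static String cleanAll(String text) {
        return text.replaceAll(CLUTTER, ") Antal");
    }

    public static void main(String[] args) {
        String sample = "(375 ml) X Antal 30 st (750 ml) Y Antal 16 st";
        System.out.println(cleanFirst(sample)); // (375 ml) Antal 30 st (750 ml) Y Antal 16 st
        System.out.println(cleanAll(sample));   // (375 ml) Antal 30 st (750 ml) Antal 16 st
    }
}
```

Note that String.replace, which the question uses, also replaces every occurrence, but it takes a literal string rather than a regex, so the two records with different clutter text cannot be covered by one call.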

This can be done in one line!

It looks like you want to remove:

  • the two words that follow "st", and
  • everything between ")" and "Antal"

Here's the code that will do that:

String clean = input.replaceAll("((?<= st)( [^ ]+){2}|(?<=\\)).*?(?= Antal))", "");

Notes regarding the regex:

  • I noticed you coded a word boundary as "\b". This is a mistake: in a Java string literal that is a literal backspace character. To get a regex word boundary you code it as "\\b"
  • I've used a regex OR expression (A|B) to match both cases in one regex
  • Both alternatives use a look-behind (and the second also a look-ahead), so only the unwanted text is matched and the replacement can be an empty string. That is cleaner than matching part of the input you want to keep and then putting it back, and it meant I could combine both regexes into one OR expression
  • the ? in ".*?" is important: it makes the match non-greedy. Without it, the match would run from the first closing parenthesis to the last Antal, skipping over any Antal in between
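The first and last notes can be checked with a small experiment. The sketch below is illustrative only: the class name, the firstMatch helper and the short sample text are hypothetical and not part of the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexNotesDemo {
    // Returns the first match of regex in text, or null when nothing matches.
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "(a) x Antal 1 (b) y Antal 2";

        // "\b" in a Java string literal is one backspace character (code 8),
        // so the regex looks for backspace + "Antal" and finds nothing.
        System.out.println((int) "\b".charAt(0));            // 8
        System.out.println(firstMatch("\bAntal", text));     // null

        // "\\b" reaches the regex engine as \b, a word boundary.
        System.out.println(firstMatch("\\bAntal", text));    // Antal

        // Lazy .*? stops at the nearest "Antal"; greedy .* runs to the last one.
        System.out.println(firstMatch("\\).*?Antal", text)); // ) x Antal
        System.out.println(firstMatch("\\).*Antal", text));  // ) x Antal 1 (b) y Antal
    }
}
```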

Here's some test code:

public static void main(String[] args) {
    String input = "Förpackning Flaska (375 ml) Förslutning Skruvkapsyl Kr/lit (104,00) Pris 39,00 kr Antal i butik 30 st Hyllplats 04-11-01 Förpackning Flaska (750 ml) Förslutning Plastkork/syntetkork Kr/lit (100,00) Pris 75,00 kr Antal i butik 16 st Hyllplats 02-03-01";
    String clean = input.replaceAll("((?<= st)( [^ ]+){2}|(?<=\\)).*?(?= Antal))", "");
    System.out.println(clean);
}

Output:

Förpackning Flaska (375 ml) Antal i butik 30 st Förpackning Flaska (750 ml) Antal i butik 16 st

Try this. I'm not sure it will work for all of your strings, because you need to know roughly the maximum length of each description (50 characters here).

String s = "Förpackning Flaska (375 ml) Förslutning Skruvkapsyl Kr/lit (104,00) Pris 39,00 kr Antal i butik 30 st Hyllplats 04-11-01 Förpackning Flaska (750 ml) Förslutning Plastkork/syntetkork Kr/lit (100,00) Pris 75,00 kr Antal i butik 16 st Hyllplats 02-03-01";

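StringBuilder out = new StringBuilder();
// Each alternative matches one fragment to keep; {0,50} caps how far a fragment may extend.
Matcher mat = Pattern.compile("(Förpackning .{0,50}\\))|(Antal.{0,50}st)").matcher(s);
while (mat.find()) {
    out.append(mat.group()).append(' ');
}
System.out.println(out.toString().trim());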
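If the maximum length is hard to estimate, one possible variation (not the approach above, but a swapped-in technique) is to replace the bounded quantifiers with non-greedy ones. This is a sketch under the assumption that the first ")" after "Förpackning" and the first "st" after "Antal" always end the wanted fragments. The class name and the extract helper are hypothetical, and the sample is the first record from the question.

```java
import java.util.StringJoiner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LazyExtractDemo {
    // Hypothetical helper: collects every "Förpackning ...)" and "Antal ... st" fragment.
    static String extract(String s) {
        // .*? is non-greedy, so each fragment ends at the nearest ")" or "st".
        Pattern p = Pattern.compile("Förpackning .*?\\)|Antal.*?st");
        Matcher m = p.matcher(s);
        StringJoiner out = new StringJoiner(" "); // joins fragments without a trailing space
        while (m.find()) {
            out.add(m.group());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String s = "Förpackning Flaska (375 ml) Förslutning Skruvkapsyl Kr/lit (104,00) "
                 + "Pris 39,00 kr Antal i butik 30 st Hyllplats 04-11-01";
        System.out.println(extract(s)); // Förpackning Flaska (375 ml) Antal i butik 30 st
    }
}
```

The trade-off is that a fragment would be cut short if "st" appeared inside a word between "Antal" and the count, so this variation is no more general than the bounded one. It only removes the need to guess a length.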
