
Java - Extract text between special character and word

I have a String that looks like this:

    String name = "Förpackning Flaska (375 ml) Förslutning Skruvkapsyl Kr/lit (104,00) Pris 39,00 kr Antal i butik 30 st Hyllplats 04-11-01 Förpackning Flaska (750 ml) Förslutning Plastkork/syntetkork Kr/lit (100,00) Pris 75,00 kr Antal i butik 16 st Hyllplats 02-03-01";

I want to extract some of the text inside this string. The end result I want is this:

"Förpackning Flaska (375 ml) Antal i butik 30 st Förpackning Flaska (750 ml) Antal i butik 16 st"

I can use the following code:

    name = name.replace(name.substring(name.indexOf(") ") + 2, name.indexOf("Antal")), "");
    name = name.replace(name.substring(name.indexOf("st ") + 2, name.lastIndexOf("")), "");

That will give me this result:

"Förpackning Flaska (375 ml) Antal i butik 30 st"

It basically does what I want it to do, but it stops after the first occurrence of the pattern.
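The substring approach stops after the first record because each indexOf call searches from the start of the string, and the second line cuts off everything after the first "st ". For comparison, here is a minimal non-regex sketch that keeps the indexOf idea but passes a fromIndex, so each search resumes where the previous record ended. It assumes every record starts with "Förpackning", that the first ")" after it closes the volume, and that the count ends in " st"; the class and method names are hypothetical.

```java
// Hypothetical class: walks the string record by record with indexOf(str, fromIndex).
class IndexOfLoop {
    static final String INPUT =
        "Förpackning Flaska (375 ml) Förslutning Skruvkapsyl Kr/lit (104,00) Pris 39,00 kr "
        + "Antal i butik 30 st Hyllplats 04-11-01 "
        + "Förpackning Flaska (750 ml) Förslutning Plastkork/syntetkork Kr/lit (100,00) Pris 75,00 kr "
        + "Antal i butik 16 st Hyllplats 02-03-01";

    // Keeps "Förpackning ... (x ml)" and "Antal ... st" from every record (assumes this layout).
    static String extract(String s) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (true) {
            int label = s.indexOf("Förpackning", pos); // start of the next record
            if (label < 0) break;                       // no more records
            int paren = s.indexOf(')', label);         // first ")" closes the volume
            if (paren < 0) break;
            int antal = s.indexOf("Antal", paren);      // start of the count
            if (antal < 0) break;
            int st = s.indexOf(" st", antal);           // end of the count
            if (st < 0) break;
            if (out.length() > 0) out.append(' ');
            out.append(s, label, paren + 1)             // e.g. "Förpackning Flaska (375 ml)"
               .append(' ')
               .append(s, antal, st + 3);               // e.g. "Antal i butik 30 st"
            pos = st + 3;                               // resume after this record
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(extract(INPUT));
    }
}
```

This only works while the records follow that fixed layout, which is one reason the answers below reach for a regex.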

I have tried to use a regex pattern, but I can't get it to work. From looking at the string, I have concluded that I need a regex pattern that matches everything between ") " and "Antal". I will also need to remove the other clutter, but that is easy. My problem is that I can't seem to get my regex to work, and a regex would probably be the best way to do something like this. I know that I have to escape the parenthesis to make it a literal character in my regex, but I still can't get it to work.

This is the regex I've come up with and tried:

    Pattern p = Pattern.compile("\b\\) (.+?)\bAntal");
    Matcher m = p.matcher(name);
    m.find();
    System.out.println(m.group(1));

Any help and ideas are welcome!

You are probably looking for the replaceAll method of Java's String class. It has the following signature:

public String replaceAll(String regex, String replacement);

As the name suggests, this replaces every match of the regular expression with the replacement text.
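To make "every match" concrete, here is a small sketch comparing replaceAll with its sibling replaceFirst, which stops after the first match, much like the substring approach in the question. The class and method names are made up for the example; note that the first argument of both methods is a regex, not a plain string.

```java
// Hypothetical demo class: contrasts replaceAll with replaceFirst.
class ReplaceAllDemo {
    // Replaces every run of digits with "#".
    static String allRuns(String s) {
        return s.replaceAll("[0-9]+", "#");
    }

    // Replaces only the first run of digits with "#".
    static String firstRun(String s) {
        return s.replaceFirst("[0-9]+", "#");
    }

    public static void main(String[] args) {
        System.out.println(allRuns("a1b22c333"));  // a#b#c#
        System.out.println(firstRun("a1b22c333")); // a#b22c333
    }
}
```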

This can be done in one line!

It looks like you want to remove:

  • the next two words after the word "st", and
  • everything between ")" and "Antal"
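To see what each rule does on its own, here is a sketch that applies them as two separate replaceAll passes, using the two halves of the combined regex in this answer. The class name and the order of the passes are my own choices for illustration.

```java
// Hypothetical class: applies the two removal rules one at a time.
class TwoPassClean {
    static final String INPUT =
        "Förpackning Flaska (375 ml) Förslutning Skruvkapsyl Kr/lit (104,00) Pris 39,00 kr "
        + "Antal i butik 30 st Hyllplats 04-11-01 "
        + "Förpackning Flaska (750 ml) Förslutning Plastkork/syntetkork Kr/lit (100,00) Pris 75,00 kr "
        + "Antal i butik 16 st Hyllplats 02-03-01";

    static String clean(String input) {
        // Drop everything after a ")" up to (not including) the next " Antal".
        String step1 = input.replaceAll("(?<=\\)).*?(?= Antal)", "");
        // Drop the two space-separated words that follow " st".
        return step1.replaceAll("(?<= st)( [^ ]+){2}", "");
    }

    public static void main(String[] args) {
        // Förpackning Flaska (375 ml) Antal i butik 30 st Förpackning Flaska (750 ml) Antal i butik 16 st
        System.out.println(clean(INPUT));
    }
}
```

Joining the two halves with | lets a single replaceAll do both jobs in one pass.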

Here's the code that will do that:

input.replaceAll("((?<= st)( [^ ]+){2}|(?<=\\)).*?(?= Antal))", "");

Notes regarding the regex:

  • I noticed you coded a word boundary as "\b". This is a mistake: in a Java string literal, "\b" is a literal backspace character. Code it as "\\b" instead.
  • I've used a regex OR expression (A|B) to match both patterns in one regex.
  • Both regexes use a look-behind so that the replacement text can be blank. This is cleaner than matching part of the input you want to keep and then putting it back, and it meant I could combine both regexes into one OR expression.
  • The ? in ".*?" is important: it makes the match non-greedy. Without it, the match would run from the first ")" all the way to the last " Antal", skipping over any Antal in between.
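The first and last notes can be checked directly. This sketch (class, constant, and method names are hypothetical) shows that the question's pattern, written with "\b", searches for literal backspaces and finds nothing, while the same pattern with "\\b" does match; it then runs this answer's regex with and without the ? to show what the greedy version removes.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for the notes above.
class RegexNotesDemo {
    static final String INPUT =
        "Förpackning Flaska (375 ml) Förslutning Skruvkapsyl Kr/lit (104,00) Pris 39,00 kr "
        + "Antal i butik 30 st Hyllplats 04-11-01 "
        + "Förpackning Flaska (750 ml) Förslutning Plastkork/syntetkork Kr/lit (100,00) Pris 75,00 kr "
        + "Antal i butik 16 st Hyllplats 02-03-01";
    // The answer's regex, and the same regex with the non-greedy ? removed.
    static final String LAZY = "((?<= st)( [^ ]+){2}|(?<=\\)).*?(?= Antal))";
    static final String GREEDY = "((?<= st)( [^ ]+){2}|(?<=\\)).*(?= Antal))";

    // True if the pattern occurs anywhere in the text.
    static boolean finds(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        // "\b" is one backspace character (U+0008); "\\b" is a backslash followed by 'b'.
        System.out.println("\b".length());  // 1
        System.out.println("\\b".length()); // 2

        // The question's pattern as written looks for backspaces, so nothing is found.
        System.out.println(finds("\b\\) (.+?)\bAntal", INPUT));   // false
        // With "\\b" the regex engine sees a word boundary and the pattern matches.
        System.out.println(finds("\\b\\) (.+?)\\bAntal", INPUT)); // true

        // Non-greedy: each record keeps its own label and count.
        // Förpackning Flaska (375 ml) Antal i butik 30 st Förpackning Flaska (750 ml) Antal i butik 16 st
        System.out.println(INPUT.replaceAll(LAZY, ""));
        // Greedy: ".*" runs from the first ")" to the last " Antal",
        // dropping the first count and the whole second label.
        // Förpackning Flaska (375 ml) Antal i butik 16 st
        System.out.println(INPUT.replaceAll(GREEDY, ""));
    }
}
```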

Here's some test code:

public class Main {
    public static void main(String[] args) {
        String input = "Förpackning Flaska (375 ml) Förslutning Skruvkapsyl Kr/lit (104,00) Pris 39,00 kr Antal i butik 30 st Hyllplats 04-11-01 Förpackning Flaska (750 ml) Förslutning Plastkork/syntetkork Kr/lit (100,00) Pris 75,00 kr Antal i butik 16 st Hyllplats 02-03-01";
        // One pass: drop the two words after " st", and everything between ")" and " Antal"
        String clean = input.replaceAll("((?<= st)( [^ ]+){2}|(?<=\\)).*?(?= Antal))", "");
        System.out.println(clean);
    }
}

Output:

Förpackning Flaska (375 ml) Antal i butik 30 st Förpackning Flaska (750 ml) Antal i butik 16 st

Try this. I'm not sure it will work for all of your strings: you need to know, approximately, the maximum length of the description.

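import java.util.StringJoiner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "Förpackning Flaska (375 ml) Förslutning Skruvkapsyl Kr/lit (104,00) Pris 39,00 kr Antal i butik 30 st Hyllplats 04-11-01 Förpackning Flaska (750 ml) Förslutning Plastkork/syntetkork Kr/lit (100,00) Pris 75,00 kr Antal i butik 16 st Hyllplats 02-03-01";

// Keep each "Förpackning ... )" label and each "Antal ... st" count; {0,50} caps how far each may reach
StringJoiner out = new StringJoiner(" ");
Matcher mat = Pattern.compile("(Förpackning .{0,50}\\))|(Antal.{0,50}st)").matcher(s);
while (mat.find())
    out.add(mat.group());
System.out.println(out);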
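If the 50-character cap is a concern, one possible variant (my own, not part of the answer above) swaps .{0,50} for patterns that stop at the delimiter instead: [^)]* runs up to the first ")", and a lazy .*? stops at the first " st" that ends a word. This assumes every label closes at its first ")" and every count ends in " st"; the class and constant names are hypothetical.

```java
import java.util.StringJoiner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical variant: no length cap, relies on the delimiters instead.
class KeepPartsDemo {
    static final String INPUT =
        "Förpackning Flaska (375 ml) Förslutning Skruvkapsyl Kr/lit (104,00) Pris 39,00 kr "
        + "Antal i butik 30 st Hyllplats 04-11-01 "
        + "Förpackning Flaska (750 ml) Förslutning Plastkork/syntetkork Kr/lit (100,00) Pris 75,00 kr "
        + "Antal i butik 16 st Hyllplats 02-03-01";
    // "Förpackning" up to the first ")", or "Antal" up to the nearest " st" at a word end.
    static final Pattern KEEP = Pattern.compile("Förpackning [^)]*\\)|Antal.*? st\\b");

    static String keep(String s) {
        StringJoiner out = new StringJoiner(" ");
        Matcher m = KEEP.matcher(s);
        while (m.find()) {
            out.add(m.group()); // matches come back in input order
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Förpackning Flaska (375 ml) Antal i butik 30 st Förpackning Flaska (750 ml) Antal i butik 16 st
        System.out.println(keep(INPUT));
    }
}
```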

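Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM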