Java - Extract text between special character and word
I have a String that looks like this:
String name = "Förpackning Flaska (375 ml) Förslutning Skruvkapsyl Kr/lit (104,00) Pris 39,00 kr Antal i butik 30 st Hyllplats 04-11-01 Förpackning Flaska (750 ml) Förslutning Plastkork/syntetkork Kr/lit (100,00) Pris 75,00 kr Antal i butik 16 st Hyllplats 02-03-01";
I want to extract some of the text inside this string. The end result I want is this:
"Förpackning Flaska (375 ml) Antal i butik 30 st Förpackning Flaska (750 ml) Antal i butik 16 st".
I can use the following code:
name = name.replace(name.substring(name.indexOf(") ") + 2, name.indexOf("Antal")), "");
name = name.replace(name.substring(name.indexOf("st ") + 2, name.lastIndexOf("")), "");
That will give me this result:
"Förpackning Flaska (375 ml) Antal i butik 30 st"
It basically does what I want it to do, but it stops after the first occurrence of the pattern.
I have tried to use a regex pattern but I can't get it to work. From observing the string, I have concluded that I need a regex pattern that matches everything between ") " and "Antal". I will also need to remove the other clutter, but that is easy. A regex would probably be the best way to do something like this. I know that I have to escape the parenthesis to make it a literal character in my regex, but I just can't get it to work.
This is the regex I've come up with and tried:
Pattern p = Pattern.compile("\b\\) (.+?)\bAntal");
Matcher m = p.matcher(name);
m.find();
System.out.println(m.group(1));
Any help and ideas are welcome!
You are probably looking for the replaceAll method for strings in Java. It has the following signature:
public String replaceAll(String regex, String replacement);
This, as the name suggests, replaces every occurrence of the regular expression with the replacement text.
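As a minimal sketch of that behaviour (the class name, method name and the "Hyllplats" pattern are made up for illustration and are not part of the answer above), the following removes every shelf location in one call, because the first argument is treated as a regex and all matches are replaced:

```java
public class ReplaceAllDemo {
    // Hypothetical helper: drops every " Hyllplats NN-NN-NN" fragment.
    // replaceAll treats its first argument as a regex and replaces all matches.
    static String dropShelf(String text) {
        return text.replaceAll(" Hyllplats [0-9-]+", "");
    }

    public static void main(String[] args) {
        String sample = "Antal i butik 30 st Hyllplats 04-11-01 Antal i butik 16 st Hyllplats 02-03-01";
        // Both shelf locations are removed, not just the first one.
        System.out.println(dropShelf(sample)); // Antal i butik 30 st Antal i butik 16 st
    }
}
```

Note that String.replace also replaces every occurrence, but it matches its argument literally rather than as a pattern.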
This can be done in one line!
It looks like you want to remove the next two words after "st", and everything between ")" and "Antal".
Here's the code that will do that:
input.replaceAll("((?<= st)( [^ ]+){2}|(?<=\\)).*?(?= Antal))", "");
Notes regarding the regex:
Your regex contained "\b". This is a mistake - you coded a literal backspace. In a Java string literal, a word boundary must be written "\\b".
Use alternation (A|B) to match both in one regex.
The ? in ".*?" is important - it means a non-greedy match, which stops at the next Antal. A greedy ".*" would run on to the last Antal, skipping over any Antal in between.
Here's some test code:
public static void main(String[] args) {
String input = "Förpackning Flaska (375 ml) Förslutning Skruvkapsyl Kr/lit (104,00) Pris 39,00 kr Antal i butik 30 st Hyllplats 04-11-01 Förpackning Flaska (750 ml) Förslutning Plastkork/syntetkork Kr/lit (100,00) Pris 75,00 kr Antal i butik 16 st Hyllplats 02-03-01";
String clean = input.replaceAll("((?<= st)( [^ ]+){2}|(?<=\\)).*?(?= Antal))", "");
System.out.println(clean);
}
Output:
Förpackning Flaska (375 ml) Antal i butik 30 st Förpackning Flaska (750 ml) Antal i butik 16 st
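The two points from the regex notes (backspace versus word boundary, and greedy versus non-greedy matching) can be checked in isolation. This is only a sketch: the class name, the firstMatch helper and the shortened sample text are assumptions made for the illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexNotesDemo {
    // Hypothetical helper: returns the first match of regex in text, or null if there is none.
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "(a) x Antal 1 (b) y Antal 2";
        // "\b" in Java source is a backspace character, so nothing matches.
        System.out.println(firstMatch("\bAntal", text));      // null
        // "\\b" reaches the regex engine as \b, a word boundary.
        System.out.println(firstMatch("\\bAntal", text));     // Antal
        // Non-greedy: stops at the first Antal after the parenthesis.
        System.out.println(firstMatch("\\).*?Antal", text));  // ) x Antal
        // Greedy: runs on to the last Antal.
        System.out.println(firstMatch("\\).*Antal", text));   // ) x Antal 1 (b) y Antal
    }
}
```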
Try this. I'm not sure it will work for all of your strings, because you need to know the approximate maximum length of the description.
String s = "Förpackning Flaska (375 ml) Förslutning Skruvkapsyl Kr/lit (104,00) Pris 39,00 kr Antal i butik 30 st Hyllplats 04-11-01 Förpackning Flaska (750 ml) Förslutning Plastkork/syntetkork Kr/lit (100,00) Pris 75,00 kr Antal i butik 16 st Hyllplats 02-03-01";
String out = "";
Matcher mat = Pattern.compile("(Förpackning .{0,50}\\))|(Antal.{0,50}st)").matcher(s);
while(mat.find())
out += mat.group()+" ";
System.out.println(out);
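A possible variation on the loop above, offered as a sketch and not as the answerer's method: non-greedy quantifiers replace the {0,50} length guess, and String.join avoids the trailing space. The class name, the extract method and the modified pattern are assumptions; "\u00f6" is simply "ö" written as an escape so the source file's encoding does not matter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtractDemo {
    // Non-greedy quantifiers stop at the first ")" or " st", so no maximum length is needed.
    // "F\u00f6rpackning" is "Förpackning" with the ö written as a Unicode escape.
    private static final Pattern PARTS =
            Pattern.compile("F\u00f6rpackning .*?\\)|Antal.*? st\\b");

    // Collects every matching fragment and joins them with single spaces.
    static String extract(String s) {
        List<String> parts = new ArrayList<>();
        Matcher m = PARTS.matcher(s);
        while (m.find()) {
            parts.add(m.group());
        }
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        String s = "F\u00f6rpackning Flaska (375 ml) F\u00f6rslutning Skruvkapsyl Kr/lit (104,00) "
                + "Pris 39,00 kr Antal i butik 30 st Hyllplats 04-11-01";
        System.out.println(extract(s));
    }
}
```

This still assumes that every record starts with "Förpackning", that the package description ends at the first ")", and that the stock count ends with the word "st".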