
Extract words between double quotes based on position

I have a single string that contains several quoted values, e.g.:

"Bruce Wayne" "43" "male" "Gotham"

I want to create a method using regex that extracts certain values from the String based on their position.

So, for example, if I pass the int values 1 and 3, it should return a String of: "Bruce Wayne" "male"

Please note the double quotes are part of the String and are escaped characters (\").

If the number of (possible) groups is known, you could use a regular expression like "(.*?)"\s*"(.*?)"\s*"(.*?)"\s*"(.*?)" along with Pattern and Matcher and access the groups by number (group 0 is always the entire match, group 1 is the first capturing group in the expression, and so on).
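
As a minimal sketch of the fixed-group approach, assuming exactly four quoted values: the class name FixedGroupsExample and the helper groupAt are hypothetical names chosen for illustration, and the pattern is the one from the paragraph above written as a Java string literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FixedGroupsExample {
    // Four quoted values separated by optional whitespace.
    // Each (.*?) is one capturing group, numbered 1 to 4 from left to right.
    private static final Pattern FOUR_QUOTED =
            Pattern.compile("\"(.*?)\"\\s*\"(.*?)\"\\s*\"(.*?)\"\\s*\"(.*?)\"");

    /**
     * Returns the text captured by the given group number,
     * or null if the input does not contain four quoted values.
     * Group 0 is the entire match; groups 1..4 are the values without quotes.
     */
    static String groupAt(String input, int position) {
        Matcher m = FOUR_QUOTED.matcher(input);
        if (!m.find()) {
            return null;
        }
        return m.group(position);
    }

    public static void main(String[] args) {
        String input = "\"Bruce Wayne\" \"43\" \"male\" \"Gotham\"";
        System.out.println(groupAt(input, 1)); // prints Bruce Wayne
        System.out.println(groupAt(input, 3)); // prints male
    }
}
```

Because the groups sit inside the quotes, the returned values do not include them; re-add the quotes if you need them in the result.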

If the number of groups is not known, you could just use the expression "(.*?)" and use Matcher#find() to apply the expression in a loop and collect all the matches (group 0 in that case) into a list. Then use your indices to access the list elements (element 1 would be at index 0).
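
The loop described above could look roughly like this. It is only a sketch: FindLoopExample, allQuoted and elementAt are hypothetical names, and positions are assumed to be 1-based as in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FindLoopExample {
    private static final Pattern QUOTED = Pattern.compile("\"(.*?)\"");

    /** Collects every quoted value (group 0, so the quotes are kept) in order of appearance. */
    static List<String> allQuoted(String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = QUOTED.matcher(input);
        while (m.find()) {
            matches.add(m.group()); // group() is the same as group(0): the entire match
        }
        return matches;
    }

    /** Returns the quoted value at a 1-based position (element 1 lives at list index 0). */
    static String elementAt(String input, int position) {
        return allQuoted(input).get(position - 1);
    }

    public static void main(String[] args) {
        String input = "\"Bruce Wayne\" \"43\" \"male\" \"Gotham\"";
        System.out.println(elementAt(input, 1)); // prints "Bruce Wayne"
        System.out.println(elementAt(input, 3)); // prints "male"
    }
}
```

Use m.group(1) instead of m.group() if you want the values without the surrounding quotes.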

Another alternative would be to use string.replaceAll("^[^\"]*\"|\"[^\"]*$","").split("\"\\s*\""), i.e. remove the leading and trailing double quotes along with any text before or after them, and then split on quotes with optional whitespace in between.

Example:

  • assume the string: optional crap before "Bruce Wayne" "43" "male" "Gotham" optional crap after
  • string.replaceAll("^[^\"]*\"|\"[^\"]*$","") will result in: Bruce Wayne" "43" "male" "Gotham
  • applying split("\"\\s*\"") to the result of the previous step will yield the array [Bruce Wayne, 43, male, Gotham]
  • then just access the array elements by index (zero-based)
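
The steps above can be put together as follows. This is a sketch only; SplitExample and splitQuoted are hypothetical names, and it assumes the quoted values are separated by nothing but whitespace.

```java
import java.util.Arrays;

final class SplitExample {
    /** Strips everything up to the first quote and from the last quote on, then splits on quote pairs. */
    static String[] splitQuoted(String input) {
        // Step 1: drop the leading text plus opening quote, and the closing quote plus trailing text.
        String trimmed = input.replaceAll("^[^\"]*\"|\"[^\"]*$", "");
        // Step 2: split on a closing quote, optional whitespace, and the next opening quote.
        return trimmed.split("\"\\s*\"");
    }

    public static void main(String[] args) {
        String input = "optional crap before \"Bruce Wayne\" \"43\" \"male\" \"Gotham\" optional crap after";
        String[] parts = splitQuoted(input);
        System.out.println(Arrays.toString(parts)); // prints [Bruce Wayne, 43, male, Gotham]
        System.out.println(parts[0] + ", " + parts[2]); // prints Bruce Wayne, male
    }
}
```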

My function starts at 0. You said that you want 1 and 3, but usually you start at 0 when working with arrays. So to get "Bruce Wayne" you'd ask for 0, not 1 (you could change that if you'd like, though).

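import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Returns the quoted values (quotes included) found at the given zero-based positions.
// Positions must be passed in ascending order; a position with no match stays null.
String[] getParts(String text, int... positions) {
    String[] results = new String[positions.length];

    Matcher m = Pattern.compile("\"[^\"]*\"").matcher(text);

    // i counts the matches seen so far; j is the index of the next requested position
    for (int i = 0, j = 0; m.find() && j < positions.length; i++) {
        if (i != positions[j]) continue;
        results[j] = m.group();
        j++;
    }

    return results;
}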

// Usage
public Test() {

     String[] parts = getParts(" \"Bruce Wayne\" \"43\" \"male\" \"Gotham\" ", 0, 2);
     System.out.println(Arrays.toString(parts));
     // = ["Bruce Wayne", "male"]

}

The method accepts as many parameters as you like.

getParts(" \"a\" \"b\" \"c\" \"d\" ", 0, 2, 3); // = ["a", "c", "d"] (quotes included)
// or
getParts(" \"a\" \"b\" \"c\" \"d\" ", 3); // = ["d"]

The function to extract words based on position:

import java.util.ArrayList;
import java.util.regex.*;

// Returns the i-th and j-th quoted values (1-based, quotes included), concatenated without a separator.
public String getString(String input, int i, int j) {
    ArrayList<String> list = new ArrayList<String>();
    Matcher m = Pattern.compile("(\"[^\"]+\")").matcher(input);
    while (m.find()) {
        list.add(m.group(1));
    }
    return list.get(i - 1) + list.get(j - 1);
}

Then the words can be extracted like:

String input = "\"Bruce Wayne\" \"43\" \"male\" \"Gotham\"";
String res = getString(input, 1, 3);
System.out.println(res);

Output:

"Bruce Wayne""male"

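The question asked for "Bruce Wayne" "male" with a space between the values, while getString concatenates them directly. A possible variant, sketched under the assumption that positions are 1-based and that any number of them may be passed, joins the selected values with a single space. JoinedPositionsExample and joinAt are hypothetical names, not part of the answers above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class JoinedPositionsExample {
    private static final Pattern QUOTED = Pattern.compile("\"[^\"]+\"");

    /** Returns the quoted values at the given 1-based positions, separated by a single space. */
    static String joinAt(String input, int... positions) {
        // Collect every quoted value (quotes included) in order of appearance.
        List<String> quoted = new ArrayList<>();
        Matcher m = QUOTED.matcher(input);
        while (m.find()) {
            quoted.add(m.group());
        }
        // Pick the requested values; positions may be in any order.
        List<String> picked = new ArrayList<>();
        for (int position : positions) {
            picked.add(quoted.get(position - 1));
        }
        return String.join(" ", picked);
    }

    public static void main(String[] args) {
        String input = "\"Bruce Wayne\" \"43\" \"male\" \"Gotham\"";
        System.out.println(joinAt(input, 1, 3)); // prints "Bruce Wayne" "male"
    }
}
```

A position larger than the number of quoted values makes List.get throw an IndexOutOfBoundsException, so validate the positions first if the input is not trusted.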