
Split a quoted string with a delimiter

I want to split a string on whitespace, but quoted substrings should be kept together as single tokens. For example, for a string like

"John Smith" Ted Barry 

It should return three strings: John Smith, Ted and Barry.

After messing around with it, I found that you can use a regular expression for this. Run the equivalent of "match all" on:

((?<=("))[\w ]*(?=("(\s|$))))|((?<!")\w+(?!"))

A Java example:

import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class Test
{ 
    public static void main(String[] args)
    {
        String someString = "\"Multiple quote test\" not in quotes \"inside quote\" \"A work in progress\"";
        Pattern p = Pattern.compile("((?<=(\"))[\\w ]*(?=(\"(\\s|$))))|((?<!\")\\w+(?!\"))");
        Matcher m = p.matcher(someString);

        while(m.find()) {
            System.out.println("'" + m.group() + "'");
        }
    }
}

Output:

'Multiple quote test'
'not'
'in'
'quotes'
'inside quote'
'A work in progress'

A breakdown of the regular expression, using the example above, can be viewed here:

http://regex101.com/r/wM6yT9


With all that said, regular expressions should not be the go-to solution for everything; I was just having fun. This example has a lot of edge cases, such as the handling of Unicode characters and symbols (in Java, `\w` matches only ASCII word characters by default). You would be better off using a tried-and-true library for this sort of task. Take a look at the other answers before using this one.
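To illustrate the edge cases mentioned above, here is a minimal sketch of a looser pattern that accepts any non-quote characters inside quotes and any run of non-space characters outside them, so Unicode text and symbols pass through. This is not the pattern from the answer; the class name `QuotedSplit` and the method `split` are made up for illustration. It assumes quotes are balanced and are set off from bare words by whitespace.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedSplit {
    // Group 1: contents of a quoted run; group 2: a bare run of non-space characters.
    private static final Pattern TOKEN = Pattern.compile("\"([^\"]*)\"|(\\S+)");

    // Hypothetical helper: collects every match, preferring the quoted group.
    static List<String> split(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(split("\"John Smith\" Ted Barry"));   // [John Smith, Ted, Barry]
        System.out.println(split("\"Tiếng Việt\" 日本語 a-b"));    // [Tiếng Việt, 日本語, a-b]
    }
}
```

The sketch does not detect malformed input: an unmatched quote simply becomes part of a bare token, and a quote glued to a preceding word (as in `hello"John Smith"`) is not treated as the start of a quoted run.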

Try this ugly bit of code.

    // Requires java.util.ArrayList and java.util.List.
    String str = "hello my dear \"John Smith\" where is Ted Barry";
    List<String> resultList = new ArrayList<String>();
    StringBuilder builder = new StringBuilder();
    boolean inQuote = false; // true while collecting the words of a quoted phrase
    for (String s : str.split("\\s")) {
        if (!inQuote && s.startsWith("\"")) {
            inQuote = true;
            s = s.substring(1); // drop the opening quote
        }
        if (inQuote) {
            if (s.endsWith("\"")) {
                // Closing quote: finish the phrase and reset the builder.
                builder.append(s, 0, s.length() - 1);
                resultList.add(builder.toString());
                builder.setLength(0);
                inQuote = false;
            } else {
                builder.append(s).append(" ");
            }
        } else {
            resultList.add(s);
        }
    }
    System.out.println(resultList);

Well, I made a small snippet that does what you want and a bit more. Since you did not specify any further conditions, I did not go to the trouble of handling them. I know this is a dirty way of doing it, and you can probably get better results with something ready-made. But for the fun of programming, here is the example:

    String example = "hello\"John Smith\" Ted Barry lol\"Basi German\"hello";
    int wordQuoteStartIndex=0;
    int wordQuoteEndIndex=0;

    int wordSpaceStartIndex = 0;
    int wordSpaceEndIndex = 0;

    boolean foundQuote = false;
    for(int index=0;index<example.length();index++) {
        if(example.charAt(index)=='\"') {
            if(foundQuote==true) {
                wordQuoteEndIndex=index+1;
                //Print the quoted word
                System.out.println(example.substring(wordQuoteStartIndex, wordQuoteEndIndex));//here you can remove quotes by changing to (wordQuoteStartIndex+1, wordQuoteEndIndex-1)
                foundQuote=false;
                if(index+1<example.length()) {
                    wordSpaceStartIndex = index+1;
                }
            }else {
                wordSpaceEndIndex=index;
                if(wordSpaceStartIndex!=wordSpaceEndIndex) {
                    //print the word in spaces
                    System.out.println(example.substring(wordSpaceStartIndex, wordSpaceEndIndex));
                }
                wordQuoteStartIndex=index;
                foundQuote = true;
            }
        }

        if(foundQuote==false) {
            if(example.charAt(index)==' ') {
                wordSpaceEndIndex = index;
                if(wordSpaceStartIndex!=wordSpaceEndIndex) {
                    //print the word in spaces
                    System.out.println(example.substring(wordSpaceStartIndex, wordSpaceEndIndex));
                }
                wordSpaceStartIndex = index+1;
            }

            if(index==example.length()-1) {
                if(example.charAt(index)!='\"') {
                    //print the word in spaces
                    System.out.println(example.substring(wordSpaceStartIndex, example.length()));
                }
            }
        }
    }

This also handles words that are not separated from the quotes by a space, such as the "hello" before "John Smith" and the "hello" after "Basi German".

When the string is changed to "John Smith" Ted Barry, the output is three strings: 1) "John Smith" 2) Ted 3) Barry

The string in the example is hello"John Smith" Ted Barry lol"Basi German"hello, and it prints 1) hello 2) "John Smith" 3) Ted 4) Barry 5) lol 6) "Basi German" 7) hello

Hope it helps.
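The same idea can be sketched more compactly by building each token in a `StringBuilder` instead of tracking start and end indices. This is a rough variant, not the code from the answer: the names `QuoteScanner` and `scan` are made up, it returns a list rather than printing, and it strips the quotes from quoted tokens.

```java
import java.util.ArrayList;
import java.util.List;

class QuoteScanner {
    // Splits on spaces outside quotes; a quote character also ends the current bare word.
    static List<String> scan(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuote = false;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '"') {
                if (inQuote) {
                    // Closing quote: emit the quoted text, even if it is empty.
                    tokens.add(current.toString());
                    current.setLength(0);
                } else if (current.length() > 0) {
                    // Opening quote directly after a bare word: emit that word first.
                    tokens.add(current.toString());
                    current.setLength(0);
                }
                inQuote = !inQuote;
            } else if (c == ' ' && !inQuote) {
                // A space outside quotes ends the current word, if there is one.
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString()); // trailing word, or text after an unclosed quote
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [hello, John Smith, Ted, Barry, lol, Basi German, hello]
        System.out.println(scan("hello\"John Smith\" Ted Barry lol\"Basi German\"hello"));
    }
}
```

Like the index-based version, this treats only the space character as a delimiter and does not report unbalanced quotes.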

commons-lang has a StrTokenizer class that does this for you, and there is also the java-csv library.

Example with StrTokenizer:

String params = "\"John Smith\" Ted Barry";
// Initialize tokenizer with input string, delimiter character, quote character
StrTokenizer tokenizer = new StrTokenizer(params, ' ', '"');
for (String token : tokenizer.getTokenArray()) {
   System.out.println(token);
}

Output:

John Smith
Ted
Barry

This is my own version, cleaned up from http://pastebin.com/aZngu65y (posted in the comments). It handles Unicode. It collapses all excess whitespace (even inside quotes), which can be good or bad depending on your needs. Escaped quotes are not supported.

private static String[] parse(String param) {
  String[] output;

  param = param.replaceAll("\"", " \" ").trim();
  String[] fragments = param.split("\\s+");

  int curr = 0;
  boolean matched = fragments[curr].matches("[^\"]*");
  if (matched) curr++;

  for (int i = 1; i < fragments.length; i++) {
    if (!matched)
      fragments[curr] = fragments[curr] + " " + fragments[i];

    if (!fragments[curr].matches("(\"[^\"]*\"|[^\"]*)"))
      matched = false;
    else {
      matched = true;

      if (fragments[curr].matches("\"[^\"]*\""))
        fragments[curr] = fragments[curr].substring(1, fragments[curr].length() - 1).trim();

      if (fragments[curr].length() != 0)
        curr++;

      if (i + 1 < fragments.length)
        fragments[curr] = fragments[i + 1];
    }
  }

  if (matched) { 
    return Arrays.copyOf(fragments, curr);
  }

  return null; // Parameter failure (double-quotes do not match up properly).
}

Sample input for comparison:

"sdfskjf" sdfjkhsd "hfrif ehref" "fksdfj sdkfj fkdsjf" sdf sfssd


asjdhj    sdf ffhj "fdsf   fsdjh"
日本語 中文 "Tiếng Việt" "English"
    dsfsd    
   sdf     " s dfs    fsd f   "  sd f   fs df  fdssf  "日本語 中文"
""   ""     ""
"   sdfsfds "   "f fsdf

(The 2nd line is empty, the 3rd line is spaces, and the last line is malformed.) Please judge against your own expected output, since it may vary, but the baseline is that the 1st case should return [sdfskjf, sdfjkhsd, hfrif ehref, fksdfj sdkfj fkdsjf, sdf, sfssd].
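If whitespace inside quotes must be preserved, or escaped quotes are needed, a character-by-character scan is one possible alternative. The sketch below is not the `parse` method above; `EscapeAwareSplitter` and `split` are hypothetical names. It assumes `\"` denotes a literal quote, keeps empty quoted tokens, and throws on an unbalanced quote instead of returning null.

```java
import java.util.ArrayList;
import java.util.List;

class EscapeAwareSplitter {
    // Returns the tokens, or throws IllegalArgumentException if a quote is left open.
    static List<String> split(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuote = false;
        boolean hasToken = false; // true once a token has started (so "" yields an empty token)
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '\\' && i + 1 < input.length() && input.charAt(i + 1) == '"') {
                current.append('"'); // assumed escape: \" is a literal quote
                hasToken = true;
                i++;
            } else if (c == '"') {
                inQuote = !inQuote;
                hasToken = true;
            } else if (Character.isWhitespace(c) && !inQuote) {
                // Whitespace outside quotes ends the current token, if any.
                if (hasToken) {
                    tokens.add(current.toString());
                    current.setLength(0);
                    hasToken = false;
                }
            } else {
                current.append(c); // includes whitespace inside quotes, kept verbatim
                hasToken = true;
            }
        }
        if (inQuote) {
            throw new IllegalArgumentException("Unbalanced double quote");
        }
        if (hasToken) {
            tokens.add(current.toString());
        }
        return tokens;
    }
}
```

On the first sample line this returns the baseline list given above. On the other lines it differs from `parse` by design: spaces inside quotes are kept as written, `""` produces an empty token, and the malformed last line raises an exception.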
