Splitting a string on a delimiter while respecting quotes

Question

I want to split a string on a space delimiter, but quoted substrings should be handled intelligently. For example, given a string like

"John Smith" Ted Barry

it should return three strings: John Smith, Ted, and Barry.

Answer 1

After some messing around, it turns out you can do this with a regular expression. Run the equivalent of a "match all" on:

((?<=("))[\w ]*(?=("(\s|$))))|((?<!")\w+(?!"))

A Java example:

import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class Test
{ 
    public static void main(String[] args)
    {
        String someString = "\"Multiple quote test\" not in quotes \"inside quote\" \"A work in progress\"";
        Pattern p = Pattern.compile("((?<=(\"))[\\w ]*(?=(\"(\\s|$))))|((?<!\")\\w+(?!\"))");
        Matcher m = p.matcher(someString);

        while(m.find()) {
            System.out.println("'" + m.group() + "'");
        }
    }
}

Output:

'Multiple quote test'
'not'
'in'
'quotes'
'inside quote'
'A work in progress'

A breakdown of the regular expression, using the example above, can be viewed here:

http://regex101.com/r/wM6yT9

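That said, regular expressions should not be the answer to every problem; I was just having fun. This example has many edge cases, such as Unicode characters, symbols, and so on. You are better off using a tried-and-tested library for this kind of task. Look at the other answers before using this one.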
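One of those edge cases is that `\w` only matches ASCII letters, digits, and underscore by default, so words like don't or Việt get cut apart. As a rough sketch (not the pattern from this answer), a simpler alternation can capture either a quoted run or a run of non-space, non-quote characters. The class and method names here are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedSplit {
    // Either a double-quoted run (group 1 holds the text without the quotes)
    // or a run of characters that are neither whitespace nor quotes (group 2).
    private static final Pattern TOKEN = Pattern.compile("\"([^\"]*)\"|([^\\s\"]+)");

    static List<String> split(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            // Exactly one of the two groups participates in each match.
            tokens.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String s = "\"Multiple quote test\" not in quotes \"inside quote\" \"A work in progress\"";
        // prints [Multiple quote test, not, in, quotes, inside quote, A work in progress]
        System.out.println(split(s));
    }
}
```

Note that an unmatched quote is silently skipped and escaped quotes are not supported, so this is still not a substitute for a real tokenizer.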

Answer 2

Try this ugly code.

    // Needs java.util.ArrayList and java.util.List.
    String str = "hello my dear \"John Smith\" where is Ted Barry";
    List<String> resultList = new ArrayList<String>();
    StringBuilder builder = new StringBuilder();
    boolean inQuotes = false; // true between an opening and a closing quote
    for (String s : str.split("\\s")) {
        if (!inQuotes && s.startsWith("\"")) {
            inQuotes = true;
            s = s.substring(1); // drop the opening quote
        }
        if (inQuotes && s.endsWith("\"")) {
            inQuotes = false;
            s = s.substring(0, s.length() - 1); // drop the closing quote
        }
        builder.append(s);
        if (inQuotes) {
            builder.append(" "); // still inside the quotes: keep accumulating
        } else {
            resultList.add(builder.toString());
            builder.setLength(0);
        }
    }
    System.out.println(resultList);

Answer 3

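Well, I wrote a small snippet that does what you want and a bit more. Since you didn't specify any further conditions, I didn't go to much trouble. I know this is a dirty way to do it, and you can get better results from something that already exists. But for the fun of programming, here is an example: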

    String example = "hello\"John Smith\" Ted Barry lol\"Basi German\"hello";
    int wordQuoteStartIndex=0;
    int wordQuoteEndIndex=0;

    int wordSpaceStartIndex = 0;
    int wordSpaceEndIndex = 0;

    boolean foundQuote = false;
    for(int index=0;index<example.length();index++) {
        if(example.charAt(index)=='\"') {
            if(foundQuote==true) {
                wordQuoteEndIndex=index+1;
                //Print the quoted word
                System.out.println(example.substring(wordQuoteStartIndex, wordQuoteEndIndex));//here you can remove quotes by changing to (wordQuoteStartIndex+1, wordQuoteEndIndex-1)
                foundQuote=false;
                if(index+1<example.length()) {
                    wordSpaceStartIndex = index+1;
                }
            }else {
                wordSpaceEndIndex=index;
                if(wordSpaceStartIndex!=wordSpaceEndIndex) {
                    //print the word in spaces
                    System.out.println(example.substring(wordSpaceStartIndex, wordSpaceEndIndex));
                }
                wordQuoteStartIndex=index;
                foundQuote = true;
            }
        }

        if(foundQuote==false) {
            if(example.charAt(index)==' ') {
                wordSpaceEndIndex = index;
                if(wordSpaceStartIndex!=wordSpaceEndIndex) {
                    //print the word in spaces
                    System.out.println(example.substring(wordSpaceStartIndex, wordSpaceEndIndex));
                }
                wordSpaceStartIndex = index+1;
            }

            if(index==example.length()-1) {
                if(example.charAt(index)!='\"') {
                    //print the word in spaces
                    System.out.println(example.substring(wordSpaceStartIndex, example.length()));
                }
            }
        }
    }

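This also handles words that are not separated by a space before or after the quotes, such as the word "hello" before "John Smith" and after "Basi German".

When the string is changed to "John Smith" Ted Barry, the output is three strings: 1) "John Smith" 2) Ted 3) Barry

The string in the example is hello"John Smith" Ted Barry lol"Basi German"hello, and it prints 1) hello 2) "John Smith" 3) Ted 4) Barry 5) lol 6) "Basi German" 7) hello

Hope this helps.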
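For comparison, the same single-pass idea can be sketched with one boolean flag and a StringBuilder instead of four index variables. This is only an illustration, not the code above: the names `QuoteScanner`, `scan`, and `flush` are invented here, and unlike the snippet above it strips the quotes and drops empty tokens.

```java
import java.util.ArrayList;
import java.util.List;

class QuoteScanner {
    // Walks the input once, toggling inQuotes on every double quote.
    static List<String> scan(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '"') {
                // A quote always ends the token collected so far,
                // whether it opens or closes a quoted section.
                flush(tokens, current);
                inQuotes = !inQuotes;
            } else if (c == ' ' && !inQuotes) {
                flush(tokens, current);
            } else {
                current.append(c);
            }
        }
        flush(tokens, current); // text after an unterminated quote is emitted as-is
        return tokens;
    }

    // Adds the pending token, if any, and resets the buffer.
    private static void flush(List<String> tokens, StringBuilder current) {
        if (current.length() > 0) {
            tokens.add(current.toString());
            current.setLength(0);
        }
    }

    public static void main(String[] args) {
        // prints [hello, John Smith, Ted, Barry, lol, Basi German, hello]
        System.out.println(scan("hello\"John Smith\" Ted Barry lol\"Basi German\"hello"));
    }
}
```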

Answer 4

commons-lang has a StrTokenizer class that does this for you, and there is also the java-csv library.

An example with StrTokenizer:

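// StrTokenizer is in org.apache.commons.lang.text (commons-lang 2.x) or org.apache.commons.lang3.text (3.x)
String params = "\"John Smith\" Ted Barry";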
// Initialize tokenizer with input string, delimiter character, quote character
StrTokenizer tokenizer = new StrTokenizer(params, ' ', '"');
for (String token : tokenizer.getTokenArray()) {
   System.out.println(token);
}

Output:

John Smith
Ted
Barry
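If adding a dependency is not an option, the JDK's own java.io.StreamTokenizer can be configured to behave in a similar way. The following is a sketch under that assumption, not part of commons-lang; the class name `JdkTokenizer` and the method `tokenize` are hypothetical:

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class JdkTokenizer {
    static List<String> tokenize(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.resetSyntax();           // blank table: no number or comment parsing
        st.whitespaceChars(0, ' '); // control characters and space separate tokens
        st.wordChars('!', 255);     // every other Latin-1 character is part of a word
        st.quoteChar('"');          // except the double quote, which delimits strings

        List<String> tokens = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                // sval holds the word, or the quoted text without its quotes.
                if (st.ttype == StreamTokenizer.TT_WORD || st.ttype == '"') {
                    tokens.add(st.sval);
                }
            }
        } catch (IOException e) {
            // Not expected when reading from a StringReader.
            throw new UncheckedIOException(e);
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [John Smith, Ted, Barry]
        System.out.println(tokenize("\"John Smith\" Ted Barry"));
    }
}
```

Keep in mind that StreamTokenizer interprets backslash escapes such as `\n` inside quoted strings, which may or may not be what you want.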

Answer 5

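Here is my own version, cleaned up from http://pastebin.com/aZngu65y (posted in a comment). It handles Unicode. It collapses all excess whitespace (even inside quotes), which may be good or bad depending on your needs. Escaped quotes are not supported.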

private static String[] parse(String param) {
  String[] output;

  param = param.replaceAll("\"", " \" ").trim();
  String[] fragments = param.split("\\s+");

  int curr = 0;
  boolean matched = fragments[curr].matches("[^\"]*");
  if (matched) curr++;

  for (int i = 1; i < fragments.length; i++) {
    if (!matched)
      fragments[curr] = fragments[curr] + " " + fragments[i];

    if (!fragments[curr].matches("(\"[^\"]*\"|[^\"]*)"))
      matched = false;
    else {
      matched = true;

      if (fragments[curr].matches("\"[^\"]*\""))
        fragments[curr] = fragments[curr].substring(1, fragments[curr].length() - 1).trim();

      if (fragments[curr].length() != 0)
        curr++;

      if (i + 1 < fragments.length)
        fragments[curr] = fragments[i + 1];
    }
  }

  if (matched) { 
    return Arrays.copyOf(fragments, curr);
  }

  return null; // Parameter failure (double-quotes do not match up properly).
}

Sample inputs for comparison:

"sdfskjf" sdfjkhsd "hfrif ehref" "fksdfj sdkfj fkdsjf" sdf sfssd


asjdhj    sdf ffhj "fdsf   fsdjh"
日本語　中文 "Tiếng Việt" "English"
    dsfsd    
   sdf     " s dfs    fsd f   "  sd f   fs df  fdssf  "日本語　中文"
""   ""     ""
"   sdfsfds "   "f fsdf

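(Line 2 is empty, line 3 consists of spaces, and the last line is malformed.) Please judge against your own expected output, since it may differ, but as a baseline the first case should return [sdfskjf, sdfjkhsd, hfrif ehref, fksdfj sdkfj fkdsjf, sdf, sfssd].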

5 answers

Answer 1: 10 (accepted)
Answer 2: 4 (2012-05-22 03:35:13)
Answer 3: 3 (2012-05-22 03:35:29)
Answer 4: 1 (2012-05-22 03:35:18)
Answer 5: 1 (2012-05-22 04:23:00)
