Apache StrTokenizer: How to Escape Quote and Comma in a String Literal
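I have some code that parses CSV, relying on the fact that doubling the quote character escapes a quote inside a string literal (as described in the Apache documentation).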
private void test() {
    char quote = '\'';
    char delim = ',';
    // should be split into [comma, comma], [quote ', comma]
    String inputListValues = "'comma, comma', 'quote '', comma'";
    StrTokenizer st = new StrTokenizer(inputListValues, delim, quote);
    List<String> vals = new ArrayList<String>();
    while (st.hasNext()) {
        vals.add(st.nextToken().trim());
    }
    System.out.println(vals);
    // should be split into [quote ', comma], [comma, comma]
    String inputListValues2 = "'quote '', comma', 'comma, comma'";
    StrTokenizer st2 = new StrTokenizer(inputListValues2, delim, quote);
    List<String> vals2 = new ArrayList<String>();
    while (st2.hasNext()) {
        vals2.add(st2.nextToken().trim());
    }
    System.out.println(vals2);
}
The output (as shown in the debugger) is:
vals ArrayList<E> (id=1088)
[0] "comma, comma" (id=1063)
[1] "'quote ''" (id=1036)
[2] "comma'" (id=2123)
vals2 ArrayList<E> (id=2296)
[0] "quote ', comma" (id=1920)
[1] "'comma" (id=1852)
[2] "comma'" (id=1316)
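I expect each input to parse into 2 items: [quote ', comma] and [comma, comma].
If it simply did not work at all, that would be one thing, but changing the order of the items seems to change the parsing behavior.
Does anyone have any ideas? I am about to switch to another library or a regular expression.
This happened because I had started to think of StrTokenizer as a "CSV parser", but it is not one. The documentation says: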
"a, ", b ,", c" - Three tokens "a, " , " b ", ", c" (quoted text untouched)
So the spaces are part of the token. I added a call to setTrimmerMatcher because, for the trimmer matcher, the documentation says:
These characters are trimmed off on each side of the delimiter until the token or quote is found.
The code ended up as:
StrTokenizer st = new StrTokenizer(toTokenize, DELIM_CHAR, QUOTE_CHAR);
// by default this is a STRING matching, not csv parser, so spaces count as part of the token
// ie "a, ", b ,", c" - Three tokens "a, " , " b ", ", c" (quoted text untouched)
// thus we set the trimmer matcher, which "are trimmed off on each side of the delimiter until the token or quote is found."
st.setTrimmerMatcher(StrMatcher.trimMatcher());
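To see why the leading space matters, here is a small stand-alone sketch in plain Java. It is not StrTokenizer's implementation. `QuoteAwareSplitter` and its `trimAroundTokens` flag are hypothetical names used only for illustration, and the model assumes that a token counts as quoted only when its first character (after optional whitespace skipping) is the quote character, and that empty tokens are dropped.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of a quote-aware tokenizer (not the Apache class).
final class QuoteAwareSplitter {

    static List<String> split(String input, char delim, char quote, boolean trimAroundTokens) {
        List<String> tokens = new ArrayList<>();
        int len = input.length();
        int pos = 0;
        while (pos < len) {
            // With trimming on, skip whitespace BEFORE deciding whether the token is quoted.
            if (trimAroundTokens) {
                while (pos < len && Character.isWhitespace(input.charAt(pos))) {
                    pos++;
                }
            }
            boolean startedQuoted = pos < len && input.charAt(pos) == quote;
            boolean quoting = startedQuoted;
            if (quoting) {
                pos++; // consume the opening quote
            }
            StringBuilder token = new StringBuilder();
            int keep = 0; // token length up to the last character that survives trimming
            while (pos < len) {
                char c = input.charAt(pos);
                if (quoting) {
                    if (c == quote) {
                        if (pos + 1 < len && input.charAt(pos + 1) == quote) {
                            token.append(quote); // doubled quote is an escaped quote
                            pos += 2;
                            keep = token.length();
                            continue;
                        }
                        quoting = false; // closing quote
                        pos++;
                        continue;
                    }
                    token.append(c); // quoted text is kept untouched
                    pos++;
                    keep = token.length();
                } else {
                    if (c == delim) {
                        break; // delimiter outside quotes ends the token
                    }
                    if (startedQuoted && c == quote) {
                        quoting = true; // re-enter quoting inside a quoted token
                        pos++;
                        continue;
                    }
                    token.append(c); // in an unquoted token, a quote is just a character
                    pos++;
                    if (!(trimAroundTokens && Character.isWhitespace(c))) {
                        keep = token.length();
                    }
                }
            }
            if (keep > 0) {
                tokens.add(token.substring(0, keep)); // empty tokens are dropped in this model
            }
            pos++; // step over the delimiter
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "'comma, comma', 'quote '', comma'";
        System.out.println(split(input, ',', '\'', false)); // 3 tokens: the second item is split at its comma
        System.out.println(split(input, ',', '\'', true));  // 2 tokens: the second item is recognized as quoted
    }
}
```

Without trimming, the second item begins with a space, so it is never treated as quoted and its comma acts as a delimiter. That also explains why the result depends on the order of the items: only the first item starts directly with a quote.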