Parsing using Pattern in Java
I want to parse the lines of a file using parsingMethod.
test.csv
Frank George,Henry,Mary / New York,123456
,Beta Charli,"Delta,Delta Echo
", 25/11/1964, 15/12/1964,"40,000,000.00",0.0975,2,"King, Lincoln ",Alpha
This is how I read the lines:
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.util.LinkedList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Class name is a placeholder; the original post did not show the class header.
public class Main {
    public static void main(String[] args) throws Exception {
        File file = new File("C:\\Users\\test.csv");
        BufferedReader reader = new BufferedReader(new FileReader(file));
        String line2;
        // Each physical line of the file is parsed on its own.
        while ((line2 = reader.readLine()) != null) {
            String[] tab = parsingMethod(line2, ",");
            for (String i : tab) {
                System.out.println(i);
            }
        }
    }

    public static String[] parsingMethod(String line, String parser) {
        List<String> liste = new LinkedList<String>();
        // Group 2: unquoted field; group 3: content of a quoted field.
        String patternString = "(([^\"][^" + parser + "]*)|\"([^\"]*)\")" + parser + "?";
        Pattern pattern = Pattern.compile(patternString);
        Matcher matcher = pattern.matcher(line);
        while (matcher.find()) {
            if (matcher.group(2) != null) {
                liste.add(matcher.group(2).replace("\n", "").trim());
            } else if (matcher.group(3) != null) {
                liste.add(matcher.group(3).replace("\n", "").trim());
            }
        }
        String[] result = new String[liste.size()];
        return liste.toArray(result);
    }
}
Output:
Frank George
Henry
Mary / New York
123456
Beta Charli
Delta
Delta Echo
"
25/11/1964
15/12/1964
40,000,000.00
0.0975
2
King
Lincoln
"
Alpha
Delta
Delta Echo
I want to remove this ". Can anyone help me improve my pattern?
Expected output:
Frank George
Henry
Mary / New York
123456
Beta Charli
Delta
Delta Echo
25/11/1964
15/12/1964
40,000,000.00
0.0975
2
King
Lincoln
Alpha
Delta
Delta Echo
Output for line 3:
25/11/1964
15/12/1964
40
000
000.00
0.0975
2
King
Lincoln
Your code didn't compile properly, but that was caused by some of the " characters not being escaped.
But this should do the trick:
String patternString = "(?:^.,|)([^\"]*?|\".*?\")(?:,|$)";
Pattern pattern = Pattern.compile(patternString, Pattern.MULTILINE);
(?:^.,|) is a non-capturing group that matches either a single character followed by a comma at the start of the line, or nothing.
([^\"]*?|\".*?\") is a capturing group that matches either a run of characters other than " or anything between a pair of " characters.
(?:,|$) is a non-capturing group that matches a comma or the end of the line.
Note: ^ and $ match at the start and end of each line only when the pattern is compiled with the Pattern.MULTILINE flag. By default they match only at the beginning and the end of the entire input.
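The effect of the flag can be checked in isolation. The following is a minimal sketch, not part of the answer's code: the class name MultilineDemo, the helper countMatches, and the two-line sample input are made up for illustration. It counts how often ^\w matches with and without Pattern.MULTILINE.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultilineDemo {
    // Hypothetical helper: counts how many times the pattern matches in the input.
    static int countMatches(Pattern pattern, String input) {
        Matcher matcher = pattern.matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "a,b\nc,d";
        // Default mode: ^ matches only at the very start of the input.
        int plain = countMatches(Pattern.compile("^\\w"), input);
        // MULTILINE: ^ also matches right after the line terminator.
        int multiline = countMatches(Pattern.compile("^\\w", Pattern.MULTILINE), input);
        System.out.println(plain);     // prints 1
        System.out.println(multiline); // prints 2
    }
}
```

This only matters when the whole file content is matched as one string. When each line comes from readLine(), every input is a single line and the flag makes no difference.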
I can't reproduce your result, but I'm thinking maybe you want to leave the quotes out of the second captured group, like this:
"(([^\"][^"+parser+ "]*)|\"([^\"]*))\"" +parser+"?"
Edit: Sorry, this won't work. Maybe you want to allow any number of characters other than " in the first group as well, like this: (([^,\"]*)|\"([^\"]*)\"),?
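One thing worth checking with a pattern of this shape: [^,\"]* can match the empty string, so at an opening quote the first alternative succeeds with an empty match and the quoted alternative is never tried. A minimal sketch of one possible way around that, assuming empty fields may simply be skipped, is to try the quoted alternative first and require at least one character for unquoted fields. The class name QuotedFirstDemo, the constant FIELD, and the method fields are hypothetical and not from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedFirstDemo {
    // Hypothetical variant: quoted alternative first; unquoted fields need at least one character.
    private static final Pattern FIELD = Pattern.compile("\"([^\"]*)\"|([^,\"]+)");

    static List<String> fields(String line) {
        List<String> result = new ArrayList<>();
        Matcher matcher = FIELD.matcher(line);
        while (matcher.find()) {
            // Group 1 is set for a quoted field, group 2 for an unquoted one.
            String field = matcher.group(1) != null ? matcher.group(1) : matcher.group(2);
            result.add(field.trim());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(fields("a,\"b,c\",d")); // prints [a, b,c, d]
    }
}
```

Commas between fields are skipped because neither alternative can start on a comma. The sketch does not handle escaped quotes inside a quoted field.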
As far as I can see, the lines are related (a quoted field continues onto the next line), so try this:
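import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.util.LinkedList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Class name is a placeholder; the original answer showed only the two methods.
public class Main {
    public static void main(String[] args) throws Exception {
        File file = new File("C:\\Users\\test.csv");
        StringBuilder line = new StringBuilder();
        // try-with-resources closes the reader when done.
        try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
            String lineRead;
            // Join all lines so a quoted field that spans a line break stays in one string.
            while ((lineRead = reader.readLine()) != null) {
                line.append(lineRead);
            }
        }
        String[] tab = parsingMethod(line.toString());
        for (String i : tab) {
            System.out.println(i);
        }
    }

    public static String[] parsingMethod(String line) {
        List<String> liste = new LinkedList<String>();
        // Group 2: unquoted field; group 3: content of a quoted field (quotes excluded).
        String patternString = "(([^\"][^,]*)|\"([^\"]*)\"),?";
        Pattern pattern = Pattern.compile(patternString);
        Matcher matcher = pattern.matcher(line);
        while (matcher.find()) {
            if (matcher.group(2) != null) {
                liste.add(matcher.group(2).replace("\n", "").trim());
            } else if (matcher.group(3) != null) {
                liste.add(matcher.group(3).replace("\n", "").trim());
            }
        }
        String[] result = new String[liste.size()];
        return liste.toArray(result);
    }
}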
Output:
Frank George
Henry
Mary / New York
123456
Beta Charli
Delta,Delta Echo
25/11/1964
15/12/1964
40,000,000.00
0.0975
2
King, Lincoln
Alpha
Since Delta,Delta Echo is inside quotation marks, it should appear on the same line, just like King, Lincoln.
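The join-then-parse approach can be tried without the file. The sketch below feeds the sample data from a string through a StringReader; the class name JoinedLinesDemo and the methods joinLines and parse are made up for illustration, while the pattern is the one from the answer.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class JoinedLinesDemo {
    // Same pattern as in the answer: group 2 = unquoted field, group 3 = quoted field.
    private static final Pattern FIELD = Pattern.compile("(([^\"][^,]*)|\"([^\"]*)\"),?");

    // Joins all lines from the reader; readLine() drops the line terminators.
    static String joinLines(BufferedReader reader) throws IOException {
        StringBuilder joined = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            joined.append(line);
        }
        return joined.toString();
    }

    static List<String> parse(String text) {
        List<String> fields = new ArrayList<>();
        Matcher matcher = FIELD.matcher(text);
        while (matcher.find()) {
            // Exactly one of the two groups is set for every match.
            String field = matcher.group(2) != null ? matcher.group(2) : matcher.group(3);
            fields.add(field.trim());
        }
        return fields;
    }

    public static void main(String[] args) throws IOException {
        // The three sample lines from test.csv, with the quoted field split across lines.
        String csv = "Frank George,Henry,Mary / New York,123456\n"
                + ",Beta Charli,\"Delta,Delta Echo\n"
                + "\", 25/11/1964, 15/12/1964,\"40,000,000.00\",0.0975,2,\"King, Lincoln \",Alpha\n";
        try (BufferedReader reader = new BufferedReader(new StringReader(csv))) {
            for (String field : parse(joinLines(reader))) {
                System.out.println(field);
            }
        }
    }
}
```

Joining everything into one string works for this sample because each record boundary happens to coincide with a comma. Note also that this pattern does not handle consecutive commas (empty fields) correctly, so a dedicated CSV parser may be the safer choice for general input.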