
Parsing CSV files using Regex in Java

I'm trying to create a program that reads CSV files from a directory, parses each line of each file with a regex, and displays the lines after matching the regex pattern. For instance, if this is the first line of my CSV file:

1997,Ford,E350,"ac, abs, moon",3000.00

my output should be:

1997 Ford E350 ac, abs, moon 3000.00

I don't want to use any existing CSV libraries. I'm not good at regex; I used a regex I found on the net, but it doesn't work in my program. This is my source code. I'd be grateful if anyone could tell me where and what I have to change to make my code work. Please explain.

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.CharBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.util.regex.Pattern;
import java.util.regex.Matcher;


public class RegexParser {

    private static Charset charset = Charset.forName("UTF-8");
    private static CharsetDecoder decoder = charset.newDecoder();
    String pattern = "\"([^\"]*)\"|(?<=,|^)([^,]*)(?=,|$)";

    void regexparser(CharBuffer cb)
    {
        // Splits the buffer into lines; each match includes its \r?\n terminator
        Pattern linePattern = Pattern.compile(".*\r?\n");
        Pattern csvpat = Pattern.compile(pattern);
        Matcher lm = linePattern.matcher(cb);
        Matcher pm = null;

        while (lm.find())
        {
            CharSequence cs = lm.group();
            if (pm == null)
                pm = csvpat.matcher(cs);
            else
                pm.reset(cs);
            // Prints the line if the pattern matches anywhere in it
            if (pm.find())
            {
                System.out.println(cs);
            }
            if (lm.end() == cb.limit())
                break;
        }
    }

    public static void main(String[] args) throws IOException {
        RegexParser rp = new RegexParser();
        String folder = "Desktop/sample";
        File dir = new File(folder);
        File[] files = dir.listFiles();
        for (File entry : files)
        {
            // Memory-maps each file and decodes it as UTF-8
            FileInputStream fin = new FileInputStream(entry);
            FileChannel channel = fin.getChannel();
            int cs = (int) channel.size();
            MappedByteBuffer mbb = channel.map(FileChannel.MapMode.READ_ONLY, 0, cs);
            CharBuffer cb = decoder.decode(mbb);
            rp.regexparser(cb);
            fin.close();
        }
    }
}

This is my input file:

Year,Make,Model,Description,Price
1997,Ford,E350,"ac, abs, moon",3000.00
1999,Chevy,"Venture ""Extended Edition""","",4900.00
1999,Chevy,"Venture ""Extended Edition, Very Large""","",5000.00
1996,Jeep,Grand Cherokee,"MUST SELL!
air, moon roof, loaded",4799.00

My output is the same as the input. Where is the problem in my code? Why doesn't my regex have any effect?

Using regexp seems "fancy", but with CSV files it is (at least in my opinion) not worth it. For my parsing I use http://commons.apache.org/csv/ . It has never let me down. :)

Anyway, I've found the fix myself. Thanks, guys, for your suggestions and help.

This was my initial code:

    if (pm.find())
        System.out.println(cs);

I have now changed it to:

    while (pm.find())
    {
        CharSequence css = pm.group();
        System.out.println(css); // print each matched field
    }

I also used a different regex. I'm getting the desired output now.
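The fix described above (call find() in a loop and read each match instead of calling it once and printing the whole line) can be sketched as follows. The poster says they switched to a different regex but does not show it, so this sketch keeps the pattern from the question; the class and method names (FieldLoopDemo, fields) are hypothetical. The line passed in should not contain its line terminator (the question's `.*\r?\n` matcher keeps it), otherwise `[^,]*` also captures the `\r\n` into the last field. The pattern does not treat a doubled quote ("") as an escaped quote.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FieldLoopDemo {
    // Pattern from the question: a quoted field (group 1) or an unquoted field (group 2).
    private static final Pattern CSV_FIELD =
            Pattern.compile("\"([^\"]*)\"|(?<=,|^)([^,]*)(?=,|$)");

    // Collects every field of one line (without its line terminator) by calling
    // find() repeatedly, instead of calling it once and printing the whole line.
    static List<String> fields(CharSequence line) {
        List<String> result = new ArrayList<>();
        Matcher m = CSV_FIELD.matcher(line);
        while (m.find()) {
            // Exactly one of the two groups takes part in each match.
            result.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "1997,Ford,E350,\"ac, abs, moon\",3000.00";
        System.out.println(String.join(" ", fields(line)));
        // prints: 1997 Ford E350 ac, abs, moon 3000.00
    }
}
```

Joining the fields with a space reproduces the output the question asks for.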

You can try this regex:

[ \t]*+"[^"\r\n]*+"[ \t]*+|[^,\r\n]*+

with this code:

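import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

try {
    // Either a double-quoted field (blanks around it allowed) or a run of characters that are not commas or line breaks
    Pattern regex = Pattern.compile("[ \t]*+\"[^\"\r\n]*+\"[ \t]*+|[^,\r\n]*+", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE | Pattern.MULTILINE);
    Matcher matcher = regex.matcher(subjectString); // subjectString: the CSV text to parse
    while (matcher.find()) {
        // Do actions: matcher.group() is the current match (it may be empty)
    }
} catch (PatternSyntaxException ex) {
    // Take care of errors (thrown if the pattern is invalid)
}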
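A runnable sketch of that loop applied to one line follows. One detail matters: because the second alternative `[^,\r\n]*+` can match the empty string, find() also returns empty matches just before each comma and at the end of the line. The sketch keeps only matches that start at index 0 or right after a comma, and strips the quotes from quoted fields. The flags are left out because the pattern contains no letters or anchors they could affect. The class and method names (AlternationCsvDemo, fields) are hypothetical, and, like the question's pattern, this does not handle doubled quotes ("") or line breaks inside a quoted field.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AlternationCsvDemo {
    // Regex from the answer: a double-quoted field with optional surrounding blanks,
    // or a (possibly empty) run of characters that are not commas or line breaks.
    private static final Pattern FIELD =
            Pattern.compile("[ \t]*+\"[^\"\r\n]*+\"[ \t]*+|[^,\r\n]*+");

    static List<String> fields(String line) {
        List<String> result = new ArrayList<>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            int start = m.start();
            // The second alternative also matches an empty string just before each comma
            // and at the end of the line. A real field starts at index 0 or right after a
            // comma, so any other match is skipped.
            if (start > 0 && line.charAt(start - 1) != ',') {
                continue;
            }
            String raw = m.group();
            String trimmed = raw.trim();
            if (trimmed.length() >= 2 && trimmed.startsWith("\"") && trimmed.endsWith("\"")) {
                // Quoted field: drop the blanks and the surrounding quotes.
                result.add(trimmed.substring(1, trimmed.length() - 1));
            } else {
                result.add(raw);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "1997,Ford,E350,\"ac, abs, moon\",3000.00";
        System.out.println(String.join(" | ", fields(line)));
        // prints: 1997 | Ford | E350 | ac, abs, moon | 3000.00
    }
}
```

Checking the preceding character (rather than simply discarding empty matches) keeps genuinely empty fields, so "a,,b" yields three fields with an empty one in the middle.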

But yes, if it's not a very critical requirement, do try to use something that already works :)

Take the advice offered and do not use regular expressions to parse a CSV file. The format is deceptively complicated: quoted fields can contain commas, doubled quotes and even line breaks, as the sample input in the question shows.
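If you want to avoid both regular expressions and a library (as the question requires), a small character-by-character parser is one common alternative. The following is a minimal sketch, not a complete RFC 4180 implementation: it accepts malformed input silently, and the class name SimpleCsvParser is hypothetical. Because a quoted field may span several physical lines, it works on the whole text rather than line by line, which is also why splitting the input with `.*\r?\n` first, as the question's code does, cannot handle the Jeep record.

```java
import java.util.ArrayList;
import java.util.List;

public class SimpleCsvParser {
    // Splits CSV text into records and fields one character at a time.
    // Rules handled: commas separate fields, a field may be enclosed in double quotes,
    // "" inside a quoted field stands for one quote, and a quoted field may contain
    // commas and line breaks. Malformed input is accepted silently (no error reporting).
    static List<List<String>> parse(String text) {
        List<List<String>> records = new ArrayList<>();
        List<String> record = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        boolean recordStarted = false; // true once the current record has any content
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            recordStarted = true;
            if (inQuotes) {
                if (c != '"') {
                    field.append(c);             // commas and line breaks are kept as data
                } else if (i + 1 < text.length() && text.charAt(i + 1) == '"') {
                    field.append('"');           // doubled quote -> literal quote
                    i++;
                } else {
                    inQuotes = false;            // closing quote
                }
            } else if (c == '"') {
                inQuotes = true;                 // opening quote
            } else if (c == ',') {
                record.add(field.toString());    // end of field
                field.setLength(0);
            } else if (c == '\r' || c == '\n') {
                if (c == '\r' && i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                    i++;                         // treat \r\n as a single line break
                }
                record.add(field.toString());    // end of field and of record
                field.setLength(0);
                records.add(record);
                record = new ArrayList<>();
                recordStarted = false;
            } else {
                field.append(c);
            }
        }
        if (recordStarted) {                     // last record without a trailing line break
            record.add(field.toString());
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) {
        String csv = "Year,Make,Model,Description,Price\n"
                + "1997,Ford,E350,\"ac, abs, moon\",3000.00\n"
                + "1999,Chevy,\"Venture \"\"Extended Edition\"\"\",\"\",4900.00\n"
                + "1999,Chevy,\"Venture \"\"Extended Edition, Very Large\"\"\",\"\",5000.00\n"
                + "1996,Jeep,Grand Cherokee,\"MUST SELL!\nair, moon roof, loaded\",4799.00\n";
        for (List<String> record : parse(csv)) {
            System.out.println(String.join(" | ", record));
        }
    }
}
```

Running main prints one record per line with the fields separated by " | "; the Jeep description keeps its embedded line break, so that record spans two output lines.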

The following answer contains links to Wikipedia and to the RFC describing the CSV file format:
