
Parsing CSV files using Regex in Java

I'm trying to write a program that reads CSV files from a directory, parses each line with a regex, and prints the fields of every line that matches the pattern. For instance, if this is the first line of my CSV file:

1997,Ford,E350,"ac, abs, moon",3000.00

my output should be

1997 Ford E350 ac, abs, moon 3000.00

I don't want to use any existing CSV libraries. I'm not good at regex; I used one I found on the net, but it's not working in my program. This is my source code. I'd be grateful if anyone could tell me where and what I have to modify to make it work. Please explain.

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.CharBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.util.regex.Pattern;
import java.util.regex.Matcher;


public class RegexParser {

private static Charset charset = Charset.forName("UTF-8");
private static CharsetDecoder decoder = charset.newDecoder();
String pattern = "\"([^\"]*)\"|(?<=,|^)([^,]*)(?=,|$)";

void regexparser(CharBuffer cb)
{
    Pattern linePattern = Pattern.compile(".*\r?\n");
    Pattern csvpat = Pattern.compile(pattern);
    Matcher lm = linePattern.matcher(cb);
    Matcher pm = null;

    while (lm.find())
    {
        CharSequence cs = lm.group();
        if (pm == null)
            pm = csvpat.matcher(cs);
        else
            pm.reset(cs);
        if (pm.find())
        {
            System.out.println(cs);
        }
        if (lm.end() == cb.limit())
            break;
    }
}

public static void main(String[] args) throws IOException {
    RegexParser rp = new RegexParser();
    String folder = "Desktop/sample";
    File dir = new File(folder);
    File[] files = dir.listFiles();
    for( File entry: files)
    {
        FileInputStream fin = new FileInputStream(entry);
        FileChannel channel = fin.getChannel();
        int cs = (int) channel.size();
        MappedByteBuffer mbb = channel.map(FileChannel.MapMode.READ_ONLY, 0, cs);
        CharBuffer cb = decoder.decode(mbb);
        rp.regexparser(cb);
        fin.close();
    }
}

}

This is my input file

Year,Make,Model,Description,Price

1997,Ford,E350,"ac, abs, moon",3000.00

1999,Chevy,"Venture ""Extended Edition""","",4900.00

1999,Chevy,"Venture ""Extended Edition, Very Large""","",5000.00

1996,Jeep,Grand Cherokee,"MUST SELL!

air, moon roof, loaded",4799.00

The output is identical to the input. Where is the problem in my code? Why doesn't my regex have any effect?

Using a regexp seems "fancy", but for CSV files (at least in my opinion) it is not worth it. For my parsing I use http://commons.apache.org/csv/ . It has never let me down. :)

Anyway, I found the fix myself. Thanks, everyone, for your suggestions and help.

This was my initial code

    if (pm.find())
        System.out.println(cs);   // prints the whole line, not the matched field

Now changed this to

    while (pm.find())
    {
        CharSequence css = pm.group();
        System.out.println(css);  // print each match instead of the whole line
    }

I also switched to a different regex, and I'm getting the desired output now.
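The regex that was finally used is not shown, so here is a minimal sketch of the same idea (loop over `find()` and print each field rather than the whole line). The class name `RegexCsvLine`, the method `splitLine`, and the pattern itself are my own assumptions, not the poster's code. It expects one line without its line terminator, does not validate malformed input, and cannot handle quoted fields that span several lines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexCsvLine {
    // Group 1: content of a quoted field (quotes stripped, "" still doubled).
    // Group 2: an unquoted field.
    // Group 3: what ended the field: "," or the empty string at end of line.
    private static final Pattern FIELD = Pattern.compile(
            "(?:\"((?:[^\"]|\"\")*)\"|([^,\"]*))(,|$)");

    /** Splits a single CSV line (no embedded newlines) into its fields. */
    static List<String> splitLine(String line) {
        List<String> fields = new ArrayList<>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            String quoted = m.group(1);
            // Inside a quoted field, "" stands for one literal quote.
            fields.add(quoted != null ? quoted.replace("\"\"", "\"") : m.group(2));
            if (m.group(3).isEmpty()) {
                break; // the field ended at the end of the line
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "1997,Ford,E350,\"ac, abs, moon\",3000.00";
        // prints: 1997 Ford E350 ac, abs, moon 3000.00
        System.out.println(String.join(" ", splitLine(line)));
    }
}
```

Capturing the delimiter in its own group is what lets the loop stop cleanly: without the `break`, the pattern would match one more empty field at the very end of the line.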

You can try this regex: [ \t]*+"[^"\r\n]*+"[ \t]*+|[^,\r\n]*+ with this code:

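// Needs java.util.regex.Pattern, Matcher and PatternSyntaxException.
// subjectString is the CSV text to scan (for example, one line of the file).
try {
    // The three flags have no effect on this pattern (no letters, no ^ or $); they are kept as posted.
    Pattern regex = Pattern.compile("[ \t]*+\"[^\"\r\n]*+\"[ \t]*+|[^,\r\n]*+", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE | Pattern.MULTILINE);
    Matcher matcher = regex.matcher(subjectString);
    while (matcher.find()) {
        // Do actions; matcher.group() holds the current match
    }
} catch (PatternSyntaxException ex) {
    // Take care of errors
}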
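To see what the pattern above actually returns, here is a small sketch that collects every match from the `find()` loop. The class name `PossessiveRegexDemo` and the helper `rawMatches` are made up for illustration; the pattern is the one from the snippet, without the flags.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PossessiveRegexDemo {
    // Same pattern as in the snippet above, without the flags.
    private static final Pattern FIELD =
            Pattern.compile("[ \t]*+\"[^\"\r\n]*+\"[ \t]*+|[^,\r\n]*+");

    /** Returns every match of the pattern, including empty ones. */
    static List<String> rawMatches(String line) {
        List<String> matches = new ArrayList<>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Brackets make empty matches visible in the output.
        for (String s : rawMatches("1997,\"ac, abs\"")) {
            System.out.println("[" + s + "]");
        }
    }
}
```

Because the second alternative can match zero characters, the loop also yields an empty match at each comma and at the end of the input, and quoted fields come back with their quotes still attached. Some post-processing is needed before the matches can be treated as fields.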

But yeah, if it's not a hard requirement, do try to use something that already works : )

Take the advice offered and do not use regular expressions to parse a CSV file. The format is deceptively complicated in the way it can be used.
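If a library really is off the table, a small hand-written scanner is one alternative to regex. The following is only a sketch under my own assumptions (the class `SimpleCsvParser` and its `parse` method are hypothetical names): it reads the whole text at once so that quoted fields may contain commas, doubled quotes, and line breaks, and it skips blank lines between records. It does not reject malformed input such as an unterminated quote.

```java
import java.util.ArrayList;
import java.util.List;

class SimpleCsvParser {
    /** Parses CSV text into records; each record is a list of field values. */
    static List<List<String>> parse(String text) {
        List<List<String>> records = new ArrayList<>();
        List<String> record = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        boolean pending = false; // true once the current record has any content

        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (inQuotes) {
                if (c != '"') {
                    field.append(c);   // commas and newlines are literal here
                } else if (i + 1 < text.length() && text.charAt(i + 1) == '"') {
                    field.append('"'); // "" is an escaped quote
                    i++;
                } else {
                    inQuotes = false;  // closing quote
                }
            } else if (c == '"') {
                inQuotes = true;
                pending = true;
            } else if (c == ',') {
                record.add(field.toString());
                field.setLength(0);
                pending = true;
            } else if (c == '\n' || c == '\r') {
                if (c == '\r' && i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                    i++;               // treat \r\n as one line break
                }
                if (pending) {         // blank lines are skipped (an assumption)
                    record.add(field.toString());
                    records.add(record);
                    record = new ArrayList<>();
                    field.setLength(0);
                    pending = false;
                }
            } else {
                field.append(c);
                pending = true;
            }
        }
        if (pending) {                 // last record without a trailing newline
            record.add(field.toString());
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) {
        String text = "1996,Jeep,Grand Cherokee,\"MUST SELL!\nair, moon roof, loaded\",4799.00\n";
        for (List<String> r : parse(text)) {
            System.out.println(String.join(" | ", r));
        }
    }
}
```

The line break inside the quoted Jeep description is the case a line-by-line regex cannot see: the record only ends at the first newline found outside quotes.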

The following answer contains links to wikipedia and the RFC describing the CSV file format:
