
Java CSVReader: ignore commas in double quotes

I have a CSV file that I am having trouble parsing. I am using the opencsv library. Here is what my data looks like and what I am trying to achieve.

RPT_PE,CLASS,RPT_MKT,PROV_CTRCT,CENTER_NM,GK_TY,MBR_NM,MBR_PID
"20150801","NULL","33612","00083249P PCP602","JOE SMITH ARNP","NULL","FRANK, LUCAS E","50004655200"

The issue I am having is that the member name ("FRANK, LUCAS E") is being split into two columns, when it should be one. Again, I'm using opencsv with a comma as the separator. Is there any way to ignore the commas inside the double quotes?
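The symptom described above is exactly what a quote-unaware split produces. As a minimal illustration (plain `String.split`, not opencsv; the class `NaiveSplitDemo` is hypothetical), splitting the sample data row on every comma yields nine pieces for an eight-column header, with the name broken into `"FRANK` and ` LUCAS E"`:

```java
import java.util.Arrays;

// Hypothetical demo class: shows what a quote-unaware split does to the sample row.
public class NaiveSplitDemo {

    // Data row from the question, exactly as it appears in the file
    static final String ROW = "\"20150801\",\"NULL\",\"33612\",\"00083249P PCP602\","
            + "\"JOE SMITH ARNP\",\"NULL\",\"FRANK, LUCAS E\",\"50004655200\"";

    // String.split treats every ',' as a separator, including the one inside "FRANK, LUCAS E"
    static String[] naiveSplit(String line) {
        return line.split(",");
    }

    public static void main(String[] args) {
        String[] fields = naiveSplit(ROW);
        System.out.println(fields.length); // 9 pieces for an 8-column header
        System.out.println(Arrays.toString(fields));
    }
}
```

A CSV parser that honours a quote character is what keeps the name in one field.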

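    // connection, seprator, SQL_INSERT, TABLE_REGEX, KEYS_REGEX, VALUES_REGEX
    // and DateUtil are declared elsewhere in the class (not shown in this excerpt).
    public void loadCSV(String csvFile, String tableName,
            boolean truncateBeforeLoad) throws Exception {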

        CSVReader csvReader = null;
        if (null == this.connection) {
            throw new Exception("Not a valid connection.");
        }
        try {

            // Only the separator is passed; the quote character falls back
            // to opencsv's default ('"').
            csvReader = new CSVReader(new FileReader(csvFile), this.seprator);

        } catch (Exception e) {
            e.printStackTrace();
            throw new Exception("Error occurred while opening file. "
                    + e.getMessage());
        }
        String[] headerRow = csvReader.readNext();

        if (null == headerRow) {
            throw new FileNotFoundException(
                    "No columns defined in given CSV file. "
                    + "Please check the CSV file format.");
        }

        String questionmarks = StringUtils.repeat("?,", headerRow.length);
        questionmarks = (String) questionmarks.subSequence(0, questionmarks
                .length() - 1);

        String query = SQL_INSERT.replaceFirst(TABLE_REGEX, tableName);
        System.out.println("Base Query: " + query);
        String headerRowMod = Arrays.toString(headerRow).replaceAll(", ]", "]");
        String[] strArray = headerRowMod.split(",");

        query = query
                .replaceFirst(KEYS_REGEX, StringUtils.join(strArray, ","));

        System.out.println("Add Headers: " + query);
        query = query.replaceFirst(VALUES_REGEX, questionmarks);
        System.out.println("Add questionmarks: " + query);

        String[] nextLine;
        Connection con = null;
        PreparedStatement ps = null;
        try {
            con = this.connection;
            con.setAutoCommit(false);
            ps = con.prepareStatement(query);

            if (truncateBeforeLoad) {
                //delete data from table before loading csv
                con.createStatement().execute("DELETE FROM " + tableName);
            }

            final int batchSize = 1000;
            int count = 0;
            Date date = null;
            while ((nextLine = csvReader.readNext()) != null) {
                System.out.println("Next Line: " + Arrays.toString(nextLine));
                if (null != nextLine) {
                    int index = 1;
                    for (String string : nextLine) {
                        date = DateUtil.convertToDate(string);
                        if (null != date) {
                            ps.setDate(index++, new java.sql.Date(date
                                    .getTime()));
                        } else {
                            ps.setString(index++, string);
                        }
                    }
                    ps.addBatch();
                }
                if (++count % batchSize == 0) {
                    ps.executeBatch();
                }
            }
            ps.executeBatch(); // insert remaining records
            con.commit();
        } catch (SQLException | IOException e) {
            con.rollback();
            e.printStackTrace();
            throw new Exception(
                    "Error occurred while loading data from file to database. "
                    + e.getMessage());
        } finally {
            if (null != ps) {
                ps.close();
            }
            if (null != con) {
                con.close();
            }
            csvReader.close();
        }
    }

    public char getSeprator() {
        return seprator;
    }

    public void setSeprator(char seprator) {
        this.seprator = seprator;
    }

    public char getQuoteChar() {
        return quoteChar;
    }

    public void setQuoteChar(char quoteChar) {
        this.quoteChar = quoteChar;
    }
}

Did you try the following?

CSVReader reader = new CSVReader(new FileReader("yourfile.csv"), ',');

I wrote the following program and it works for me. I got this result:

[20150801] [NULL] [33612] [00083249P PCP602] [JOE SMITH ARNP] [NULL] [FRANK, LUCAS E] [50004655200]

import java.io.FileReader;
import java.io.IOException;

import au.com.bytecode.opencsv.CSVReader;

public class CVSTest {

    public static void main(String[] args) {
        // ',' is the separator; the quote character defaults to '"',
        // so "FRANK, LUCAS E" is read as a single field.
        // try-with-resources closes the reader even if reading fails, and a
        // missing file (FileNotFoundException is an IOException) no longer
        // leads to a NullPointerException on a null reader.
        try (CSVReader reader = new CSVReader(new FileReader(
                "C:/Work/Dev/Projects/Pure_Test/Test/src/cvs"), ',')) {
            String[] nextLine;
            while ((nextLine = reader.readNext()) != null) {
                // nextLine[] is an array of values from the line;
                // print each value wrapped in brackets, separated by spaces
                StringBuilder sb = new StringBuilder();
                for (String field : nextLine) {
                    if (sb.length() > 0) {
                        sb.append(' ');
                    }
                    sb.append('[').append(field).append(']');
                }
                System.out.println(sb);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

}

According to the documentation, you can supply custom separator and quote characters in the constructor, which should deal with it:

CSVReader(Reader reader, char separator, char quotechar)

Construct your reader with ',' as the separator and '"' as the quotechar.

Your case should be handled out of the box with no special configuration required, since '"' is already opencsv's default quote character.
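To illustrate what the separator and quotechar arguments control, here is a simplified, stdlib-only sketch. It is not opencsv's implementation, and the class `QuoteAwareSplitter` is hypothetical: a separator only ends a field when it appears outside a pair of quote characters, and a doubled quote inside a quoted field becomes one literal quote. It handles a single line only (no escape character, no quoted fields spanning several lines).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of quote-aware splitting; not opencsv's actual parser.
public class QuoteAwareSplitter {

    // Splits one line on `separator`, ignoring separators between two `quotechar`s.
    public static List<String> split(String line, char separator, char quotechar) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == quotechar) {
                if (inQuotes && i + 1 < line.length() && line.charAt(i + 1) == quotechar) {
                    current.append(quotechar); // doubled quote inside quotes -> literal quote
                    i++;
                } else {
                    inQuotes = !inQuotes;      // opening or closing quote; not copied
                }
            } else if (c == separator && !inQuotes) {
                fields.add(current.toString()); // separator outside quotes ends the field
                current.setLength(0);
            } else {
                current.append(c);              // ordinary char, or separator inside quotes
            }
        }
        fields.add(current.toString());         // last field has no trailing separator
        return fields;
    }

    public static void main(String[] args) {
        String row = "\"20150801\",\"NULL\",\"33612\",\"00083249P PCP602\","
                + "\"JOE SMITH ARNP\",\"NULL\",\"FRANK, LUCAS E\",\"50004655200\"";
        List<String> fields = split(row, ',', '"');
        System.out.println(fields.size()); // 8
        System.out.println(fields.get(6)); // FRANK, LUCAS E
    }
}
```

With ',' as the separator and '"' as the quotechar, the sample row comes back as eight fields with "FRANK, LUCAS E" intact, which is the behaviour the opencsv constructor above is configured for.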

If you can't make it work, just switch to uniVocity-parsers to do this for you: it's twice as fast as OpenCSV, requires much less code and is packed with features.

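import java.io.File;
import java.io.FileReader;
import java.util.List;

import com.univocity.parsers.csv.CsvParser;
import com.univocity.parsers.csv.CsvParserSettings;

CsvParserSettings settings = new CsvParserSettings(); // you have many configuration options here - check the tutorial.
CsvParser parser = new CsvParser(settings);
List<String[]> allRows = parser.parseAll(new FileReader(new File("C:/Work/Dev/Projects/Pure_Test/Test/src/cvs")));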

Disclosure: I am the author of this library. It's open-source and free (Apache V2.0 license).

It is simple to load your CSV into HSQLDB as an SQL table, then select rows from that table to insert into another database. HSQLDB handles commas inside quotes; you need to define your text source as "quoted". See:

http://hsqldb.org/doc/2.0/guide/texttables-chapt.html

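Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.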

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM