H2 CSV import from file stream

I am currently using bulk CSVREAD to import large CSV files into H2:

logger.info("create param table");
templ.execute(paramsTable + " AS SELECT * FROM CSVREAD('" + consoleArgs.paramDataFile + "','FID,MEANING,VALUE,HID');");

However, to save storage we are going to compress the CSV files, so they have to be decompressed before the import. Since we do not want to store the decompressed file, it would be nice to feed the decompressed character stream directly into H2. Any ideas how?

I would probably just insert the data manually using a PreparedStatement, without using a CSV file.
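
A minimal sketch of the PreparedStatement route, under a few assumptions that are not in the question: the target table is called PARAMS (a hypothetical name), all four columns can be set from strings, the file has no header line, and no field contains quotes or embedded commas. The class name, `parseLine`, `insertAll` and the batch size are all illustrative. The reader can be any `BufferedReader`, for example one that wraps a `java.util.zip.GZIPInputStream`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

class ParamBatchInsert {

    // Assumption: flush every 1000 rows; tune for your data.
    static final int BATCH_SIZE = 1000;

    // Naive splitter: assumes exactly four fields and no quoted or embedded commas.
    static String[] parseLine(String line) {
        String[] fields = line.split(",", -1); // -1 keeps trailing empty fields
        if (fields.length != 4) {
            throw new IllegalArgumentException("expected 4 fields: " + line);
        }
        return fields;
    }

    // Streams rows from the reader into the (hypothetical) PARAMS table in batches.
    static long insertAll(Connection conn, BufferedReader reader)
            throws SQLException, IOException {
        // "VALUE" is quoted defensively, since some H2 versions treat it as a keyword.
        String sql = "INSERT INTO PARAMS(FID, MEANING, \"VALUE\", HID) VALUES (?, ?, ?, ?)";
        long count = 0;
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                String[] fields = parseLine(line);
                for (int i = 0; i < fields.length; i++) {
                    ps.setString(i + 1, fields[i]); // the database converts to the column type
                }
                ps.addBatch();
                if (++count % BATCH_SIZE == 0) {
                    ps.executeBatch();
                }
            }
            ps.executeBatch(); // flush the remaining rows
        }
        return count;
    }

    public static void main(String[] args) {
        // No database needed here: just show how one line is split.
        System.out.println(String.join(" | ", parseLine("1,speed,42,7")));
    }
}
```

Only one line is held in memory at a time, so the decompressed data never has to exist as a whole, on disk or in memory.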

Alternatively, you could use the CSV tool yourself (it can read from a Reader).
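
To get such a Reader from a compressed file without writing a temporary file, something along these lines should work. It is a sketch that assumes gzip compression and UTF-8 content, neither of which is stated in the question; the class and method names are made up for illustration, and `gzip` exists only so that the example runs without an input file. The resulting reader could then be passed to `org.h2.tools.Csv.read(Reader, String[])`.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipCsvReader {

    // Wraps a gzip-compressed byte stream in a character reader.
    // Decompression happens on the fly; nothing is written to disk.
    static BufferedReader openGzip(InputStream compressed) {
        try {
            return new BufferedReader(new InputStreamReader(
                    new GZIPInputStream(compressed), StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Demo helper only: gzips a string in memory so the example is self-contained.
    static byte[] gzip(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray(); // complete only after the gzip stream is closed
    }

    // Reads every decompressed line; stands in for whatever consumes the reader.
    static List<String> readLines(InputStream compressed) {
        try (BufferedReader reader = openGzip(compressed)) {
            return reader.lines().collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = gzip("1,speed,42,7\n2,mass,3.5,7\n");
        for (String line : readLines(new ByteArrayInputStream(data))) {
            System.out.println(line);
        }
    }
}
```

For a real file, the `ByteArrayInputStream` would be replaced by something like `Files.newInputStream(path)`.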

You could create a user-defined function that returns a table (see "Using a Function as a Table" a bit further down in the docs) and, for example, use the CSV tool there (or generate the data your own way, without CSV). Example of a function returning a result set:

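CREATE ALIAS MY_CSV AS $$
import org.h2.tools.*;
import java.sql.*;
@CODE
// fileName is not used in this minimal example;
// a real implementation would read the file here.
ResultSet getCsv(Connection conn, String fileName)
        throws SQLException {
    SimpleResultSet rs = new SimpleResultSet();
    rs.addColumn("A", Types.INTEGER, 10, 0);
    String url = conn.getMetaData().getURL();
    // With this URL, H2 only wants the column list: return no rows.
    if (url.equals("jdbc:columnlist:connection")) {
        return rs;
    }
    // Otherwise return the actual data (a single row here).
    rs.addRow(1);
    return rs;
}
$$;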

Then use that function to create the table:

CREATE TABLE TEST 
AS SELECT A FROM MY_CSV('fileName');    

H2 will call the method three times:

  • In the prepare phase, while parsing the query, to verify that it is syntactically correct. Here it retrieves the column names and checks that one of them is "A". This is fast.
  • In the execute phase, to retrieve the column names and data types. Theoretically this call is not required, as nothing has changed since the first call. However, please note that this is again just to retrieve the column names (the URL is again jdbc:columnlist:connection). This is fast.
  • In the execute phase, to retrieve the actual data. In this case the URL is jdbc:default:connection. This is the slow part.
