I've created a Grails application that pulls a 300 MB text file from Google Storage over HTTP (a one-off import). The text file contains 35,000,000 codes that need to be stored in a MySQL database.
I created a `Thread` that loops through the incoming `InputStream`, creates a list of domain objects, loads them into an `Array` and batch-saves that array every 100 iterations.
The process takes hours to complete, which is okay. The problem is that when I query the table, I don't see a single saved record. The rows seem to be buffered or cached somewhere until the whole process completes, which is exactly what I don't want!
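The batching loop described above can be sketched in plain Java, with the database step replaced by a callback so it runs without Grails or MySQL. This is an illustration only: `LineBatcher` and `readInBatches` are hypothetical names, and the `sink` stands in for something like `CodeDomainObject.saveAll(batch)`. It flushes when the buffer reaches the batch size rather than on `i % 100 == 0`, so the first batch is not a single row. Note that handing a batch to the sink is not the same as committing it; whether the rows become visible to other connections depends on the transaction the sink runs in.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: reads a stream line by line and hands fixed-size batches to a sink.
final class LineBatcher {
    private LineBatcher() {}

    // Returns the number of lines read. The sink stands in for a database call
    // such as CodeDomainObject.saveAll(batch); it is not a Grails API.
    static long readInBatches(Reader in, int batchSize, Consumer<List<String>> sink) throws IOException {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        long count = 0;
        List<String> buffer = new ArrayList<>(batchSize);
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            while ((line = reader.readLine()) != null) {
                buffer.add(line);
                count++;
                // Flush on buffer size rather than on a counter modulo, so the
                // first line does not produce a one-element batch.
                if (buffer.size() == batchSize) {
                    sink.accept(new ArrayList<>(buffer)); // pass a copy, then reuse the buffer
                    buffer.clear();
                }
            }
        }
        // Whatever is left after the last full batch.
        if (!buffer.isEmpty()) {
            sink.accept(new ArrayList<>(buffer));
        }
        return count;
    }
}
```

For the file in the question, the reader would be something like `new InputStreamReader(new URL(url).openStream())` with a batch size of 100.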
Code snippet:

```groovy
synchronized void processImport(String url, String importType) throws RuntimeException {
    InputStream stream = new URL(url).openStream();
    BufferedReader reader = new BufferedReader(new InputStreamReader(stream));
    String code;
    int i = 0;
    try {
        List<CodeDomainObject> buffer = new ArrayList<>();
        while ((code = reader.readLine()) != null) {
            try {
                buffer.add(new CodeDomainObject([code: code, used: false, type: importType]));
                // Batch-save every 100 lines (i == 0 also triggers, so the first batch holds one row)
                if (i % 100 == 0) {
                    CodeDomainObject.saveAll(buffer);
                    buffer.clear();
                }
            } catch (Exception ex) {
                println("Save error: " + ex.getMessage());
            }
            i++;
        }
        // Save the rows left over after the last full batch
        CodeDomainObject.saveAll(buffer);
    } finally {
        reader.close();
        stream.close();
    }
}
```
Note: `sessionFactory.getCurrentSession().clear()` doesn't seem to do anything, and neither does `flush: true`.

Managed to solve the problem myself. For those interested: use a stateless Hibernate session and begin and commit the transactions manually:
```groovy
import org.hibernate.StatelessSession

// Assumes this lives in a Grails service with the sessionFactory bean injected (def sessionFactory)
synchronized void processImport(String url, String importType) throws RuntimeException {
    InputStream stream = new URL(url).openStream();
    BufferedReader reader = new BufferedReader(new InputStreamReader(stream));
    String code;
    int i = 0;
    StatelessSession session = sessionFactory.openStatelessSession();
    def tx = session.beginTransaction();
    try {
        while ((code = reader.readLine()) != null) {
            if (i % 1000 == 0 && i > 0) {
                // After every 1000 records, commit and open a new transaction
                tx.commit();
                Thread.sleep(1000); // I just added this to give the GC a chance
                tx = session.beginTransaction();
            }
            session.insert(new MyDomainObject([code: code, used: false, type: importType]));
            i++;
        }
        tx.commit();
        Thread.sleep(1000);
    } finally {
        reader.close();
        stream.close();
        session.close();
    }
}
```
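Why this works: each `tx.commit()` makes the rows inserted so far visible to other connections, so the table fills up chunk by chunk instead of all at once at the end. A `StatelessSession` also has no first-level cache, so the inserted objects are not kept in memory as the import runs. The toy model below imitates the commit-per-chunk behavior with in-memory lists; it is not Hibernate, and `ToyTable`, `ToyTransaction` and `ChunkedImport` are hypothetical names. It uses the same `i % chunkSize == 0 && i > 0` boundary as the code above.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for a database table: readers only ever see committed rows.
final class ToyTable {
    private final List<String> committed = new ArrayList<>();

    synchronized void publish(List<String> rows) {
        committed.addAll(rows);
    }

    // What another connection would see right now.
    synchronized int visibleCount() {
        return committed.size();
    }
}

// Toy transaction: inserts are staged and only published to the table on commit.
final class ToyTransaction {
    private final ToyTable table;
    private final List<String> staged = new ArrayList<>();

    ToyTransaction(ToyTable table) {
        this.table = table;
    }

    void insert(String row) {
        staged.add(row);
    }

    void commit() {
        table.publish(staged); // publish copies the rows, so clearing afterwards is safe
        staged.clear();
    }
}

final class ChunkedImport {
    private ChunkedImport() {}

    // Inserts every row, committing every chunkSize rows, and records what an
    // outside reader would see right after each commit.
    static List<Integer> run(List<String> rows, int chunkSize, ToyTable table) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<Integer> visibleAfterEachCommit = new ArrayList<>();
        ToyTransaction tx = new ToyTransaction(table);
        int i = 0;
        for (String row : rows) {
            // Same boundary as the answer: commit before inserting row chunkSize, 2 * chunkSize, ...
            if (i % chunkSize == 0 && i > 0) {
                tx.commit();
                visibleAfterEachCommit.add(table.visibleCount());
                tx = new ToyTransaction(table);
            }
            tx.insert(row);
            i++;
        }
        tx.commit(); // the final, possibly partial, chunk
        visibleAfterEachCommit.add(table.visibleCount());
        return visibleAfterEachCommit;
    }
}
```

With 2,500 rows and a chunk size of 1,000, an outside reader sees 1,000, then 2,000, then 2,500 rows, rather than nothing until the very end.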
You should be using `save(flush: true/false)` instead of those hand-made buffers and manual transaction commits:
```groovy
new URL(url).withReader { reader ->
    MyDomainObject.withTransaction {
        int counter = 0
        reader.eachLine { String line ->
            counter++
            // Flush the Hibernate session on every 1000th row
            new MyDomainObject(code: line).save(flush: 0 == counter % 1000)
        }
    }
}
```
And yes, if your code runs in a separate thread, it should be wrapped in `MyDomainObject.withTransaction {}`.
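The difference between a flush and a commit matters for what the asker observed with `flush: true`: a flush sends the pending SQL to the database, but under MySQL InnoDB's default isolation level (REPEATABLE READ), and also under READ COMMITTED, other connections do not see those rows until the enclosing transaction commits. The toy model below illustrates that for a single long transaction; it is plain Java, not Hibernate, and `ToySession` with its methods is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a session inside ONE long transaction (as with a single withTransaction block).
final class ToySession {
    private final List<String> pending = new ArrayList<>();     // held in the session only
    private final List<String> uncommitted = new ArrayList<>(); // sent by a flush, not yet committed
    private final List<String> committed = new ArrayList<>();   // visible to other connections

    void save(String row, boolean flush) {
        pending.add(row);
        if (flush) {
            flush();
        }
    }

    // A flush sends pending writes to the database but does not end the transaction.
    void flush() {
        uncommitted.addAll(pending);
        pending.clear();
    }

    // A commit flushes whatever is still pending, then makes everything visible.
    void commit() {
        flush();
        committed.addAll(uncommitted);
        uncommitted.clear();
    }

    int sentToDatabase() {
        return uncommitted.size() + committed.size();
    }

    int visibleToOtherConnections() {
        return committed.size();
    }
}
```

Running the same loop as the answer (flush on every 1000th of 2,500 rows) leaves 2,000 rows flushed but none visible to other connections; only the commit at the end of the transaction publishes all 2,500. Committing in chunks, as the other two answers do, is what makes rows appear progressively.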
An alternative is to let Grails create and destroy the sessions and transactions:
```groovy
synchronized void processImport(String url, String importType) throws RuntimeException {
    InputStream stream = new URL(url).openStream();
    BufferedReader reader = new BufferedReader(new InputStreamReader(stream));
    String code;
    int i = 0;
    try {
        List<CodeDomainObject> buffer = []
        while ((code = reader.readLine()) != null) {
            try {
                buffer.add(new CodeDomainObject([code: code, used: false, type: importType]));
                if (i % 1000 == 0 && i > 0) {
                    flushBuffer(buffer)
                }
            } catch (Exception ex) {
                println("Save error: " + ex.getMessage())
            }
            i++;
        }
        flushBuffer(buffer)
    } finally {
        reader.close();
        stream.close();
    }
}

// Saves the buffered objects in a fresh session and transaction, so each batch is committed on its own
private void flushBuffer(List<CodeDomainObject> buffer) {
    CodeDomainObject.withNewSession {
        CodeDomainObject.withNewTransaction {
            CodeDomainObject.saveAll(buffer);
            buffer.clear();
        }
    }
}
```
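Because `flushBuffer` opens a new session and transaction for every batch, each batch is committed on its own: rows show up in the table batch by batch, and a batch that fails is rolled back without undoing the batches committed before it. Below is a toy Java model of that behavior, not GORM; `BatchStore`, `PerBatchImport` and the `rejects` predicate (which simulates a database error on a given row) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Toy store where every batch is written in its own all-or-nothing transaction.
final class BatchStore {
    private final List<String> committed = new ArrayList<>();

    // Commits the whole batch, or nothing if any row is rejected (a simulated rollback).
    void saveBatchInNewTransaction(List<String> batch, Predicate<String> rejects) {
        List<String> tx = new ArrayList<>();
        for (String row : batch) {
            if (rejects.test(row)) {
                // Throwing discards tx, so none of this batch's rows are committed.
                throw new IllegalStateException("rejected row: " + row);
            }
            tx.add(row);
        }
        committed.addAll(tx);
    }

    int committedCount() {
        return committed.size();
    }
}

final class PerBatchImport {
    private PerBatchImport() {}

    // Splits rows into batches, saves each in its own transaction and returns how
    // many batches failed. A failure is logged and skipped, like the println in the answer.
    static int importAll(List<String> rows, int batchSize, BatchStore store, Predicate<String> rejects) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int failedBatches = 0;
        for (int from = 0; from < rows.size(); from += batchSize) {
            List<String> batch = rows.subList(from, Math.min(from + batchSize, rows.size()));
            try {
                store.saveBatchInNewTransaction(batch, rejects);
            } catch (IllegalStateException ex) {
                System.out.println("Save error: " + ex.getMessage());
                failedBatches++;
            }
        }
        return failedBatches;
    }
}
```

With 3,000 rows in batches of 1,000 and one bad row in the middle batch, the first and last batches (2,000 rows) stay committed and only the middle batch is lost.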
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.