My file is 14 GB. I would like to read it line by line and export the data to an Excel file. Because the file contains text in several languages, such as Chinese and English, I tried to read it using FileInputStream with UTF-16, but this results in java.lang.OutOfMemoryError: Java heap space. I have tried increasing the heap space, but the problem persists. How should I change my file-reading code?
createExcel(); // open an Excel file
try {
    // works, but cannot read and output text in different languages
    // br = new BufferedReader(
    //     new FileReader("C:\\Users\\brian_000\\Desktop\\appdatafile.json"));
    // results in java.lang.OutOfMemoryError: Java heap space
    br = new BufferedReader(new InputStreamReader(
        new FileInputStream("C:\\Users\\brian_000\\Desktop\\appdatafile.json"),
        "UTF-16"));
} catch (FileNotFoundException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
} catch (UnsupportedEncodingException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}
System.out.println("cann be print");
String line;
int i = 0;
try {
    while ((line = br.readLine()) != null) {
        // process the line
        try {
            System.out.println("cannot be print");
            // some statements that store the data in variables
            // a function that writes the variables to Excel
            writeToExcel(platform, kind, title, shareUrl, contentRating, userRatingCount,
                averageUserRating, marketLanguage, pricing,
                majorVersionNumber, releaseDate, downloadsCount);
        } catch (com.google.gson.JsonSyntaxException exception) {
            System.out.println("error");
        }
        // trying to get only the first 1000 rows
        i++;
        if (i == 1000) {
            br.close();
            break;
        }
    }
} catch (IOException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}
closeExcel();
public static void writeToExcel(String platform, String kind, String title, String shareUrl,
        String contentRating, String userRatingCount, String averageUserRating,
        String marketLanguage, String pricing, String majorVersionNumber,
        String releaseDate, String downloadsCount) {
    currentRow++;
    System.out.println(currentRow);
    // start a new sheet once the current one holds 1,000,000 rows
    if (currentRow > 1000000) {
        currentsheet++;
        sheet = workbook.createSheet("apps" + currentsheet, 0);
        createFristRow();
        currentRow = 1;
    }
    try {
        // character id
        Label label = new Label(0, currentRow, String.valueOf(currentRow), cellFormat);
        sheet.addCell(label);
        // 12 similar statements that write the data to Excel
        label = new Label(1, currentRow, platform, cellFormat);
        sheet.addCell(label);
    } catch (WriteException e) {
        e.printStackTrace();
    }
}
Excel, UTF-16
As mentioned, the problem is likely caused by the construction of the Excel document. Check whether UTF-8 yields a smaller size; for instance, Chinese HTML is still more compact in UTF-8 than in UTF-16 because of the many ASCII characters.
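To check the size claim on your own data, here is a minimal sketch that compares the encoded length of a string in both encodings. The class name EncodingSize and the sample string are only illustrative assumptions, not part of the question's code.

```java
import java.nio.charset.StandardCharsets;

public class EncodingSize {
    // Number of bytes the text occupies when encoded as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes when encoded as UTF-16 (little-endian, no BOM).
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16LE).length;
    }

    public static void main(String[] args) {
        // Hypothetical sample: mostly ASCII JSON syntax around two Chinese characters.
        String sample = "{\"title\":\"\u4e2d\u6587\"}";
        System.out.println("UTF-8:  " + utf8Bytes(sample) + " bytes");
        System.out.println("UTF-16: " + utf16Bytes(sample) + " bytes");
    }
}
```

Pure Chinese text is smaller in UTF-16 (2 bytes per character instead of 3), so the result depends on how much ASCII markup surrounds it. Measure a representative slice of the real file before deciding.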
Object creation in Java
You can share common small Strings. This is useful for String.valueOf(row) and the like. Cache only short strings. I assume cellFormat is fixed.
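A rough sketch of such a cache follows. The class name SmallStringCache and both limits are assumptions chosen for illustration; tune them to your data.

```java
import java.util.HashMap;
import java.util.Map;

public class SmallStringCache {
    // Assumed limits: only short strings are cached, and the cache is capped
    // so that it cannot itself exhaust the heap.
    private static final int MAX_LENGTH = 16;
    private static final int MAX_ENTRIES = 100_000;

    private final Map<String, String> cache = new HashMap<>();

    // Returns a shared instance for short strings that were seen before;
    // long strings (and null) are returned unchanged and never stored.
    String share(String s) {
        if (s == null || s.length() > MAX_LENGTH) {
            return s;
        }
        String cached = cache.get(s);
        if (cached != null) {
            return cached;
        }
        if (cache.size() < MAX_ENTRIES) {
            cache.put(s, s);
        }
        return s;
    }
}
```

You would then pass values such as platform or marketLanguage through share(...) before handing them to writeToExcel, so that repeated values like "en" refer to one String object instead of thousands of copies.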
DIY with xlsx
Excel builds a costly DOM. If CSV text (with a Unicode BOM) is not an option (you could give it the extension .xls so that Excel opens it), try generating an xlsx file yourself. Create an example workbook in xlsx format. This is a zip format, which is easiest to process in Java with a zip file system. For Excel there is a content XML and a shared-strings XML; cells in the content XML refer to their values by index into the shared strings. Then no overflow happens, as you write buffer-wise. Or use a JDBC driver for Excel. (No recent experience on my side; maybe JDBC/ODBC.)
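For the CSV route, a minimal sketch of streaming rows into a UTF-8 file that starts with a BOM could look like this. The class name CsvExport, the helper names and the sample rows are assumptions for illustration; the quoting follows the usual double-the-quotes CSV convention.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class CsvExport {
    // Quotes a field if it contains a comma, a quote or a line break;
    // embedded quotes are doubled.
    static String escape(String field) {
        if (field == null) {
            return "";
        }
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        String escaped = field.replace("\"", "\"\"");
        return needsQuotes ? "\"" + escaped + "\"" : escaped;
    }

    // Writes one row; nothing is kept in memory after the call returns.
    static void writeRow(Writer out, String... fields) throws IOException {
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                out.write(',');
            }
            out.write(escape(fields[i]));
        }
        out.write("\r\n");
    }

    // Opens the target file as UTF-8 and writes the BOM so that Excel
    // can detect the encoding.
    static BufferedWriter openCsv(Path target) throws IOException {
        BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8);
        out.write('\uFEFF');
        return out;
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("apps", ".csv");
        try (BufferedWriter out = openCsv(target)) {
            writeRow(out, "platform", "title");
            writeRow(out, "android", "\u4e2d\u6587, demo");
        }
        System.out.println("wrote " + target);
    }
}
```

In the question's loop, writeRow(out, platform, kind, title, ...) would take the place of writeToExcel. Memory use then stays flat regardless of the number of rows, because each row goes straight to the buffered writer.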
Best
Excel is hard to use with that much data. Consider investing more effort in using a database, or write every N rows to a separate, proper Excel file. Maybe you can later import them with Java into one document. (I doubt it.)
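The "every N rows in its own file" idea can be sketched as a writer that rolls over to a new file after a fixed number of rows. This sketch writes plain text files as a stand-in for workbooks; the class name ChunkedWriter and the file naming scheme apps1.csv, apps2.csv, ... are assumptions.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ChunkedWriter implements AutoCloseable {
    private final Path dir;
    private final int rowsPerFile;
    private BufferedWriter current;
    private int rowsInCurrent;
    private int fileCount;

    ChunkedWriter(Path dir, int rowsPerFile) {
        this.dir = dir;
        this.rowsPerFile = rowsPerFile;
    }

    // Writes one row, opening the next file first if the current one is full.
    void writeLine(String line) throws IOException {
        if (current == null || rowsInCurrent == rowsPerFile) {
            rotate();
        }
        current.write(line);
        current.newLine();
        rowsInCurrent++;
    }

    // Closes the current file (if any) and opens the next numbered one.
    private void rotate() throws IOException {
        close();
        fileCount++;
        Path next = dir.resolve("apps" + fileCount + ".csv");
        current = Files.newBufferedWriter(next, StandardCharsets.UTF_8);
        rowsInCurrent = 0;
    }

    // Number of files opened so far.
    int fileCount() {
        return fileCount;
    }

    @Override
    public void close() throws IOException {
        if (current != null) {
            current.close();
            current = null;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("chunks");
        try (ChunkedWriter w = new ChunkedWriter(dir, 2)) {
            for (int i = 1; i <= 5; i++) {
                w.writeLine("row" + i);
            }
        }
        System.out.println("files written to " + dir);
    }
}
```

The same rollover logic applies if each chunk is a real workbook: close and write out the finished workbook before creating the next one, so that only one chunk's worth of cells is held in memory at a time.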