How to count rows of a CSV file efficiently in Java
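I have written code that opens a CSV file and counts its rows with a for loop, but I feel this approach is inefficient and causes some delay.
TargetFile.mdb has 120 rows and report.csv has 11,000 rows. With this approach the code has to run 120 * 11000 = 1,320,000 times to produce the count for every resource. Here is my code:
Here is the new working code, courtesy of Xavier Delamotte, which counts the rows efficiently: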
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.sql.SQLException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import au.com.bytecode.opencsv.CSVReader;
import com.healthmarketscience.jackcess.Database;
import com.healthmarketscience.jackcess.Table;
public class newcount {
    public static class ValueKey{
        String mdmId;
        String pgName;
        @Override
        public int hashCode() {
            final int prime = 31;
            int result = 1;
            result = prime * result + ((mdmId == null) ? 0 : mdmId.hashCode());
            result = prime * result
                    + ((pgName == null) ? 0 : pgName.hashCode());
            return result;
        }
        @Override
        public boolean equals(Object obj) {
            if (this == obj)
                return true;
            if (obj == null)
                return false;
            if (getClass() != obj.getClass())
                return false;
            ValueKey other = (ValueKey) obj;
            if (mdmId == null) {
                if (other.mdmId != null)
                    return false;
            } else if (!mdmId.equals(other.mdmId))
                return false;
            if (pgName == null) {
                if (other.pgName != null)
                    return false;
            } else if (!pgName.equals(other.pgName))
                return false;
            return true;
        }
        public ValueKey(String mdmId, String pgName) {
            super();
            this.mdmId = mdmId;
            this.pgName = pgName;
        }
    }
    public static void main(String[] args) throws IOException, SQLException,Throwable{
        Integer count;
        String MDMID,NAME,PGNAME,PGTARGET,TEAM;
        Table RESOURCES = Database.open(new File("C:/STATS/TargetFile.mdb")).getTable("RESOURCES");
        int pcount = RESOURCES.getRowCount();
        String csvFilename = "C:\\MDMSTATS\\APEX\\report.csv";
        CSVReader csvReader = new CSVReader(new FileReader(csvFilename));
        List<String[]> content = csvReader.readAll();
        Map<ValueKey, Integer> csvValuesCount = new HashMap<ValueKey, Integer>();
        for (String[] rowcsv : content) {
            ValueKey key = new ValueKey(rowcsv[6], rowcsv[1]);
            count = csvValuesCount.get(key);
            csvValuesCount.put(key,count == null ? 1: count + 1);
        }
        //int count = 0;
        // Taking 1st resource data
        for (int i = 0; i < pcount-25; i++) {
            Map<String, Object> row = RESOURCES.getNextRow();
            TEAM = row.get("TEAM").toString();
            MDMID = row.get("MDM ID").toString();
            NAME = row.get("RESOURCE NAME").toString();
            PGNAME = row.get("PG NAME").toString();
            PGTARGET = row.get("PG TARGET").toString();
            int PGTARGETI = Integer.parseInt(PGTARGET);
            Integer countInteger = csvValuesCount.get(new ValueKey(MDMID, PGNAME));
            count = countInteger == null ? 0: countInteger;
            System.out.println(NAME+"\t"+PGNAME+"\t"+count);
        }
    }
}
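I suggest reading the CSV file only once and counting the occurrences of a key composed of mdmId and pgName.
If you use Guava, you can use a Multiset<ValueKey>
(http://guava-libraries.googlecode.com/svn-history/r8/trunk/javadoc/com/google/common/collect/Multiset.html) instead of a Map<ValueKey,Integer>.
Edit: to use the ValueKey class, you need to put it in a separate file or declare it static.
The ValueKey class: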
public static class ValueKey{
    String mdmId;
    String pgName;
    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result + ((mdmId == null) ? 0 : mdmId.hashCode());
        result = prime * result
                + ((pgName == null) ? 0 : pgName.hashCode());
        return result;
    }
    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null)
            return false;
        if (getClass() != obj.getClass())
            return false;
        ValueKey other = (ValueKey) obj;
        if (mdmId == null) {
            if (other.mdmId != null)
                return false;
        } else if (!mdmId.equals(other.mdmId))
            return false;
        if (pgName == null) {
            if (other.pgName != null)
                return false;
        } else if (!pgName.equals(other.pgName))
            return false;
        return true;
    }
    public ValueKey(String mdmId, String pgName) {
        super();
        this.mdmId = mdmId;
        this.pgName = pgName;
    }
}
Your method:
Table RESOURCES = Database.open(new File("TargetFile.mdb")).getTable("RESOURCES");
int pcount = RESOURCES.getRowCount();
String csvFilename = "C:\\STATS\\APEX\\report.csv";
CSVReader csvReader = new CSVReader(new FileReader(csvFilename));
List<String[]> content = csvReader.readAll();
Map<ValueKey, Integer> csvValuesCount = new HashMap<ValueKey, Integer>();
for (String[] rowcsv : content) {
    ValueKey key = new ValueKey(rowcsv[6], rowcsv[1]);
    Integer count = csvValuesCount.get(key);
    csvValuesCount.put(key,count == null ? 1: count + 1);
}
int count = 0;
// Taking 1st resource data
for (int i = 0; i < pcount; i++) {
    Map<String, Object> row = RESOURCES.getNextRow();
    TEAM = row.get("TEAM").toString();
    MDMID = row.get("MDM ID").toString();
    NAME = row.get("RESOURCE NAME").toString();
    PGNAME = row.get("PG NAME").toString();
    PGTARGET = row.get("PG TARGET").toString();
    int PGTARGETI = Integer.parseInt(PGTARGET);
    Integer countInteger = csvValuesCount.get(new ValueKey(MDMID, PGNAME));
    count = countInteger == null ? 0: countInteger;
}
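For comparison, the single-pass counting step above can be written more compactly on Java 8 or later. This is only a sketch under assumptions, not the answer's own code: it swaps the hand-written ValueKey for a two-element List used as the composite key, and it uses Map.merge in place of the get/put pair. The class name CompositeKeyCount, the method countPairs and the sample rows are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CompositeKeyCount {

    // Counts how often each (mdmId, pgName) pair occurs in the parsed rows.
    // The two column indexes play the role of rowcsv[6] and rowcsv[1] above.
    public static Map<List<String>, Integer> countPairs(
            List<String[]> rows, int mdmIdCol, int pgNameCol) {
        Map<List<String>, Integer> counts = new HashMap<>();
        for (String[] row : rows) {
            // Arrays.asList yields a list with value-based equals/hashCode,
            // so it can serve as a composite map key (nulls are tolerated).
            List<String> key = Arrays.asList(row[mdmIdCol], row[pgNameCol]);
            // merge inserts 1 for a new key, otherwise adds 1 to the old value.
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical sample data: column 0 = MDM ID, column 1 = PG name.
        List<String[]> rows = Arrays.asList(
                new String[] {"M1", "PG1"},
                new String[] {"M1", "PG1"},
                new String[] {"M2", "PG1"});

        Map<List<String>, Integer> counts = countPairs(rows, 0, 1);

        // One hash lookup per resource instead of a scan over every CSV row.
        System.out.println(counts.getOrDefault(Arrays.asList("M1", "PG1"), 0)); // prints 2
        System.out.println(counts.getOrDefault(Arrays.asList("M9", "PG1"), 0)); // prints 0
    }
}
```

With 11,000 CSV rows and 120 resources, this does roughly 11,000 map updates plus 120 lookups rather than 1,320,000 comparisons.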
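Dear friend, I suggest you use OpenCSV.
I think it meets your requirements ;)
First read the CSV and build a set of the values in field 6, then use that set to update the count. This should be fast.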
// Open the CSV and build a lookup set from field 6.
// TEAM, MDMID, NAME, PGNAME, PGTARGET, pcount, count and i are assumed
// to be declared as in the question's code.
Set<String> mdmids = new HashSet<String>();
String csvFilename = "C:\\STATS\\APEX\\report.csv";
CSVReader csvReader = new CSVReader(new FileReader(csvFilename));
List<String[]> content = csvReader.readAll();
csvReader.close();
for (String[] rowcsv : content) {
    mdmids.add(rowcsv[6]);
}
Table RESOURCES = Database.open(new File("TargetFile.mdb")).getTable("RESOURCES");
pcount = RESOURCES.getRowCount();
count = 0;
// Taking 1st resource data
for (i = 0; i < pcount; i++) {
    Map<String, Object> row = RESOURCES.getNextRow();
    TEAM = row.get("TEAM").toString();
    MDMID = row.get("MDM ID").toString();
    NAME = row.get("RESOURCE NAME").toString();
    PGNAME = row.get("PG NAME").toString();
    PGTARGET = row.get("PG TARGET").toString();
    int PGTARGETI = Integer.parseInt(PGTARGET);
    // Use the lookup set
    if (mdmids.contains(MDMID)) {
        count++;
    }
}
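One thing to keep in mind with a set-based lookup like the one above: a set records only whether an ID occurs in the CSV, not how many rows carry it. The count it produces is therefore the number of resources that appear at least once. The following is a minimal, self-contained sketch of that behaviour on in-memory data. The class name SetLookupSketch, its two helper methods and the sample values are hypothetical and not part of the answer.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SetLookupSketch {

    // Collects the distinct values found in one column of the parsed rows.
    public static Set<String> distinctValues(List<String[]> rows, int col) {
        Set<String> values = new HashSet<>();
        for (String[] row : rows) {
            values.add(row[col]);
        }
        return values;
    }

    // Counts how many of the given ids occur at least once in the set.
    public static int countPresent(List<String> ids, Set<String> seen) {
        int count = 0;
        for (String id : ids) {
            if (seen.contains(id)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical CSV rows with the MDM ID in column 0; "M1" occurs twice.
        List<String[]> csvRows = Arrays.asList(
                new String[] {"M1"},
                new String[] {"M1"},
                new String[] {"M3"});
        // Hypothetical resource ids, standing in for the RESOURCES table.
        List<String> resourceIds = Arrays.asList("M1", "M2", "M3");

        Set<String> seen = distinctValues(csvRows, 0);
        // M1 and M3 are present, M2 is not; the duplicate M1 row is not counted twice.
        System.out.println(countPresent(resourceIds, seen)); // prints 2
    }
}
```

If the number of CSV rows per resource is what is needed, a map from key to count fits better than a set.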