How to efficiently count the rows of a CSV file in Java

Question

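I have written code that opens a CSV file and counts rows with a for loop, but I feel this approach is inefficient and causes a noticeable delay.

TargetFile.mdb has 120 rows
report.csv has 11000 rows

With this approach, the loop body runs 120 * 11000 = 1,320,000 times to compute the count for every resource. Here is my code:

Here is the new working code, which counts the rows efficiently, thanks to Xavier Delamotte: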

import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.sql.SQLException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import au.com.bytecode.opencsv.CSVReader;

import com.healthmarketscience.jackcess.Database;
import com.healthmarketscience.jackcess.Table;

public class newcount {

    public static class ValueKey{
        String mdmId;
        String pgName;

        @Override
        public int hashCode() {
            final int prime = 31;
            int result = 1;
            result = prime * result + ((mdmId == null) ? 0 : mdmId.hashCode());
            result = prime * result
                + ((pgName == null) ? 0 : pgName.hashCode());
            return result;
        }
        @Override
        public boolean equals(Object obj) {
            if (this == obj)
                return true;
            if (obj == null)
                return false;
            if (getClass() != obj.getClass())
                return false;
            ValueKey other = (ValueKey) obj;
            if (mdmId == null) {
                if (other.mdmId != null)
                    return false;
            } else if (!mdmId.equals(other.mdmId))
                return false;
            if (pgName == null) {
                if (other.pgName != null)
                    return false;
            } else if (!pgName.equals(other.pgName))
                return false;
            return true;
        }
        public ValueKey(String mdmId, String pgName) {
            super();
            this.mdmId = mdmId;
            this.pgName = pgName;
        }
    }

    public static void main(String[] args) throws IOException, SQLException,Throwable{


        Integer count;

        String MDMID,NAME,PGNAME,PGTARGET,TEAM;

        Table RESOURCES = Database.open(new File("C:/STATS/TargetFile.mdb")).getTable("RESOURCES");
        int pcount = RESOURCES.getRowCount();


        String csvFilename = "C:\\MDMSTATS\\APEX\\report.csv";
        CSVReader csvReader = new CSVReader(new FileReader(csvFilename));
        List<String[]> content = csvReader.readAll();
        Map<ValueKey, Integer> csvValuesCount = new HashMap<ValueKey, Integer>();
        for (String[] rowcsv  : content) {
            ValueKey key = new ValueKey(rowcsv[6], rowcsv[1]);
            count = csvValuesCount.get(key);
            csvValuesCount.put(key,count == null ? 1: count + 1);

        }

        // For each resource row, look up its precomputed count in the map
        for (int i = 0; i < pcount-25; i++) {
            Map<String, Object> row = RESOURCES.getNextRow();
            TEAM = row.get("TEAM").toString();
            MDMID = row.get("MDM ID").toString();
            NAME = row.get("RESOURCE NAME").toString();
            PGNAME = row.get("PG NAME").toString();
            PGTARGET = row.get("PG TARGET").toString();
            int PGTARGETI = Integer.parseInt(PGTARGET);
            Integer countInteger = csvValuesCount.get(new ValueKey(MDMID, PGNAME));
            count = countInteger == null ? 0: countInteger;
            System.out.println(NAME+"\t"+PGNAME+"\t"+count);

        }
    }
}

Answer 1

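I suggest reading the CSV file only once and counting the occurrences of each key composed of mdmId and pgName.

If you use Guava, you can use a Multiset<ValueKey> (http://guava-libraries.googlecode.com/svn-history/r8/trunk/javadoc/com/google/common/collect/Multiset.html) instead of a Map<ValueKey, Integer>.

Edit: to use the ValueKey class, you need to either put it in a separate file or declare it static.

The ValueKey class: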
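The reason behind that edit can be sketched as follows. A non-static inner class always belongs to an instance of its enclosing class, so a static method such as main cannot create one directly, whereas a static nested class has no such tie. The class names below (NestedClassSketch, StaticKey, InnerKey) are hypothetical and only illustrate the rule; they are not part of the asker's code.

```java
public class NestedClassSketch {

    // Static nested class: not tied to an instance of NestedClassSketch,
    // so static code such as main can instantiate it directly.
    public static class StaticKey {
        public final String id;

        public StaticKey(String id) {
            this.id = id;
        }
    }

    // Inner (non-static) class: every instance belongs to an enclosing
    // NestedClassSketch object.
    public class InnerKey {
        public final String id;

        public InnerKey(String id) {
            this.id = id;
        }
    }

    public static void main(String[] args) {
        // Works from a static context.
        StaticKey a = new StaticKey("M1");

        // "new InnerKey("M1")" would not compile here, because main has no
        // enclosing instance. An outer object has to be supplied explicitly:
        InnerKey b = new NestedClassSketch().new InnerKey("M1");

        System.out.println(a.id + " " + b.id);
    }
}
```

Declaring ValueKey static (or moving it to its own file) is usually the simpler choice, because the key does not need any state from the enclosing class.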

    public static class ValueKey{
        String mdmId;
        String pgName;
        @Override
        public int hashCode() {
            final int prime = 31;
            int result = 1;
            result = prime * result + ((mdmId == null) ? 0 : mdmId.hashCode());
            result = prime * result
                    + ((pgName == null) ? 0 : pgName.hashCode());
            return result;
        }
        @Override
        public boolean equals(Object obj) {
            if (this == obj)
                return true;
            if (obj == null)
                return false;
            if (getClass() != obj.getClass())
                return false;
            ValueKey other = (ValueKey) obj;
            if (mdmId == null) {
                if (other.mdmId != null)
                    return false;
            } else if (!mdmId.equals(other.mdmId))
                return false;
            if (pgName == null) {
                if (other.pgName != null)
                    return false;
            } else if (!pgName.equals(other.pgName))
                return false;
            return true;
        }
        public ValueKey(String mdmId, String pgName) {
            super();
            this.mdmId = mdmId;
            this.pgName = pgName;
        }
    }

Your method:

    Table RESOURCES = Database.open(new File("TargetFile.mdb")).getTable("RESOURCES");
    int pcount = RESOURCES.getRowCount();

    // Read the CSV once and count the occurrences of each (mdmId, pgName) key
    String csvFilename = "C:\\STATS\\APEX\\report.csv";
    CSVReader csvReader = new CSVReader(new FileReader(csvFilename));
    List<String[]> content = csvReader.readAll();
    csvReader.close();
    Map<ValueKey, Integer> csvValuesCount = new HashMap<ValueKey, Integer>();
    for (String[] rowcsv : content) {
        ValueKey key = new ValueKey(rowcsv[6], rowcsv[1]);
        Integer count = csvValuesCount.get(key);
        csvValuesCount.put(key, count == null ? 1 : count + 1);
    }

    int count = 0;
    // For each resource, look up its count in the map instead of rescanning the CSV.
    // TEAM, MDMID, NAME, PGNAME and PGTARGET are the String variables declared in the question's code.
    for (int i = 0; i < pcount; i++) {
        Map<String, Object> row = RESOURCES.getNextRow();
        TEAM = row.get("TEAM").toString();
        MDMID = row.get("MDM ID").toString();
        NAME = row.get("RESOURCE NAME").toString();
        PGNAME = row.get("PG NAME").toString();
        PGTARGET = row.get("PG TARGET").toString();
        int PGTARGETI = Integer.parseInt(PGTARGET);
        Integer countInteger = csvValuesCount.get(new ValueKey(MDMID, PGNAME));
        count = countInteger == null ? 0 : countInteger; // 0 when the key never occurs in the CSV
    }
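The single-pass idea can also be tried without the CSV and Access files. The following is a minimal sketch under stated assumptions: the rows are made-up sample data, the class and method names (KeyCountSketch, countKeys, countFor) are hypothetical, and a List<String> built with Arrays.asList stands in for ValueKey, since lists already compare by value. It uses Map.merge from Java 8 in place of the get/put pair.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class KeyCountSketch {

    /** Counts how often each (mdmId, pgName) pair occurs, in one pass over the rows. */
    public static Map<List<String>, Integer> countKeys(List<String[]> rows) {
        Map<List<String>, Integer> counts = new HashMap<>();
        for (String[] row : rows) {
            // Columns 6 and 1 mirror rowcsv[6] and rowcsv[1] in the code above.
            List<String> key = Arrays.asList(row[6], row[1]);
            // Store 1 for a new key, otherwise add 1 to the existing count.
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    /** Returns the count for one resource, or 0 when the pair never appears. */
    public static int countFor(Map<List<String>, Integer> counts, String mdmId, String pgName) {
        return counts.getOrDefault(Arrays.asList(mdmId, pgName), 0);
    }

    public static void main(String[] args) {
        // Made-up rows standing in for report.csv; only columns 1 and 6 matter here.
        List<String[]> rows = Arrays.asList(
            new String[] {"r1", "PG-A", "", "", "", "", "M1"},
            new String[] {"r2", "PG-A", "", "", "", "", "M1"},
            new String[] {"r3", "PG-B", "", "", "", "", "M2"});

        Map<List<String>, Integer> counts = countKeys(rows);
        System.out.println(countFor(counts, "M1", "PG-A")); // 2
        System.out.println(countFor(counts, "M2", "PG-B")); // 1
        System.out.println(countFor(counts, "M9", "PG-A")); // 0
    }
}
```

With the sizes from the question, this costs one pass over the 11000 CSV rows plus 120 map lookups, instead of 1,320,000 row comparisons.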

Answer 2

Dear friend, I suggest you use OpenCSV.

I think it will meet your requirements ;)

Answer 3

First read the CSV and build a set of the values in field 6, then use that set to update the count. This should be fast.

// Open the CSV and build a lookup set of the MDM IDs (column 6)
Set<String> mdmids = new HashSet<String>();
String csvFilename = "C:\\STATS\\APEX\\report.csv";
CSVReader csvReader = new CSVReader(new FileReader(csvFilename));
List<String[]> content = csvReader.readAll();

for (String[] rowcsv : content) {
    mdmids.add(rowcsv[6]);
}
Table RESOURCES = Database.open(new File("TargetFile.mdb")).getTable("RESOURCES");
int pcount = RESOURCES.getRowCount();
int count = 0;
// TEAM, MDMID, NAME, PGNAME and PGTARGET are the String variables declared in the question's code
for (int i = 0; i < pcount; i++) {
    Map<String, Object> row = RESOURCES.getNextRow();
    TEAM = row.get("TEAM").toString();
    MDMID = row.get("MDM ID").toString();
    NAME = row.get("RESOURCE NAME").toString();
    PGNAME = row.get("PG NAME").toString();
    PGTARGET = row.get("PG TARGET").toString();
    int PGTARGETI = Integer.parseInt(PGTARGET);

    // Use the lookup set: count the resources whose MDM ID appears in the CSV
    if (mdmids.contains(MDMID)) {
        count++;
    }
}
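The lookup-set approach can be sketched on in-memory data as below. This is only an illustration: the rows and resource IDs are made up, and the names LookupSetSketch, buildLookup and countPresent are hypothetical. Note that a set records only whether an MDM ID occurs at all, so the result is the number of resources found in the CSV, not a per-resource row count as in the map-based answer.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class LookupSetSketch {

    /** Collects the distinct values of column 6 (the MDM ID column in the code above). */
    public static Set<String> buildLookup(List<String[]> rows) {
        Set<String> ids = new HashSet<>();
        for (String[] row : rows) {
            ids.add(row[6]);
        }
        return ids;
    }

    /** Counts how many resource IDs occur at least once in the lookup set. */
    public static int countPresent(Set<String> lookup, List<String> resourceIds) {
        int count = 0;
        for (String id : resourceIds) {
            if (lookup.contains(id)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Made-up rows standing in for report.csv; only column 6 matters here.
        List<String[]> rows = Arrays.asList(
            new String[] {"r1", "PG-A", "", "", "", "", "M1"},
            new String[] {"r2", "PG-A", "", "", "", "", "M1"},
            new String[] {"r3", "PG-B", "", "", "", "", "M2"});
        // Made-up MDM IDs standing in for the RESOURCES table.
        List<String> resourceIds = Arrays.asList("M1", "M2", "M9");

        Set<String> lookup = buildLookup(rows);
        System.out.println(lookup.size());                      // 2 distinct IDs
        System.out.println(countPresent(lookup, resourceIds));  // 2 resources found
    }
}
```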

Answer 1: score 3, accepted, 2013-04-06 11:40:07

Answer 2: score 0, 2013-04-06 11:32:35

Answer 3: score 0, 2013-04-06 11:37:18
