I am ranking certain groups of elements within a .csv file. My program works. However...
I am seeking advice on how to improve the efficiency of a program I have written. I am not asking for a review of my code (Stack Overflow ref), nor am I asking someone to write code for me. All I am asking is: "Is there a more efficient way, and if so, what?"
I have a program that takes multiple .csv files, modifies them and adds extra data. These files are then saved. Below is a representation of the input data:
ISBN, Shop, Cost, ReviewScore,
9780008305796, A Bookshop, 11.99, 4.8,
9781787460966, A Bookshop, 6.99, 4.3,
9781787460966, Lots of books, 5.99, 4.4,
9781838770013, A Bookshop, 6.99, 3.8,
9780008305796, The bookseller, 13.99, 4.7,
9780008305796, Lots of books, 16.99, 4.1,
Note: each .csv file is normally thousands of lines long. There could be 1 to 20 instances of an ISBN. The .csv is not ordered by any column.
My program works as follows (pseudocode):
The data will then look like:
ISBN, Shop, Cost, ReviewScore, CostRank, ReviewRank
9780008305796, A Bookshop, 11.99, 4.8, 1, 1
9781787460966, A Bookshop, 6.99, 4.3, 2, 2
9781787460966, Lots of books, 5.99, 4.4, 1, 1
9781838770013, A Bookshop, 6.99, 3.8, 1, 1
9780008305796, The bookseller, 13.99, 4.7, 2, 2
9780008305796, Lots of books, 16.99, 4.1, 3, 3
This program does not depend on the type of data structure the .csv is loaded into. It could be a List, List of Lists, Collection, etc.
You /could/ do it in a single pass. The code would look something like this:
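import java.util.HashMap;
import java.util.Map;

Map<String, IsbnData> dataStore = new HashMap<>();
for (String[] row : rows) {
    String isbn = row[0]; // or whatever the index of the ISBN column is
    IsbnData datum = dataStore.get(isbn);
    if (datum == null) {
        // first row seen for this ISBN
        datum = createIsbnDataFromRow(row);
    } else {
        // fold this row into what has already been collected for the ISBN
        datum = updateDatumWithMoreData(datum, row);
    }
    dataStore.put(isbn, datum);
}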
The main benefit of this is that instead of having to deal with String[]
you have nicely structured classes and the code is easier to read.
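To make the idea of structured classes concrete, here is a minimal sketch of one possible shape. The names Offer and IsbnRanker are hypothetical, and the IsbnData placeholder from the snippet above is replaced by a plain List<Offer> per ISBN. It assumes each row has already been split into four fields (ISBN, shop, cost, review score), that the cheapest cost and the highest review score both get rank 1, and that ties are broken by input order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** One CSV row plus the two ranks that are filled in later (hypothetical class). */
final class Offer {
    final String isbn;
    final String shop;
    final double cost;
    final double reviewScore;
    int costRank;
    int reviewRank;

    Offer(String[] row) {
        this.isbn = row[0].trim();
        this.shop = row[1].trim();
        this.cost = Double.parseDouble(row[2].trim());
        this.reviewScore = Double.parseDouble(row[3].trim());
    }
}

final class IsbnRanker {
    /** Single pass over the parsed rows: bucket every offer under its ISBN. */
    static Map<String, List<Offer>> groupByIsbn(List<String[]> rows) {
        Map<String, List<Offer>> byIsbn = new HashMap<>();
        for (String[] row : rows) {
            Offer offer = new Offer(row);
            byIsbn.computeIfAbsent(offer.isbn, k -> new ArrayList<>()).add(offer);
        }
        return byIsbn;
    }

    /** Ranks within each ISBN group: cheapest cost = 1, highest review score = 1. */
    static void rank(Map<String, List<Offer>> byIsbn) {
        for (List<Offer> offers : byIsbn.values()) {
            // Sort a copy so the original row order inside the group is kept.
            List<Offer> sorted = new ArrayList<>(offers);
            sorted.sort(Comparator.comparingDouble((Offer o) -> o.cost));
            for (int i = 0; i < sorted.size(); i++) {
                sorted.get(i).costRank = i + 1;
            }
            sorted.sort(Comparator.comparingDouble((Offer o) -> o.reviewScore).reversed());
            for (int i = 0; i < sorted.size(); i++) {
                sorted.get(i).reviewRank = i + 1;
            }
        }
    }
}
```

With at most 20 offers per ISBN, the per-group sorts are tiny, so the work after the grouping pass stays small even for files with thousands of rows.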
The code /may/ run faster, but that's probably irrelevant, since it's much more likely to run out of memory before the speed matters. (Don't confuse this with the program being slow. It may well be slow, but that is due to reading and parsing the CSV files. The speed gain from passing over the data fewer times after you've parsed it is negligible.)
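Since reading and parsing are where the time goes, here is a rough sketch of that step for reference. CsvRows is a hypothetical helper, not part of the question's program. It assumes plain comma-separated fields with no quoting or embedded commas, as in the sample data, and a single header line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class CsvRows {
    /**
     * Reads rows one line at a time, skipping the header and blank lines.
     * Assumes unquoted, comma-separated fields (hypothetical helper).
     */
    static List<String[]> read(BufferedReader reader) {
        List<String[]> rows = new ArrayList<>();
        try {
            String line = reader.readLine(); // header line, discarded
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue;
                }
                // split drops the empty field left by the trailing comma
                String[] fields = line.split(",");
                for (int i = 0; i < fields.length; i++) {
                    fields[i] = fields[i].trim();
                }
                rows.add(fields);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }
}
```

A reader for a real file can be obtained with Files.newBufferedReader. If memory is the concern, the same loop could hand each row to the grouping step directly instead of collecting everything into a list first.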