Map Multiple CSV to single POJO

I have many CSV files with different column headers. Currently I am reading those CSV files and mapping them to different POJO classes based on their column headers. Some of the CSV files have around 100 column headers, which makes it difficult to create a POJO class for each.

So is there any technique where I can use a single POJO, so that when reading those CSV files I can map them all to a single POJO class? Or should I read the CSV files line by line and parse them accordingly, or create the POJOs at runtime (Javassist)?

If I understand your problem correctly, you can use uniVocity-parsers to process this and get the data in a map:

//First create a configuration object - there are many options 
//available and the tutorial has a lot of examples
CsvParserSettings settings = new CsvParserSettings();
settings.setHeaderExtractionEnabled(true);

CsvParser parser = new CsvParser(settings);
parser.beginParsing(new File("/path/to/your.csv"));

// you can also apply some transformations:
// NULL year should become 0000
parser.getRecordMetadata().setDefaultValueOfColumns("0000", "Year");

// decimal separator in prices will be replaced by comma
parser.getRecordMetadata().convertFields(Conversions.replace("\\.00", ",00")).set("Price");

Record record;
while ((record = parser.parseNextRecord()) != null) {
    Map<String, String> map = record.toFieldMap(/*you can pass a list of column names of interest here*/);
    //for performance, you can also reuse the map and call record.fillFieldMap(map);
}

Or you can even parse the file and get beans of different types in a single step. Here's how you do it:

CsvParserSettings settings = new CsvParserSettings();

//Create a row processor to process input rows. In this case we want
//multiple instances of different classes:
MultiBeanListProcessor processor = new MultiBeanListProcessor(TestBean.class, AmountBean.class, QuantityBean.class);

// we also need to grab the headers from our input file
settings.setHeaderExtractionEnabled(true);

// configure the parser to use the MultiBeanProcessor
settings.setRowProcessor(processor);

// create the parser and run
CsvParser parser = new CsvParser(settings);

parser.parse(new File("/path/to/your.csv"));

// get the beans:
List<TestBean> testBeans = processor.getBeans(TestBean.class);
List<AmountBean> amountBeans = processor.getBeans(AmountBean.class);
List<QuantityBean> quantityBeans = processor.getBeans(QuantityBean.class);

See the examples in the library's tutorial for more details.

If your data is too big and you can't hold everything in memory, you can stream the input row by row by using the MultiBeanRowProcessor instead. The method rowProcessed(Map<Class<?>, Object> row, ParsingContext context) will give you a map of instances created for each class in the current row. Inside the method, just call:

AmountBean amountBean = (AmountBean) row.get(AmountBean.class);
QuantityBean quantityBean = (QuantityBean) row.get(QuantityBean.class);
...

//perform something with the instances parsed in a row.

Hope this helps.

Disclaimer: I'm the author of this library. It's open-source and free (Apache 2.0 license)

To me, creating a POJO class is not a good idea in this case, since neither the number of columns nor the number of files is constant. It is therefore better to use something more dynamic, so that you do not have to change your code significantly just to support more columns or more files.

I would go for a List of Maps (List<Map<String, String>>) for a given CSV file, where each map represents one row of the file, keyed by column name.

You can easily extend this to multiple CSV files.
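The map-per-row idea can be sketched in plain Java. This is a minimal illustration that assumes simple comma-separated input with no quoted or escaped fields; the class name CsvToMaps and the splitting logic are illustrative, not part of any library mentioned above:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CsvToMaps {

    // Converts CSV lines into one map per data row, keyed by the column
    // names taken from the header line. Missing trailing values map to null.
    static List<Map<String, String>> toRowMaps(List<String> lines) {
        List<Map<String, String>> rows = new ArrayList<>();
        if (lines.isEmpty()) {
            return rows;
        }
        String[] headers = lines.get(0).split(",", -1);
        for (String line : lines.subList(1, lines.size())) {
            String[] values = line.split(",", -1);
            Map<String, String> row = new LinkedHashMap<>();
            for (int i = 0; i < headers.length; i++) {
                row.put(headers[i].trim(), i < values.length ? values[i].trim() : null);
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Works for any header, no POJO needed:
        List<String> csv = Arrays.asList(
                "Year,Price,Quantity",
                "2020,9.99,3",
                "2021,19.99,1");
        List<Map<String, String>> rows = toRowMaps(csv);
        System.out.println(rows.get(0).get("Price")); // 9.99
    }
}
```

Because each row is just a map, adding another CSV file with a different header requires no code change; for real-world input with quoting and escaping you would still want a proper CSV parser to produce the lines.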
