
Parsing a CSV file to populate a database

Given a CSV file such as this:

str_name,int_points,int_bonus
joe,2,5
Moe,10,15
Carlos,25,60

The CSV file can have any number of columns and rows, so I am trying to develop a generic method that parses it and populates a DynamoDB table.

To populate the DynamoDB table, I would do something like this:

    String line;
    String cvsSplitBy = ",";

    try (BufferedReader br = new BufferedReader(
                                new InputStreamReader(objectData, "UTF-8"))) {

        br.readLine(); // skip the header row

        while ((line = br.readLine()) != null) {

            // use comma as separator
            String[] elements = line.split(cvsSplitBy);

            try {
                // withInt expects an int, so the text has to be parsed first
                table.putItem(new Item()
                    .withPrimaryKey("name", elements[0])
                    .withInt("points", Integer.parseInt(elements[1]))
                    .withInt("bonus", Integer.parseInt(elements[2]))
                    // ..... (one call per further column)
                    );

                System.out.println("PutItem succeeded: " + elements[0]);

            } catch (Exception e) {
                System.err.println("Unable to add user: " + line);
                System.err.println(e.getMessage());
                break;
            }

        }

    } catch (IOException e) {
        e.printStackTrace();
    }

However, I will not always know whether I am inserting an int or a string; that depends on the CSV file. I am lost on how to write a generic function that reads the first line of the CSV file and uses each column's prefix to tell whether that column is an int or a string.

Just store the labels (first row) and then, while iterating over the values in each row, decide which method to call based on the label. If you are not against bringing in an external dependency, I advise you to use an external CSV reader, e.g. SuperCSV. With this library you can, for example, read each row as a Map (label -> value), then iterate over the entries and, based on each label's prefix, update your DB with the correct method. Or just read the header and then do the same while reading each row as a list.
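If you would rather avoid the dependency, the "row as a Map (label -> value)" idea can be sketched in plain Java. This is only a rough illustration: the class name CsvRows and the method toMaps are made up for this example (they are not part of SuperCSV or the AWS SDK), and it assumes the lines have already been read into a list and that no cell contains a quoted comma, which a plain split(",") cannot handle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of SuperCSV or the AWS SDK.
class CsvRows {

    // Treats the first line as the header and turns every following line
    // into a label -> value map.
    static List<Map<String, String>> toMaps(List<String> lines) {
        List<Map<String, String>> rows = new ArrayList<>();
        if (lines.isEmpty()) {
            return rows; // no header, nothing to do
        }
        String[] labels = lines.get(0).split(",");
        for (String line : lines.subList(1, lines.size())) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] values = line.split(",", -1); // -1 keeps trailing empty cells
            if (values.length != labels.length) {
                throw new IllegalArgumentException("Expected " + labels.length
                        + " cells but got " + values.length + ": " + line);
            }
            Map<String, String> row = new LinkedHashMap<>(); // keeps column order
            for (int i = 0; i < labels.length; i++) {
                row.put(labels[i].trim(), values[i].trim());
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "str_name,int_points,int_bonus",
                "joe,2,5",
                "Moe,10,15");
        for (Map<String, String> row : toMaps(lines)) {
            System.out.println(row);
        }
    }
}
```

Each map can then be walked entry by entry, with the key's prefix deciding which method to call on the item.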

Example:

This is of course very crude and I would probably refactor it (e.g. have a list of processors, one per column, instead of the ugly switch), but it shows the idea:

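        List<String> labels = new ArrayList<>();   // store the first row (header) here
        List<String> elements = new ArrayList<>(); // the line currently being processed
        Item item = new Item();
        for (int i = 0; i < elements.size(); i++) {
            String label = labels.get(i);
            // getTypePrefix and getName are helpers you write yourself:
            // for "int_points" they return "int" and "points"
            switch (getTypePrefix(label)) {
                case "int":
                    // withInt expects an int, so parse the text first
                    item = item.withInt(getName(label), Integer.parseInt(elements.get(i)));
                    break;
                case "str":
                    item = item.withString(getName(label), elements.get(i));
                    break;
                default:
                    // unknown prefix: skip, log or throw
                    break;
            }
        }
        table.putItem(item);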
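The getTypePrefix and getName helpers are not defined above, and the "processors instead of a switch" refactoring is only hinted at, so here is one possible sketch of both. Everything in it is an assumption made for illustration: the class name TypedColumns, the CONVERTERS map, and the use of a plain Map as a stand-in for the DynamoDB Item so that the example runs without the AWS SDK.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: a plain Map stands in for the DynamoDB Item.
class TypedColumns {

    // "int_points" -> "int"
    static String getTypePrefix(String label) {
        int sep = label.indexOf('_');
        if (sep < 0) {
            throw new IllegalArgumentException("No type prefix in label: " + label);
        }
        return label.substring(0, sep);
    }

    // "int_points" -> "points" (a label without '_' is returned unchanged)
    static String getName(String label) {
        return label.substring(label.indexOf('_') + 1);
    }

    // One converter per type prefix; supporting a new type means adding one entry.
    static final Map<String, Function<String, Object>> CONVERTERS = new HashMap<>();
    static {
        CONVERTERS.put("int", value -> Integer.valueOf(value));
        CONVERTERS.put("str", value -> value);
    }

    // Converts one CSV row into attribute name -> typed value.
    static Map<String, Object> toAttributes(String[] labels, String[] elements) {
        Map<String, Object> attributes = new LinkedHashMap<>();
        for (int i = 0; i < labels.length; i++) {
            String prefix = getTypePrefix(labels[i]);
            Function<String, Object> converter = CONVERTERS.get(prefix);
            if (converter == null) {
                throw new IllegalArgumentException("Unsupported type prefix: " + prefix);
            }
            attributes.put(getName(labels[i]), converter.apply(elements[i].trim()));
        }
        return attributes;
    }

    public static void main(String[] args) {
        String[] labels = {"str_name", "int_points", "int_bonus"};
        String[] elements = {"Carlos", "25", "60"};
        System.out.println(toAttributes(labels, elements)); // {name=Carlos, points=25, bonus=60}
    }
}
```

With the real SDK, each map entry would instead hold a small function that receives the Item, the attribute name and the raw text and calls withInt or withString, so the loop body stays free of type checks.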

OK, I can't post this as a comment, so I wrote a simple example. Note that I'm not familiar with the Amazon API you're using, but you should get the idea of how I'd go about it (I've basically rewritten your code):

        String line;
        String cvsSplitBy = ",";

        try (BufferedReader br = new BufferedReader(
                            new InputStreamReader(objectData, "UTF-8"))) {

            // first line just to get the column names
            String[] colNames = br.readLine().split(cvsSplitBy);

            while ((line = br.readLine()) != null) {
                // use comma as separator
                String[] elements = line.split(cvsSplitBy);
                Item item = new Item(); // one item per row

                try {
                    for (int i = 0; i < elements.length; i++) {
                        String currColumnName = colNames[i];
                        try {
                            // if the value parses as an int, store it as a number
                            int iVal = Integer.parseInt(elements[i]);
                            item = item.withInt(currColumnName, iVal);
                        } catch (NumberFormatException e) {
                            // otherwise store it as a string
                            item = item.withString(currColumnName, elements[i]);
                        }
                    }
                    table.putItem(item);
                    System.out.println("PutItem succeeded: " + line);

                } catch (Exception e) {
                    System.err.println("Unable to add row: " + line);
                    System.err.println(e.getMessage());
                    break;
                }
            }

        } catch (IOException e) {
            e.printStackTrace();
        }

This example assumes that your first row contains the column names as they are stored in the DB. You don't have to declare anywhere whether a column is an int or a String, because the program checks each value (granted, this is not the most efficient way to do it and you may write something better, perhaps what Molok has suggested).
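The per-value check can be isolated in a small helper, which also makes its main caveat easy to see. This is a minimal sketch with made-up names (CellTypes, infer); it returns an Integer when the whole cell parses as an int and the original text otherwise.

```java
// Hypothetical helper for illustration: guesses the type of a single CSV cell.
class CellTypes {

    // Returns an Integer if the trimmed cell is a valid int, otherwise the cell itself.
    static Object infer(String cell) {
        try {
            return Integer.valueOf(cell.trim());
        } catch (NumberFormatException e) {
            return cell; // not an int, keep it as text
        }
    }

    public static void main(String[] args) {
        System.out.println(infer("25").getClass().getSimpleName());  // Integer
        System.out.println(infer("joe").getClass().getSimpleName()); // String
        System.out.println(infer("007"));                            // 7
    }
}
```

The last line shows the trade-off: a text column that happens to hold digits, such as an ID "007", is stored as the number 7, and the same column can end up with mixed types across rows. Type prefixes in the header avoid that, at the cost of having to maintain them.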
