How to parse a CSV with multiple key-value pairs?

I have a CSV in this format:

"Account Name","Full Name","Customer System Name","Sales Rep"
"0x7a69","Mike Smith","0x7a69","Tim Greaves"
"0x7a69","John Taylor","0x7a69","Brian Anthony"
"Apple","Steve Jobs","apple","Anthony Michael"
"Apple","Steve Jobs","apple","Brian Anthony"
"Apple","Tim Cook","apple","Tim Greaves"
...

I would like to parse this CSV (using Java) so that it becomes:

"Account Name","Full Name","Customer System Name","Sales Rep"
"0x7a69","Mike Smith, John Taylor","0x7a69","Tim Greaves, Brian Anthony"
"Apple","Steve Jobs, Tim Cook","apple","Anthony Michael, Brian Anthony, Tim Greaves"

Essentially I just want to condense the CSV so that there is one entry per account/company name.

Here is what I have so far:

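import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

String csvFile = "something.csv";
String line = "";
String cvsSplitBy = ",";

// one list per CSV column
List<String> accountList = new ArrayList<String>();
List<String> nameList = new ArrayList<String>();
List<String> systemNameList = new ArrayList<String>();
List<String> salesList = new ArrayList<String>();

try (BufferedReader br = new BufferedReader(new FileReader(csvFile))) {
    while ((line = br.readLine()) != null) {
        // use comma as separator; note that a plain split keeps the
        // surrounding quotes and breaks on commas inside quoted fields
        String[] csv = line.split(cvsSplitBy);

        accountList.add(csv[0]);
        nameList.add(csv[1]);
        systemNameList.add(csv[2]);
        salesList.add(csv[3]);
    }
} catch (IOException e) {
    e.printStackTrace();
}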

So I was thinking of adding them all to their own lists, then looping through all of the lists and comparing the values, but I can't wrap my head around how that would work. Any tips or words of advice are much appreciated. Thanks!

By analyzing your requirements you can get a better idea of the data structures to use. Since you need to map keys (account/company) to values (name/rep), I would start with a HashMap. Since you want to condense the values and remove duplicates, you'll probably want to use a Set.

I would have a Map<Key, Data> with these classes:

public class Key {
    private String account;
    private String companyName;

    //Getters/Setters/equals/hashcode
}

public class Data {
    private Key key;
    private Set<String> names = new HashSet<>();
    private Set<String> reps = new HashSet<>();

    public void addName(String name) {
        names.add(name);
    }

    public void addRep(String rep) {
        reps.add(rep);
    }

    //Additional getters/setters/equals/hashcode
}
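
The "equals/hashcode" placeholder in Key matters more than it looks: a HashMap only finds an existing entry if two keys built from the same values compare equal and hash alike. Here is a minimal sketch of what that could look like, assuming the key is the pair (account name, customer system name). The names AccountKey, AccountKeyDemo and countRows are illustrative and not part of the answer above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical immutable key: two keys are equal when both fields match.
final class AccountKey {
    private final String account;
    private final String systemName;

    AccountKey(String account, String systemName) {
        this.account = account;
        this.systemName = systemName;
    }

    String getAccount() { return account; }
    String getSystemName() { return systemName; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof AccountKey)) return false;
        AccountKey other = (AccountKey) o;
        return Objects.equals(account, other.account)
            && Objects.equals(systemName, other.systemName);
    }

    @Override
    public int hashCode() {
        return Objects.hash(account, systemName);
    }
}

class AccountKeyDemo {
    // Counts rows per key to show that equal keys land in the same map entry.
    // Each row is assumed to be {account, systemName}.
    static Map<AccountKey, Integer> countRows(String[][] rows) {
        Map<AccountKey, Integer> counts = new HashMap<>();
        for (String[] row : rows) {
            counts.merge(new AccountKey(row[0], row[1]), 1, Integer::sum);
        }
        return counts;
    }
}
```

Without the two overrides, every new AccountKey would be a distinct map key and nothing would ever be merged.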

Once you have your data structures in place, you can do the following to populate the data from your CSV and write it out to a new CSV (in pseudocode):

    Loop each line in CSV
      Build Key from account/company
      Try to get data from Map
      If Data not found
        Create new data with Key and put key -> data mapping in map
      add name and rep to data

    Loop values in map
      Output to CSV
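
The pseudocode above could be sketched in Java roughly as follows. This is one possible reading, not a definitive implementation, and it makes a few assumptions: every data row has at least four columns, the first line is the header, and the key is represented as a two-element List (account name, customer system name) instead of a dedicated Key class, since List already has value-based equals and hashCode. It also swaps HashMap/HashSet for LinkedHashMap/LinkedHashSet so the output keeps the order in which values first appear, as in the expected output in the question. CsvCondenser, parseLine, quote and condense are made-up names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class CsvCondenser {

    // Plays the role of Data: the distinct names and reps seen for one key.
    static final class Data {
        final Set<String> names = new LinkedHashSet<>();
        final Set<String> reps = new LinkedHashSet<>();
    }

    // Splits one CSV line, honouring double-quoted fields ("" is an escaped quote).
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                if (inQuotes && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    cur.append('"'); // escaped quote inside a quoted field
                    i++;
                } else {
                    inQuotes = !inQuotes;
                }
            } else if (c == ',' && !inQuotes) {
                fields.add(cur.toString());
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        fields.add(cur.toString());
        return fields;
    }

    // Wraps a value in double quotes, doubling any embedded quotes.
    static String quote(String value) {
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    // Takes all lines (header first) and returns the condensed lines.
    static List<String> condense(List<String> lines) {
        List<String> out = new ArrayList<>();
        if (lines.isEmpty()) return out;
        out.add(lines.get(0)); // keep the header unchanged

        Map<List<String>, Data> byKey = new LinkedHashMap<>();
        for (String line : lines.subList(1, lines.size())) {
            if (line.trim().isEmpty()) continue; // skip blank lines
            List<String> f = parseLine(line);
            List<String> key = Arrays.asList(f.get(0), f.get(2));
            // "If Data not found, create new data and put it in the map"
            Data data = byKey.computeIfAbsent(key, k -> new Data());
            data.names.add(f.get(1));
            data.reps.add(f.get(3));
        }

        // "Loop values in map, output to CSV"
        for (Map.Entry<List<String>, Data> e : byKey.entrySet()) {
            Data d = e.getValue();
            out.add(String.join(",",
                quote(e.getKey().get(0)),
                quote(String.join(", ", d.names)),
                quote(e.getKey().get(1)),
                quote(String.join(", ", d.reps))));
        }
        return out;
    }
}
```

To run it against a file, the lines could come from Files.readAllLines(Paths.get("something.csv")) and the result could be written back with Files.write. For anything beyond simple input like the sample, a dedicated CSV library is probably a safer choice than the hand-rolled parseLine.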

Well, I probably would create a class, let's say "Account", with the attributes "accountName", "fullName", "customerSystemName", "salesRep". Then I would define an empty ArrayList of type Account and loop over the lines read from the file. For every line I would create a new object of this class, set the corresponding attributes and add the object to the list. But before creating the object I would iterate over the objects already in the list to see whether one already has this company name. If so, instead of creating a new object, just update the salesRep attribute of the existing one by appending the new value, separated by a comma.
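
A rough sketch of that list-based idea, under a few assumptions of my own: rows are already split into four strings each, fullName is merged the same way as salesRep (the expected output in the question needs that), and a value is only appended if it is not already present. AccountListMerger, merge and appendIfMissing are made-up names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// One condensed entry; fullName and salesRep hold comma-separated values.
class Account {
    final String accountName;
    String fullName;
    final String customerSystemName;
    String salesRep;

    Account(String accountName, String fullName,
            String customerSystemName, String salesRep) {
        this.accountName = accountName;
        this.fullName = fullName;
        this.customerSystemName = customerSystemName;
        this.salesRep = salesRep;
    }
}

class AccountListMerger {

    // Appends value to a ", "-separated string unless it is already in there.
    static String appendIfMissing(String joined, String value) {
        List<String> parts = Arrays.asList(joined.split(", "));
        return parts.contains(value) ? joined : joined + ", " + value;
    }

    // Each row is assumed to be {accountName, fullName, customerSystemName, salesRep}.
    static List<Account> merge(List<String[]> rows) {
        List<Account> accounts = new ArrayList<>();
        for (String[] row : rows) {
            // Look for an entry that already has this account name.
            Account existing = null;
            for (Account a : accounts) {
                if (a.accountName.equals(row[0])) {
                    existing = a;
                    break;
                }
            }
            if (existing == null) {
                accounts.add(new Account(row[0], row[1], row[2], row[3]));
            } else {
                existing.fullName = appendIfMissing(existing.fullName, row[1]);
                existing.salesRep = appendIfMissing(existing.salesRep, row[3]);
            }
        }
        return accounts;
    }
}
```

The inner loop rescans the list for every row, so this is fine for small files but a map lookup scales better for large ones.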

I hope this helps :)
