Converting List<Map<String, List<String>>> to String[][]

I have a use case where I scrape some data, and for some records some keys have multiple values. The final output I want is CSV, which I have a library for, and it expects a 2-dimensional array.

So my input structure looks like List<TreeMap<String, List<String>>> (I use TreeMap to ensure stable key order), and my output needs to be String[][].

I wrote a generic transformation which calculates the number of columns for each key based on the maximum number of values among all records, and leaves empty cells for records that have fewer than the maximum number of values, but it turned out more complex than expected.

My question is: can it be written in a more concise/effective (but still generic) way? Especially using Java 8 streams/lambdas etc.?

Sample data and my algorithm follow below (not tested beyond sample data yet):

package org.example.importer; // "import" is a reserved word and cannot appear in a package name

import java.util.*;
import java.util.stream.Collectors;

public class Main {

    public static void main(String[] args) {
        List<TreeMap<String, List<String>>> rows = new ArrayList<>();
        TreeMap<String, List<String>> row1 = new TreeMap<>();
        row1.put("Title", Arrays.asList("Product 1"));
        row1.put("Category", Arrays.asList("Wireless", "Sensor"));
        row1.put("Price",Arrays.asList("20"));
        rows.add(row1);
        TreeMap<String, List<String>> row2 = new TreeMap<>();
        row2.put("Title", Arrays.asList("Product 2"));
        row2.put("Category", Arrays.asList("Sensor"));
        row2.put("Price",Arrays.asList("35"));
        rows.add(row2);
        TreeMap<String, List<String>> row3 = new TreeMap<>();
        row3.put("Title", Arrays.asList("Product 3"));
        row3.put("Price",Arrays.asList("15"));
        rows.add(row3);

        System.out.println("Input:");
        System.out.println(rows);
        System.out.println("Output:");
        System.out.println(Arrays.deepToString(multiValueListsToArray(rows)));
    }

    public static String[][] multiValueListsToArray(List<TreeMap<String, List<String>>> rows)
    {
        Map<String, IntSummaryStatistics> colWidths = rows.
                stream().
                flatMap(m -> m.entrySet().stream()).
                collect(Collectors.groupingBy(e -> e.getKey(), Collectors.summarizingInt(e -> e.getValue().size())));
        Long tableWidth = colWidths.values().stream().mapToLong(IntSummaryStatistics::getMax).sum();
        String[][] array = new String[rows.size()][tableWidth.intValue()];
        Iterator<TreeMap<String, List<String>>> rowIt = rows.iterator(); // iterate rows
        int rowIdx = 0;
        while (rowIt.hasNext())
        {
            TreeMap<String, List<String>> row = rowIt.next();
            Iterator<String> colIt = colWidths.keySet().iterator(); // iterate columns
            int cellIdx = 0;
            while (colIt.hasNext())
            {
                String col = colIt.next();
                long colWidth = colWidths.get(col).getMax();
                for (int i = 0; i < colWidth; i++) // iterate cells within column
                    if (row.containsKey(col) && row.get(col).size() > i)
                        array[rowIdx][cellIdx + i] = row.get(col).get(i);
                cellIdx += colWidth;
            }
            rowIdx++;
        }
        return array;
    }

}

Program output:

Input:
[{Category=[Wireless, Sensor], Price=[20], Title=[Product 1]}, {Category=[Sensor], Price=[35], Title=[Product 2]}, {Price=[15], Title=[Product 3]}]
Output:
[[Wireless, Sensor, 20, Product 1], [Sensor, null, 35, Product 2], [null, null, 15, Product 3]]

As a first step, I wouldn't focus on new Java 8 features, but rather Java 5+ features. Don't deal with Iterators when you can use for-each. Generally, don't iterate over a keySet() to perform a map lookup for every key, as you can iterate over the entrySet() without requiring any lookup. Also, don't ask for an IntSummaryStatistics when you're only interested in the maximum value. And don't iterate over the bigger of two data structures just to recheck, in each iteration, that you're not beyond the end of the smaller one.

Map<String, Integer> colWidths = rows.
        stream().
        flatMap(m -> m.entrySet().stream()).
        collect(Collectors.toMap(e -> e.getKey(), e -> e.getValue().size(), Integer::max));
int tableWidth = colWidths.values().stream().mapToInt(Integer::intValue).sum();
String[][] array = new String[rows.size()][tableWidth];

int rowIdx = 0;
for(TreeMap<String, List<String>> row: rows) {
    int cellIdx = 0;
    for(Map.Entry<String,Integer> e: colWidths.entrySet()) {
        String col = e.getKey();
        List<String> cells = row.get(col);
        int index = cellIdx;
        if(cells != null) for(String s: cells) array[rowIdx][index++]=s;
        cellIdx += e.getValue();
    }
    rowIdx++;
}
return array;
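
To see what the toMap collector with Integer::max computes, here is a minimal self-contained sketch. The class name ColumnWidthsDemo, the method name columnWidths and the two sample rows are assumptions made for illustration only; the collector call itself is the one used above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical demo class, not part of the original answer.
final class ColumnWidthsDemo {

    // For every key, keep the largest value-list size seen in any row.
    static Map<String, Integer> columnWidths(List<Map<String, List<String>>> rows) {
        return rows.stream()
                .flatMap(m -> m.entrySet().stream())
                .collect(Collectors.toMap(e -> e.getKey(),
                                          e -> e.getValue().size(),
                                          Integer::max)); // merge function: keep the maximum
    }

    public static void main(String[] args) {
        Map<String, List<String>> row1 = new TreeMap<>();
        row1.put("Category", Arrays.asList("Wireless", "Sensor"));
        row1.put("Price", Arrays.asList("20"));
        Map<String, List<String>> row2 = new TreeMap<>();
        row2.put("Category", Arrays.asList("Sensor"));
        row2.put("Title", Arrays.asList("Product 2"));

        Map<String, Integer> widths = columnWidths(Arrays.asList(row1, row2));
        // The three-argument toMap makes no promise about iteration order,
        // so the widths are printed key by key.
        System.out.println("Category=" + widths.get("Category")); // Category=2
        System.out.println("Price=" + widths.get("Price"));       // Price=1
        System.out.println("Title=" + widths.get("Title"));       // Title=1
    }
}
```

Because this overload of toMap does not specify the map implementation, the column order is consistent across rows within one run but is not guaranteed to be sorted.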

We can simplify the loop further by using a map of column positions rather than widths:

Map<String, Integer> colPositions = rows.
        stream().
        flatMap(m -> m.entrySet().stream()).
        collect(Collectors.toMap(e -> e.getKey(),
                                 e -> e.getValue().size(), Integer::max, TreeMap::new));
int tableWidth = 0;
for(Map.Entry<String,Integer> e: colPositions.entrySet())
    tableWidth += e.setValue(tableWidth);

String[][] array = new String[rows.size()][tableWidth];

int rowIdx = 0;
for(Map<String, List<String>> row: rows) {
    for(Map.Entry<String,List<String>> e: row.entrySet()) {
        int index = colPositions.get(e.getKey());
        for(String s: e.getValue()) array[rowIdx][index++]=s;
    }
    rowIdx++;
}
return array;
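
The width-to-position conversion relies on Map.Entry.setValue returning the previous value, so a single pass both stores the running offset and adds the old width to the total. A small sketch of that step in isolation follows; the class name ColumnOffsetsDemo and the method name widthsToPositions are made up for this illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo class, not part of the original answer.
final class ColumnOffsetsDemo {

    // Replaces each width with the column's starting index, in place,
    // and returns the total table width.
    static int widthsToPositions(Map<String, Integer> widths) {
        int tableWidth = 0;
        for (Map.Entry<String, Integer> e : widths.entrySet()) {
            // setValue stores the current offset and returns the old value (the width)
            tableWidth += e.setValue(tableWidth);
        }
        return tableWidth;
    }

    public static void main(String[] args) {
        Map<String, Integer> cols = new TreeMap<>();
        cols.put("Category", 2);
        cols.put("Price", 1);
        cols.put("Title", 1);
        int total = widthsToPositions(cols);
        System.out.println(cols + " width=" + total);
        // prints {Category=0, Price=2, Title=3} width=4
    }
}
```

Changing a value through setValue is not a structural modification, so it is safe to do while iterating over the TreeMap's entry set.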

A header array can be prepended with the following change:

Map<String, Integer> colPositions = rows.stream()
    .flatMap(m -> m.entrySet().stream())
    .collect(Collectors.toMap(e -> e.getKey(), e -> e.getValue().size(),
                              Integer::max, TreeMap::new));
String[] header = colPositions.entrySet().stream()
    .flatMap(e -> Collections.nCopies(e.getValue(), e.getKey()).stream())
    .toArray(String[]::new);
int tableWidth = 0;
for(Map.Entry<String,Integer> e: colPositions.entrySet())
    tableWidth += e.setValue(tableWidth);

String[][] array = new String[rows.size()+1][tableWidth];
array[0] = header;

int rowIdx = 1;
for(Map<String, List<String>> row: rows) {
    for(Map.Entry<String,List<String>> e: row.entrySet()) {
        int index = colPositions.get(e.getKey());
        for(String s: e.getValue()) array[rowIdx][index++]=s;
    }
    rowIdx++;
}
return array;
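
For reference, the header variant can be assembled into a runnable program roughly as follows. This is only a sketch: the class name HeaderTableDemo and the helpers toTable and sampleRows are assumptions, the parameter is typed as List<Map<...>> rather than List<TreeMap<...>> for simplicity, and the data is the sample from the question.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical demo class, not part of the original answer.
final class HeaderTableDemo {

    static String[][] toTable(List<Map<String, List<String>>> rows) {
        // Column widths, sorted by key thanks to TreeMap::new
        Map<String, Integer> colPositions = rows.stream()
                .flatMap(m -> m.entrySet().stream())
                .collect(Collectors.toMap(e -> e.getKey(), e -> e.getValue().size(),
                                          Integer::max, TreeMap::new));
        // Repeat each key once per column it occupies
        String[] header = colPositions.entrySet().stream()
                .flatMap(e -> Collections.nCopies(e.getValue(), e.getKey()).stream())
                .toArray(String[]::new);
        // Turn widths into starting positions
        int tableWidth = 0;
        for (Map.Entry<String, Integer> e : colPositions.entrySet())
            tableWidth += e.setValue(tableWidth);

        String[][] array = new String[rows.size() + 1][];
        array[0] = header;
        int rowIdx = 1;
        for (Map<String, List<String>> row : rows) {
            String[] cells = new String[tableWidth]; // missing values stay null
            for (Map.Entry<String, List<String>> e : row.entrySet()) {
                int index = colPositions.get(e.getKey());
                for (String s : e.getValue()) cells[index++] = s;
            }
            array[rowIdx++] = cells;
        }
        return array;
    }

    // The three sample records from the question
    static List<Map<String, List<String>>> sampleRows() {
        Map<String, List<String>> row1 = new TreeMap<>();
        row1.put("Title", Arrays.asList("Product 1"));
        row1.put("Category", Arrays.asList("Wireless", "Sensor"));
        row1.put("Price", Arrays.asList("20"));
        Map<String, List<String>> row2 = new TreeMap<>();
        row2.put("Title", Arrays.asList("Product 2"));
        row2.put("Category", Arrays.asList("Sensor"));
        row2.put("Price", Arrays.asList("35"));
        Map<String, List<String>> row3 = new TreeMap<>();
        row3.put("Title", Arrays.asList("Product 3"));
        row3.put("Price", Arrays.asList("15"));
        return Arrays.asList(row1, row2, row3);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.deepToString(toTable(sampleRows())));
        // prints [[Category, Category, Price, Title], [Wireless, Sensor, 20, Product 1],
        //         [Sensor, null, 35, Product 2], [null, null, 15, Product 3]] (on one line)
    }
}
```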

This is a quite concise way to do it using some Java 8 features.

This solution assumes that only the Category data is dynamic, whereas you will always have exactly one price and one product name.

Considering you have the initial data

// your initial complex data list 
List<Map<String, List<String>>> initialList = new ArrayList<>();

you can do

// values holder before final conversion
final List<List<String>> tempValues = new ArrayList<>();
initialList.forEach( map -> {
    // discard the keys, we do not need them... so only pack the data and put in a temporary array
    tempValues.add(new ArrayList<String>() {{
        map.forEach((key, value) -> addAll(value));          // foreach (string, list) : Map<String, List<String>>
    }});
});
// get the biggest data list; in our case, the one that contains most categories...
// this is going to be the final data size
final int maxSize = tempValues.stream().max(Comparator.comparingInt(List::size)).get().size();
// now we finally know the data size
final String[][] finalValues = new String[initialList.size()][maxSize];
// now it's time to uniform the bundle data size and shift the elements if necessary

// can't use streams/lambda as I need to keep an iteration counter
for (int i = 0; i < tempValues.size(); i++) {
    final List<String> tempEntry = tempValues.get(i);
    if (tempEntry.size() == maxSize) {
        finalValues[i] = tempEntry.toArray(finalValues[i]);
        continue;
    }
    final String[] s = new String[maxSize];
    // right-align the row: leave the first delta cells null and copy the values after them
    final int delta = maxSize - tempEntry.size();
    for (int j = 0; j < maxSize; j++) {
        if (j < delta) continue;
        s[j] = tempEntry.get(j - delta);
    }
    finalValues[i] = s;
}
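
As a possible alternative to the indexed loop, the row index is not strictly needed if each flattened list is mapped to its own right-aligned array. The sketch below makes that assumption explicit; the class name RightAlignDemo and the method name rightAlign are hypothetical, and unlike the code above it returns an empty array for empty input instead of throwing.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class, not part of the original answer.
final class RightAlignDemo {

    // Pads every row with leading nulls so that all rows end in the same columns.
    static String[][] rightAlign(List<List<String>> tempValues) {
        final int maxSize = tempValues.stream().mapToInt(List::size).max().orElse(0);
        return tempValues.stream()
                .map(values -> {
                    String[] cells = new String[maxSize];
                    int delta = maxSize - values.size(); // number of leading empty cells
                    for (int j = 0; j < values.size(); j++) {
                        cells[delta + j] = values.get(j);
                    }
                    return cells;
                })
                .toArray(String[][]::new);
    }

    public static void main(String[] args) {
        List<List<String>> flattened = Arrays.asList(
                Arrays.asList("Wireless", "Sensor", "20", "Product 1"),
                Arrays.asList("15", "Product 3"));
        System.out.println(Arrays.deepToString(rightAlign(flattened)));
        // prints [[Wireless, Sensor, 20, Product 1], [null, null, 15, Product 3]]
    }
}
```

As with the original loop, this only lines up correctly when the variable-length key sorts before the fixed-length ones, which is the assumption stated for this solution.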

and that's it...


You can fill and test the data with the method below (I have added some more categories...)

static void initData(List<Map<String, List<String>>> l) {
    l.add(new TreeMap<String, List<String>>() {{
        put("Category", new ArrayList<String>() {{ add("Wireless"); add("Sensor"); }});
        put("Price", new ArrayList<String>() {{ add("20"); }});
        put("Title", new ArrayList<String>() {{ add("Product 1"); }});
    }});
    l.add(new TreeMap<String, List<String>>() {{
        put("Category", new ArrayList<String>() {{ add("Sensor"); }});
        put("Price", new ArrayList<String>() {{ add("35"); }});
        put("Title", new ArrayList<String>() {{ add("Product 2"); }});
    }});
    l.add(new TreeMap<String, List<String>>() {{
        put("Price", new ArrayList<String>() {{ add("15"); }});
        put("Title", new ArrayList<String>() {{ add("Product 3"); }});
    }});
    l.add(new TreeMap<String, List<String>>() {{
        put("Category", new ArrayList<String>() {{ add("Wireless"); add("Sensor"); add("Category14"); }});
        put("Price", new ArrayList<String>() {{ add("15"); }});
        put("Title", new ArrayList<String>() {{ add("Product 3"); }});
    }});
    l.add(new TreeMap<String, List<String>>() {{
        put("Category", new ArrayList<String>() {{ add("Wireless"); add("Sensor"); add("Category541"); add("SomeCategory");}});
        put("Price", new ArrayList<String>() {{ add("15"); }});
        put("Title", new ArrayList<String>() {{ add("Product 3"); }});
    }});
}

I'd still say the accepted answer looks less computationally expensive, but you wanted to see some Java 8...


 