
How to compare lists/maps of custom objects field by field to create a mismatch report for a very large data set, in a generic way, using Java 8 or later?

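I have been comparing data between 2 different database sources in Java. Because of some other constraints, I cannot do the comparison directly in the DB.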

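  • I have 50 tables to compare.
  • Row counts range from 10k to 500k per table (so an efficient algorithm is needed).
  • The number of columns and the field names also differ from table to table (of course).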

I wrote the following code using for loops, and it has these limitations:

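  1. Because some tables can hold a large amount of data, the for-loop solution is not efficient.
  2. The number of columns differs per table, so the logic I wrote does not work for all tables and I would have to repeat it for each one. That means a lot of boilerplate code.
  3. If a new column is added to a table, the comparison logic has to be updated as well.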
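Limitation 1 comes from the nested for loop, which compares every row of table 1 with every row of table 2, so the cost grows as n × m (for two 500k-row tables, up to 2.5 × 10^11 id comparisons). A usual remedy is to index one side by its primary key in a HashMap once and then make a single pass over the other side, which is roughly linear. The sketch below assumes each row is a fixed-width String[] whose first element is a unique id; the class IndexedComparison and its message format are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Sketch only: rows are modelled as String[] {id, column1, column2, ...} (an assumption)
class IndexedComparison {

    static List<String> compare(List<String[]> db1Rows, List<String[]> db2Rows, String[] columnNames) {
        // Index DB2 by id once: O(m)
        Map<String, String[]> db2ById = new HashMap<>();
        for (String[] row : db2Rows) {
            db2ById.put(row[0], row);
        }
        List<String> mismatches = new ArrayList<>();
        // One pass over DB1 with average O(1) lookups: O(n) instead of O(n * m)
        for (String[] left : db1Rows) {
            // remove() so that whatever is left afterwards exists only in DB2
            String[] right = db2ById.remove(left[0]);
            if (right == null) {
                mismatches.add("id=" + left[0] + " missing in DB2");
                continue;
            }
            // Column 0 is the id, so start comparing at column 1
            for (int c = 1; c < columnNames.length; c++) {
                if (!Objects.equals(left[c], right[c])) {
                    mismatches.add("id=" + left[0] + " " + columnNames[c]
                            + ": '" + left[c] + "' vs '" + right[c] + "'");
                }
            }
        }
        for (String id : db2ById.keySet()) {
            mismatches.add("id=" + id + " missing in DB1");
        }
        return mismatches;
    }

    public static void main(String[] args) {
        String[] columns = {"Id", "Column1", "Column2", "Column3"};
        List<String[]> db1 = Arrays.asList(
                new String[]{"2", "2", "Two", "Red"},
                new String[]{"5", "5", "Five", "White"});
        List<String[]> db2 = Arrays.asList(
                new String[]{"2", "2", "Two", "Red1"},
                new String[]{"5", "5", "Two", "White"});
        compare(db1, db2, columns).forEach(System.out::println);
    }
}
```

Unlike the nested loop, this also reports ids present on only one side. Both tables must fit in memory; if they do not, sorting both query results by id and walking them in step (a merge join) keeps memory flat.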

My requirements:

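  1. I want to write efficient code that produces a field-by-field mismatch report for the supplied lists of custom objects.
  2. The comparison code should be able to compare lists of any type of custom object. (I am not sure how to do this.)
  3. I want to be able to create the table object POJOs by referring to a properties file that contains the column lists of all tables.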
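For requirement 2, one option is Java reflection: read a class's declared fields once and compare them generically, so a new column added to a POJO is picked up without touching the comparison code (limitation 3). This is a minimal sketch, assuming rows are matched by a key extracted with a Function and that all compared columns are declared directly on the class (inherited fields are skipped). ReflectiveComparator, Mismatch and Row are hypothetical names, and the reported column names are the Java field names, such as column3.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;

class ReflectiveComparator {

    // Hypothetical report row with the four fields the question asks for
    static final class Mismatch {
        final String table, column, db1Value, db2Value;

        Mismatch(String table, String column, Object db1Value, Object db2Value) {
            this.table = table;
            this.column = column;
            this.db1Value = String.valueOf(db1Value);
            this.db2Value = String.valueOf(db2Value);
        }

        @Override
        public String toString() {
            return "Mismatch{table='" + table + "', column='" + column
                    + "', db1Value='" + db1Value + "', db2Value='" + db2Value + "'}";
        }
    }

    // Compares two lists of the same POJO type field by field; rows are paired via keyOf
    static <T, K> List<Mismatch> compare(String table, Class<T> type, List<T> db1, List<T> db2,
                                         Function<T, K> keyOf) {
        Map<K, T> db2ByKey = new HashMap<>();
        for (T row : db2) {
            db2ByKey.put(keyOf.apply(row), row);
        }
        // Discover the columns once per class; note the JDK does not guarantee field order
        List<Field> fields = new ArrayList<>();
        for (Field f : type.getDeclaredFields()) {
            if (!Modifier.isStatic(f.getModifiers()) && !f.isSynthetic()) {
                f.setAccessible(true);
                fields.add(f);
            }
        }
        List<Mismatch> report = new ArrayList<>();
        for (T left : db1) {
            T right = db2ByKey.get(keyOf.apply(left));
            if (right == null) {
                continue; // rows missing from DB2 could be reported separately
            }
            for (Field f : fields) {
                try {
                    Object a = f.get(left);
                    Object b = f.get(right);
                    if (!Objects.equals(a, b)) {
                        report.add(new Mismatch(table, f.getName(), a, b));
                    }
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
        return report;
    }

    // Stand-in for the question's TestTable1
    static class Row {
        String id, column1, column2, column3;

        Row(String id, String column1, String column2, String column3) {
            this.id = id;
            this.column1 = column1;
            this.column2 = column2;
            this.column3 = column3;
        }
    }

    public static void main(String[] args) {
        List<Row> db1 = Arrays.asList(new Row("2", "2", "Two", "Red"), new Row("5", "5", "Five", "White"));
        List<Row> db2 = Arrays.asList(new Row("2", "2", "Two", "Red1"), new Row("5", "5", "Two", "White"));
        compare("Table1", Row.class, db1, db2, r -> r.id).forEach(System.out::println);
    }
}
```

Calling compare once per table with that table's POJO class replaces the per-column if blocks of the for-loop version.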
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

public class DataComparatorService {

    // TestTable1, TestTable2 and MismatchReport are simple POJOs (definitions not shown)
    private List<TestTable1> table1DataList;
    private List<TestTable2> table2DataList;

    public void loadDummyTableObjects() {
        table1DataList =
                Arrays.asList(new TestTable1("1","1","One","Blue"),
                        new TestTable1("2","2","Two","Red"),
                        new TestTable1("3","3","Three","Black"),
                        new TestTable1("4","4","Four","Green"),
                        new TestTable1("5","5","Five","White"));

        table2DataList =
                Arrays.asList(new TestTable2("1","1","One","Blue"),
                        new TestTable2("2","2","Two","Red1"),
                        new TestTable2("3","3","Three","Black"),
                        new TestTable2("4","4","Four","Green"),
                        new TestTable2("5","5","Two","White"));
    }

    public void compareDataWithForLoop() {
        loadDummyTableObjects();
        List<MismatchReport> mismatchReport = new ArrayList<>();
        // Nested loop: every row of table 1 is checked against every row of table 2, i.e. O(n * m)
        for (TestTable1 t1Row : table1DataList) {
            for (TestTable2 t2Row : table2DataList) {
                if (t1Row.getId().equals(t2Row.getId())) {
                    // One hand-written, null-safe check per column; this must be repeated for every table
                    if (!Objects.equals(t1Row.getColumn1(), t2Row.getColumn1())) {
                        mismatchReport.add(getMismatchReport("Table1", "Column1", t1Row.getColumn1(), t2Row.getColumn1()));
                    }
                    if (!Objects.equals(t1Row.getColumn2(), t2Row.getColumn2())) {
                        mismatchReport.add(getMismatchReport("Table1", "Column2", t1Row.getColumn2(), t2Row.getColumn2()));
                    }
                    if (!Objects.equals(t1Row.getColumn3(), t2Row.getColumn3())) {
                        mismatchReport.add(getMismatchReport("Table1", "Column3", t1Row.getColumn3(), t2Row.getColumn3()));
                    }
                }
            }
        }
        System.out.println(mismatchReport);
    }

    private static MismatchReport getMismatchReport(String tableName, String columnName, String db1Value, String db2Value) {
        MismatchReport result = new MismatchReport();
        result.setTableNme(tableName);
        result.setColumnNme(columnName);
        result.setDb1Value(db1Value);
        result.setDb2Value(db2Value);
        return result;
    }

    public static void main(String[] args) {
        DataComparatorService service = new DataComparatorService();
        service.compareDataWithForLoop();
    }
}

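The output format should be the same for every table comparison. Each result should contain the fields TableName, ColumnName, Db1Value and Db2Value, so that it shows in which column a difference was found and what the mismatched values are. The output of the code above is: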


[MismatchReport{tableNme='Table1', columnNme='Column3', db1Value='Red', db2Value='Red1'}, 
MismatchReport{tableNme='Table1', columnNme='Column2', db1Value='Five', db2Value='Two'}]
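Because every table's report has the same four fields, requirement 3 might also be met without writing one POJO per table: represent each row as a Map from column name to value and read each table's key column and column list from a properties file. The property names <table>.key and <table>.columns are assumptions, not an existing convention, and PropertiesDrivenComparison is a hypothetical class; the report text imitates the MismatchReport output shown above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Properties;

class PropertiesDrivenComparison {

    // Produces the same text as the question's MismatchReport.toString()
    static String report(String table, String column, String db1Value, String db2Value) {
        return "MismatchReport{tableNme='" + table + "', columnNme='" + column
                + "', db1Value='" + db1Value + "', db2Value='" + db2Value + "'}";
    }

    // Hypothetical property keys: <table>.key (primary key column), <table>.columns (comma-separated)
    static List<String> compare(Properties config, String table,
                                List<Map<String, String>> db1, List<Map<String, String>> db2) {
        String key = config.getProperty(table + ".key");
        List<String> columns = new ArrayList<>();
        for (String c : config.getProperty(table + ".columns").split(",")) {
            columns.add(c.trim());
        }
        Map<String, Map<String, String>> db2ByKey = new HashMap<>();
        for (Map<String, String> row : db2) {
            db2ByKey.put(row.get(key), row);
        }
        List<String> mismatches = new ArrayList<>();
        for (Map<String, String> left : db1) {
            Map<String, String> right = db2ByKey.get(left.get(key));
            if (right == null) {
                continue; // row exists only in DB1
            }
            for (String column : columns) {
                String a = left.get(column);
                String b = right.get(column);
                if (!Objects.equals(a, b)) {
                    mismatches.add(report(table, column, a, b));
                }
            }
        }
        return mismatches;
    }

    // Helper that builds a row from alternating column/value arguments
    static Map<String, String> row(String... columnValuePairs) {
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i + 1 < columnValuePairs.length; i += 2) {
            row.put(columnValuePairs[i], columnValuePairs[i + 1]);
        }
        return row;
    }

    public static void main(String[] args) throws IOException {
        Properties config = new Properties();
        // In practice this would be loaded from a file, e.g. through a FileReader
        config.load(new StringReader("Table1.key=Id\nTable1.columns=Column1,Column2,Column3\n"));
        List<Map<String, String>> db1 = Arrays.asList(
                row("Id", "2", "Column1", "2", "Column2", "Two", "Column3", "Red"),
                row("Id", "5", "Column1", "5", "Column2", "Five", "Column3", "White"));
        List<Map<String, String>> db2 = Arrays.asList(
                row("Id", "2", "Column1", "2", "Column2", "Two", "Column3", "Red1"),
                row("Id", "5", "Column1", "5", "Column2", "Two", "Column3", "White"));
        System.out.println(compare(config, "Table1", db1, db2));
    }
}
```

Run as is, main prints the same two mismatches as the question's output. With JDBC, each Map could be filled from a ResultSet by calling getString(column) for every configured column.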

Any leads on how to meet the requirements above would be very helpful.

If I were you, I would not reinvent the wheel and would use a third-party library such as JaVers instead.

JaVers documentation

JaVers GitHub

JaVers Maven

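It is a powerful but lightweight library. It can do much more, but you can also use it purely as an object diff tool. As a starting point, I took some of your sample input to show how to apply it to your use case.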

import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

import org.javers.core.Javers;
import org.javers.core.JaversBuilder;
import org.javers.core.diff.Diff;

import lombok.AllArgsConstructor;
import lombok.Getter;

public final class Example {

    public static void main(String[] args) {

        //Just copied your sample input but used only one custom class as the second is not really needed
        List<TestTable1> dataDB1 = Arrays.asList(new TestTable1("1","1","One","Blue"),
                      new TestTable1("2","2","Two","Red"),
                      new TestTable1("3","3","Three","Black"),
                      new TestTable1("4","4","Four","Green"),
                      new TestTable1("5","5","Five","White"));

        List<TestTable1> dataDB2 = Arrays.asList(new TestTable1("1","1","One","Blue"),
                      new TestTable1("2","2","Two","Red1"),
                      new TestTable1("3","3","Three","Black"),
                      new TestTable1("4","4","Four","Green"),
                      new TestTable1("5","5","Two","White"));

        //create a map from your input for a faster access of objects by id
        Map<String, TestTable1> db1Map = dataDB1.stream()
                                           .collect(Collectors.toMap(TestTable1::getId, Function.identity()));
        Map<String, TestTable1> db2Map = dataDB2.stream()
                                           .collect(Collectors.toMap(TestTable1::getId, Function.identity()));

        // do your comparison using JaVers
        Javers javers = JaversBuilder.javers().build();

        db1Map.keySet().forEach(key -> {
            Diff diff = javers.compare(db1Map.get(key), db2Map.get(key));
            if (diff.hasChanges()){
                System.out.println("Changes for id: " + key);
                System.out.println(diff.prettyPrint());
                System.out.println("********************************************************");
                System.out.println();
            }
        });
    }

    // a simple POJO for your data
    @AllArgsConstructor
    @Getter
    public static class TestTable1 {
        String id;
        String column1;
        String column2;
        String column3;
    }
}

Output:

Changes for id: 2
Diff:
* changes on com.mycompany.Example$TestTable1/ :
  - 'column3' changed: 'Red' -> 'Red1'

********************************************************

Changes for id: 5
Diff:
* changes on com.mycompany.Example$TestTable1/ :
  - 'column2' changed: 'Five' -> 'Two'

********************************************************

I just used prettyPrint() to get a standard output, but you can configure the output to suit your needs.


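Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.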

 
© 2020-2024 STACKOOM.COM