Java compare two lists efficiently

I need to compare the results of two lists coming from two different sources.

List<MyData> baseList = new ArrayList<>();

[image of sample baseList data not available]

and

List<MyData> externalList = new ArrayList<>();

[image of sample externalList data not available]

I need to compare the CDCHash values in both lists, matching records by UserACCNUM. If the CDCHash of a record has changed, I need to update that particular record in baseList.

I tried the loop below, but it doesn't seem efficient:

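for (MyData ext : externalList) {
  for (MyData base : baseList) {
    // Only compare records that belong to the same account
    if (ext.getAccNum().equals(base.getAccNum())) {
      if (!ext.getCDCHash().equals(base.getCDCHash())) {
        // changes found - need to update this base record
      }
      break; // matching account found, move on to the next external record
    }
  }
}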

Would list.stream() be efficient in this case? I have nearly 100k records to compare.

How do I achieve this efficiently?

You can turn your quadratic algorithm into a linear one by building a fast lookup Map for one of the two lists, then looping over the other list and using the lookup to find the corresponding record by account number.

JS example just because we can't run Java here ;) Note that, for the sake of the example, we assume both lists are the same length and contain the same accounts.

const listA = [{ hash: 'account1v1', account: 1 }, { hash: 'account2v1', account: 2 }];
const listB = [{ hash: 'account1v1', account: 1 }, { hash: 'account2v2', account: 2 }];

const dirtyRecords = findDirtyRecords(listA, listB);
console.log(dirtyRecords);

function findDirtyRecords(listA, listB) {
  // Build a lookup Map of listA keyed by account number: O(n)
  const listAMap = new Map();
  for (const record of listA) listAMap.set(record.account, record);
  // Keep the listB records whose hash differs from listA's record for the same account: O(m)
  return listB.filter(r => r.hash !== listAMap.get(r.account).hash);
}
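For reference, here is a minimal Java sketch of the same lookup-map idea. It assumes a simplified, hypothetical MyData with String account numbers and a mutable hash (the setCDCHash setter is an assumption, not something from the question), and that account numbers are unique within baseList.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for the question's MyData: only the account number
// and the CDC hash are modeled; setCDCHash is an assumed setter.
class MyData {
    private final String accNum;
    private String cdcHash;

    MyData(String accNum, String cdcHash) {
        this.accNum = accNum;
        this.cdcHash = cdcHash;
    }

    String getAccNum() { return accNum; }
    String getCDCHash() { return cdcHash; }
    void setCDCHash(String cdcHash) { this.cdcHash = cdcHash; }
}

class HashSync {
    /**
     * Copies changed hashes from externalList into the matching baseList records
     * (matched by account number) and returns the base records that were updated.
     */
    static List<MyData> syncChangedHashes(List<MyData> baseList, List<MyData> externalList) {
        // One pass to index baseList by account number (assumes unique account numbers).
        Map<String, MyData> baseByAcc = new HashMap<>();
        for (MyData base : baseList) {
            baseByAcc.put(base.getAccNum(), base);
        }

        // One pass over externalList with O(1) average-time lookups.
        List<MyData> changed = new ArrayList<>();
        for (MyData ext : externalList) {
            MyData base = baseByAcc.get(ext.getAccNum());
            // Accounts that exist only in externalList are skipped.
            if (base != null && !Objects.equals(base.getCDCHash(), ext.getCDCHash())) {
                base.setCDCHash(ext.getCDCHash()); // update the baseList record in place
                changed.add(base);
            }
        }
        return changed;
    }
}
```

With about 100k records per list this is two linear passes instead of roughly 10^10 pairwise comparisons in the nested loop.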

A little bit of set theory may be beneficial here, if MyData implements:

  • Comparable
  • equals and hashCode

...and you're open to using Google Guava.

If you set up the two lists that you have as Sets instead (and they could be ordered if you really wanted them to be...), then all you would have to do is invoke Sets.difference(baseList, externalList). You could then iterate through the resulting collection of records to update the values you need in baseList.

Don't worry about doing this in one fell swoop. It's cleaner to do it as two separate actions (compute the difference, then update baseList), which makes it easier to debug and see what's going on.
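A minimal sketch of the two-step set approach, assuming a hypothetical immutable MyData whose equals and hashCode cover both the account number and the CDC hash, so two records are equal only if nothing changed. To avoid a Guava dependency it swaps in the JDK equivalent (copy into a HashSet, then removeAll), which computes the same difference eagerly. Note the argument order: Sets.difference(a, b) keeps the elements of a that are not in b, so difference(base, external) yields the stale base records, while difference(external, base), as computed here, yields the new values.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical immutable MyData: equality covers account number AND hash.
final class MyData {
    private final String accNum;
    private final String cdcHash;

    MyData(String accNum, String cdcHash) {
        this.accNum = accNum;
        this.cdcHash = cdcHash;
    }

    String getAccNum() { return accNum; }
    String getCDCHash() { return cdcHash; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof MyData)) return false;
        MyData other = (MyData) o;
        return Objects.equals(accNum, other.accNum) && Objects.equals(cdcHash, other.cdcHash);
    }

    @Override
    public int hashCode() {
        return Objects.hash(accNum, cdcHash);
    }
}

final class SetDiff {
    // Step 1: external records with no identical record in base (new or changed).
    // Same elements as Guava's Sets.difference(external, base), copied into a HashSet.
    static Set<MyData> changedOrNew(Set<MyData> base, Set<MyData> external) {
        Set<MyData> diff = new HashSet<>(external);
        diff.removeAll(base);
        return diff;
    }

    // Step 2: for accounts already in base, replace the old record with the new one.
    // Accounts that exist only in the external set are ignored.
    static void applyUpdates(Set<MyData> base, Set<MyData> updates) {
        Map<String, MyData> byAcc = new HashMap<>();
        for (MyData d : base) byAcc.put(d.getAccNum(), d);
        for (MyData u : updates) {
            MyData old = byAcc.get(u.getAccNum());
            if (old != null) {
                base.remove(old);
                base.add(u);
            }
        }
    }
}
```

Because this MyData is immutable, "updating" means replacing the old record. With Guava on the classpath, Sets.difference(externalSet, baseSet) would return the same elements as a lazy, unmodifiable view.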

Well, first of all, the approach in your question might not solve your problem.

Judging from the tables you provided, your hash does change, and the other values might change too. The unique identifier is most likely the user account number (UserACCNUM).

Depending on the source of your data, it might make sense to iterate/paginate over both of your sources (if they're ordered by some parameter, e.g. account number) and compare just subsets of the data.

Say, query accounts 1-20 (or 1-1000), get the min/max account number, and then run the same query on the second data source to get the same accounts.

Then sort and iterate over both collections (matching the IDs) and compare the values on each line.
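The sort-and-match step for one batch can be sketched as a merge join: sort both batches by account number, then walk them with two indices. This is a minimal sketch assuming both batches were fetched for the same account range, account numbers are numeric (long) and unique within each batch, and the MyData fields shown are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Objects;

// Hypothetical minimal record: numeric account number plus CDC hash.
final class MyData {
    final long accNum;
    final String cdcHash;

    MyData(long accNum, String cdcHash) {
        this.accNum = accNum;
        this.cdcHash = cdcHash;
    }
}

final class SortedCompare {
    /**
     * Returns the external records whose hash differs from the base record
     * with the same account number. Cost: O(n log n) to sort, O(n + m) to walk.
     */
    static List<MyData> changedRecords(List<MyData> base, List<MyData> external) {
        // Sort copies so the caller's lists are left untouched.
        List<MyData> a = new ArrayList<>(base);
        List<MyData> b = new ArrayList<>(external);
        Comparator<MyData> byAcc = Comparator.comparingLong(d -> d.accNum);
        a.sort(byAcc);
        b.sort(byAcc);

        List<MyData> changed = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            MyData x = a.get(i), y = b.get(j);
            if (x.accNum < y.accNum) {
                i++; // account only present in the base batch
            } else if (x.accNum > y.accNum) {
                j++; // account only present in the external batch
            } else {
                // Same account on both sides: compare the hashes.
                if (!Objects.equals(x.cdcHash, y.cdcHash)) changed.add(y);
                i++;
                j++;
            }
        }
        return changed;
    }
}
```

If the sources can already return each page ordered by account number, the two sort calls can be dropped and the walk alone is linear.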

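Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.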

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM