How to remove elements from a list with lambda based on another list

I have a List of file Paths:

List<Path> filePaths; //e.g. [src\test\resources\file\15\54\54_exampleFile.pdf]

The 54 above is the file ID.

I then obtain a Set of String IDs that my application can handle, as follows:

Set<String> acceptedIds = connection.getAcceptedIDs(); //e.g. elements [64, 101, 33]

How can I use Java 8 lambdas to filter out all elements of filePaths that do not contain any of the acceptable IDs in the acceptedIds Set?

In other words, I would like to retain in filePaths only the paths whose IDs are in the acceptedIds set. For example, 54 is not in the set above, so that path is removed.

filePaths.stream().filter(...).collect(Collectors.toList());

The most efficient way is to extract the ID from the path and then look it up in the Set. Each filter test then runs in constant time, i.e. O(1), assuming a hash-based Set such as HashSet, giving O(n) overall, where n is the number of paths:

filePaths.stream()
  .filter(p -> acceptedIds.contains(p.getParent().getFileName().toString()))
  .collect(Collectors.toList());

If the reverse approach is taken, where each of the acceptedIds is searched for in the path (as in other answers), each filter test is O(m*k), where m is the number of acceptedIds and k is the average path length, giving O(n * m * k) overall, which will perform very poorly for even moderately sized collections.
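A self-contained sketch of the parent-directory lookup described above may make it easier to try out. The class name ParentIdFilter, the method retainAccepted, and the sample paths are illustrative assumptions, not part of the original answer; the null checks are an added guard for paths that have no parent directory.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class ParentIdFilter {

    // Keeps only the paths whose parent directory name is an accepted ID.
    // Assumes the layout from the question: .../<id>/<id>_name.ext
    public static List<Path> retainAccepted(List<Path> filePaths, Set<String> acceptedIds) {
        return filePaths.stream()
                .filter(p -> {
                    Path parent = p.getParent();
                    // A path without a named parent directory carries no ID, so drop it.
                    return parent != null
                            && parent.getFileName() != null
                            && acceptedIds.contains(parent.getFileName().toString());
                })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Path> filePaths = Arrays.asList(
                Paths.get("src", "test", "resources", "file", "15", "54", "54_exampleFile.pdf"),
                Paths.get("src", "test", "resources", "file", "15", "64", "64_exampleFile.pdf"));
        // A HashSet gives the constant-time contains() the complexity argument relies on.
        Set<String> acceptedIds = new HashSet<>(Arrays.asList("64", "101", "33"));

        // Only the path under directory 64 is retained.
        System.out.println(retainAccepted(filePaths, acceptedIds));
    }
}
```

Note that stream().filter(...) produces a new list; the original filePaths is left unchanged, so assign the result if you want to replace it.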

You could write:

filePaths.stream()
         .filter(p -> acceptedIds.stream().anyMatch(id -> p.toString().contains(id)))
         .collect(toList());

This keeps each path whose string representation contains at least one of the acceptedIds. Depending on your use case, you might want something stricter than contains here (matching the beginning of the file name, for example).

anyMatch is an operation that determines whether at least one element matches the given predicate.

Note that this answer makes no assumption about the structure of the path when filtering elements. If you can safely say that in each path the parent directory is named after the ID, you should definitely go with @Bohemian's answer, for performance reasons.
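To illustrate why a plain contains can be too loose, here is a small sketch that compares it with a stricter test on the beginning of the file name. It assumes file names follow the "<id>_..." scheme from the question; the class and method names (PrefixIdMatch, containsAnyId, startsWithAnyId) are made up for this example.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class PrefixIdMatch {

    // Loose check: the ID may appear anywhere in the path string.
    public static boolean containsAnyId(Path p, Set<String> acceptedIds) {
        return acceptedIds.stream().anyMatch(id -> p.toString().contains(id));
    }

    // Stricter check (assumed naming scheme): the file name must start with "<id>_".
    public static boolean startsWithAnyId(Path p, Set<String> acceptedIds) {
        Path fileName = p.getFileName();
        if (fileName == null) {
            return false; // e.g. a root path has no file name
        }
        String name = fileName.toString();
        return acceptedIds.stream().anyMatch(id -> name.startsWith(id + "_"));
    }

    public static void main(String[] args) {
        Path path = Paths.get("src", "test", "resources", "file", "15", "54", "54_exampleFile.pdf");
        Set<String> acceptedIds = new HashSet<>(Arrays.asList("15"));

        // "15" occurs as a directory name, so the loose check accepts the path
        // even though the file ID is 54.
        System.out.println(containsAnyId(path, acceptedIds));   // true
        System.out.println(startsWithAnyId(path, acceptedIds)); // false
    }
}
```

Either predicate can be dropped into filePaths.stream().filter(p -> ...), and both still cost one pass over acceptedIds per path.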

Like so:

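// Returns the elements of l1 that are also contained in l2.
// Generic types are needed: with raw types, collect(...) returns Object and the code does not compile.
<T> List<T> removeMissing(List<T> l1, Collection<?> l2) {
    return l1.stream()
        .filter(l2::contains) // keep o only if l2 contains an element equal to o
        .collect(Collectors.toList());
}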

If your file name structure is constant, I'd first use a regex to extract the number and then check whether it is among the desired IDs.

final Set<String> acceptedIds = ...
// Matches the file's number, which is followed by an underscore
final Pattern extractor = Pattern.compile("(?<number>\\d+)_");
filePaths.stream().filter(path -> {
    final Matcher m = extractor
        .matcher(path.getFileName().toString());
    // find() must succeed before group() may be called
    return m.find() && acceptedIds.contains(m.group("number"));
})
...
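A runnable sketch of the regex approach might look like the following. It assumes the ID is the run of digits at the very start of the file name, followed by an underscore, so the pattern is anchored with ^ (a slightly stricter choice than the snippet above). The names RegexIdFilter, extractId, and retainAccepted are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class RegexIdFilter {

    // Leading digits of the file name, terminated by an underscore (assumed naming scheme).
    private static final Pattern ID_PREFIX = Pattern.compile("^(?<number>\\d+)_");

    // Returns the ID encoded in the file name, or empty if the name does not match.
    public static Optional<String> extractId(Path path) {
        Path fileName = path.getFileName();
        if (fileName == null) {
            return Optional.empty();
        }
        Matcher m = ID_PREFIX.matcher(fileName.toString());
        return m.find() ? Optional.of(m.group("number")) : Optional.empty();
    }

    // Keeps only the paths whose extracted ID is in acceptedIds.
    public static List<Path> retainAccepted(List<Path> filePaths, Set<String> acceptedIds) {
        return filePaths.stream()
                .filter(p -> extractId(p).filter(acceptedIds::contains).isPresent())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Path> filePaths = Arrays.asList(
                Paths.get("src", "test", "resources", "file", "15", "54", "54_exampleFile.pdf"),
                Paths.get("src", "test", "resources", "file", "15", "33", "33_exampleFile.pdf"));
        Set<String> acceptedIds = new HashSet<>(Arrays.asList("64", "101", "33"));

        // Only the 33_exampleFile.pdf path is retained.
        System.out.println(retainAccepted(filePaths, acceptedIds));
    }
}
```

Returning an Optional keeps non-matching file names from throwing IllegalStateException, which calling group() after a failed find() would do.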
