
Need to map reduce lines in multiple files using lambda

I have a number of files that I want to read line by line. Each line contains a URL followed by a timestamp, followed by a number of tags.

I have a class called Link that parses each line and provides static methods to get:

Link::url
Link::timestamp
Link::tags (returns a List of tag strings)

The URLs can be duplicated in the files, along with the tags. I need to read the lines from all the files, collect the tags for each URL, and eliminate the duplicates. Then I need to write the results to an output file in the format: url tag1, tag2, tag3

I am able to do this with Java 7 using map/reduce but cannot figure out how to do it using lambda expressions. I am told that it can be done in one line of code?

This is what I have. I am stuck after the filter. I think what I want to do is create a map with a key that is the URL and a TreeMap, where the TreeMap would contain all the unique tags. I just don't know how to write this; any help would be appreciated.

public static void tagUnion() throws Exception {   
    Stream<Path> fstream = Files.list(Paths.get(indir));
    fstream.forEach(path -> {
        Stream<String> lines;
        try (Stream<String> entry = Files.lines(path)) {
            entry
            .filter(s -> !s.isEmpty())
            .map(Link::parse)
            .filter(map -> inDate(map.timestamp()));
            // this is where I’m stuck
        } catch (IOException e) {
            e.printStackTrace();
        }
    });
}

I would suggest using Stream::flatMap instead. This method maps each object in the stream to a different stream, all of the same type, and combines them into a single stream you can continue working on. For example:

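Files.list(somePath)
        .flatMap(path -> {
            // Files.lines throws a checked IOException, so wrap it
            try { return Files.lines(path); }
            catch (IOException e) { throw new UncheckedIOException(e); }
        })
        .filter(s -> !s.isEmpty())
        .map(Link::parse)
        .filter(link -> inDate(link.timestamp()));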
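
To see the flattening on its own, here is a minimal sketch that uses in-memory lists in place of files. The class name FlatMapDemo, the method allLines, and the sample data are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class FlatMapDemo {
    // Each inner list stands in for the lines of one file (sample data, not from the question).
    static List<String> allLines(List<List<String>> files) {
        return files.stream()
                .flatMap(List::stream)       // one stream per "file", merged into a single stream
                .filter(s -> !s.isEmpty())   // drop blank lines
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<List<String>> files = Arrays.asList(
                Arrays.asList("a", "", "b"),
                Arrays.asList("c"));
        System.out.println(allLines(files));   // prints: [a, b, c]
    }
}
```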

To produce the output you are asking for, you then need a method that takes each parsed Link and formats it as the line you want.

Finally, to collect a stream of strings into one string with a delimiter (be it a newline or a comma), there is a collector for that:

String csvLine = stream.collect(Collectors.joining(","));
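
As a rough sketch of that formatting step, the helper below builds one output line in the "url tag1, tag2, tag3" format from the question. The class OutputLine and the method formatLine are hypothetical names, and the URL is sample data.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.stream.Collectors;

class OutputLine {
    // Hypothetical helper: the url, a space, then the tags separated by ", ".
    static String formatLine(String url, Collection<String> tags) {
        return url + " " + tags.stream().collect(Collectors.joining(", "));
    }

    public static void main(String[] args) {
        System.out.println(formatLine("http://example.com", Arrays.asList("java", "lambda")));
        // prints: http://example.com java, lambda
    }
}
```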

I'm not sure there is enough information here to confidently answer your question, but here is a stab at it anyway.

Given that you have something similar to this:

@FunctionalInterface
interface IOFunction<T, R>
{
  R apply(T t) throws IOException;

  public static <T, R> Function<T, R> unchecked(IOFunction<T, R> f)
  {
    return v -> {
      try {
        return f.apply(v);
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
    };
  }
}

You might be able to get what you want with something like this:

  public static Map<String, Set<String>> tagUnion(String indir)
      throws IOException {
    try (Stream<Path> fstream = Files.list(Paths.get(indir))) {
      return fstream
          .flatMap(IOFunction.unchecked(Files::lines))
          .filter(s -> !s.isEmpty())
          .map(Link::parse)
          .filter(link -> inDate(link.timestamp()))
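          // Note: toMap without a merge function throws IllegalStateException
          // when the same url occurs more than once.
          .collect(Collectors.toMap(Link::url, link -> new TreeSet<>(link.tags())));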
    } catch (UncheckedIOException e) {
      throw e.getCause();
    }
  }

The complication here is that Files.lines(...) throws a checked IOException, which precludes its use directly in a stream pipeline.
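
For a self-contained illustration of the same wrap-and-unwrap idea without the IOFunction helper, this sketch catches the IOException inside the lambda and rethrows the original exception outside the pipeline. The class ReadAllLines and the method nonEmptyLines are made-up names, and no Link parsing is involved.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ReadAllLines {
    // Reads the non-empty lines of every regular file directly inside dir.
    static List<String> nonEmptyLines(Path dir) throws IOException {
        try (Stream<Path> paths = Files.list(dir)) {
            return paths
                    .filter(Files::isRegularFile)
                    .flatMap(p -> {
                        try {
                            return Files.lines(p);   // flatMap closes this stream after use
                        } catch (IOException e) {
                            throw new UncheckedIOException(e);
                        }
                    })
                    .filter(s -> !s.isEmpty())
                    .collect(Collectors.toList());
        } catch (UncheckedIOException e) {
            throw e.getCause();   // surface the original checked exception
        }
    }
}
```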


OK, based on your comments, you want a groupingBy(...) operation. It takes a little more code to collect the contents of a bunch of List<String> into a Set<String>:

  return fstream
      .flatMap(IOFunction.unchecked(Files::lines))
      .filter(s -> !s.isEmpty())
      .map(Link::parse)
      .filter(link -> inDate(link.timestamp()))
      .collect(Collectors.groupingBy(Link::url,
          Collectors.mapping(Link::tags,
              Collector.of(
                  () -> new TreeSet<>(),
                  (s, l) -> s.addAll(l),
                  (s1, s2) -> {
                    s1.addAll(s2);
                    return s1;
                  }))));
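
To try the grouping without any files, here is a minimal sketch on in-memory data. SimpleLink is a hypothetical stand-in for Link (only url and tags), and the explicit type arguments on Collector.of are just one way to keep the compiler's inference simple.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class TagUnionDemo {
    // Hypothetical stand-in for Link: just a url and its tags.
    static class SimpleLink {
        final String url;
        final List<String> tags;
        SimpleLink(String url, String... tags) {
            this.url = url;
            this.tags = Arrays.asList(tags);
        }
        String url() { return url; }
        List<String> tags() { return tags; }
    }

    // Collector that pours every List<String> it receives into one sorted set.
    static Collector<List<String>, ?, Set<String>> union() {
        return Collector.<List<String>, Set<String>>of(
                TreeSet::new,
                (set, list) -> set.addAll(list),
                (s1, s2) -> { s1.addAll(s2); return s1; });
    }

    static Map<String, Set<String>> tagUnion(List<SimpleLink> links) {
        return links.stream()
                .collect(Collectors.groupingBy(SimpleLink::url,
                        Collectors.mapping(SimpleLink::tags, union())));
    }

    public static void main(String[] args) {
        List<SimpleLink> links = Arrays.asList(
                new SimpleLink("http://a", "x", "y"),
                new SimpleLink("http://b", "z"),
                new SimpleLink("http://a", "y", "w"));
        // "http://a" maps to [w, x, y]; the order of the map keys is not guaranteed.
        System.out.println(tagUnion(links));
    }
}
```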

With Java 9 or later, this could be simplified to something like:

  return fstream
      .flatMap(IOFunction.unchecked(Files::lines))
      .filter(s -> !s.isEmpty())
      .map(Link::parse)
      .filter(link -> inDate(link.timestamp()))
      .collect(Collectors.groupingBy(Link::url,
          Collectors.flatMapping(link -> link.tags().stream(), Collectors.toSet())));
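
Note that Collectors.toSet() makes no ordering guarantee. If sorted output matters, a sketch along these lines (Java 9 or later, in-memory data) uses the three-argument groupingBy with TreeMap::new plus toCollection(TreeSet::new), then formats each entry in the "url tag1, tag2, tag3" layout. The class SortedTagUnion, the method outputLines, and the use of Map.Entry as a stand-in for a parsed Link are assumptions for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.stream.Collectors;

class SortedTagUnion {
    // Each Map.Entry stands in for a parsed Link: key = url, value = tags (assumed shape).
    static List<String> outputLines(List<Map.Entry<String, List<String>>> links) {
        Map<String, Set<String>> byUrl = links.stream()
                .collect(Collectors.groupingBy(Map.Entry::getKey,
                        TreeMap::new,                                 // keys sorted by url
                        Collectors.flatMapping(e -> e.getValue().stream(),
                                Collectors.toCollection(TreeSet::new))));  // tags sorted, deduplicated
        return byUrl.entrySet().stream()
                .map(e -> e.getKey() + " " + String.join(", ", e.getValue()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Map.Entry<String, List<String>>> links = List.of(
                Map.entry("http://b", List.of("z")),
                Map.entry("http://a", List.of("x", "y")),
                Map.entry("http://a", List.of("y", "w")));
        outputLines(links).forEach(System.out::println);
        // prints:
        // http://a w, x, y
        // http://b z
    }
}
```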

Thanks for the help. I was able to solve the problem a different way using a TreeMap:

    // create array of files in the directory
    // make sure the files are json files only
    File[] files = new File(indir).listFiles(new FileFilter() {
        @Override
        public boolean accept(File pathname) {
            //System.out.println(pathname.getName());
            return pathname.getName().toLowerCase().endsWith(".json");
        }
    });

    // exit if no json were found
    if (files == null || files.length == 0) {
        System.out.println("No JSON files found in directory " + indir);
        System.exit(0);
    }

    // map each url (String) to its set of tags (Set)
    Map<String, Set<String>> tagMap = new TreeMap<>();

    // loop over the files (assumed; the enclosing method must declare IOException)
    for (File file : files) {
        try (Stream<String> lines = Files.lines(file.toPath())) {
            lines.filter(s -> !s.isEmpty())
                    .map(Link::parse).forEach(l -> {
                        Set<String> hs = new HashSet<>(l.tags());
                        if (tagMap.containsKey(l.url())) {
                            tagMap.get(l.url()).addAll(hs);
                        } else {
                            tagMap.put(l.url(), hs);
                        }
                    });
        }
    }

    // write the output to the specified file
    writeOutput(tagMap, false);
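
As a side note, the containsKey/put branch can be collapsed with Map.computeIfAbsent. A minimal sketch on in-memory data follows; the class TagMapDemo and the method addTags are made-up names, and a TreeSet is used here (rather than the HashSet above) only so that the tags print in sorted order.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class TagMapDemo {
    // Adds tags to the set stored under url, creating the set on first use.
    static void addTags(Map<String, Set<String>> tagMap, String url, Collection<String> tags) {
        tagMap.computeIfAbsent(url, k -> new TreeSet<>()).addAll(tags);
    }

    public static void main(String[] args) {
        Map<String, Set<String>> tagMap = new TreeMap<>();
        addTags(tagMap, "http://a", Arrays.asList("x", "y"));
        addTags(tagMap, "http://a", Arrays.asList("y", "w"));
        System.out.println(tagMap);   // prints: {http://a=[w, x, y]}
    }
}
```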
