Apache Beam coding with Java

Question

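I get an error when using the "into" method with FlatMapElements and MapElements. How can I resolve it? The error is: "The method into(TypeDescriptor) is undefined for the type FlatMapElements"

The into(TypeDescriptor) method above is reported as undefined, and I cannot find anything to use in its place. I am new to Apache Beam.

Please help!

Here is the source code: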

import java.util.Arrays;

import org.apache.beam.examples.common.ExampleUtils;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.metrics.Counter;
import org.apache.beam.sdk.metrics.Distribution;
import org.apache.beam.sdk.metrics.Metrics;
import org.apache.beam.sdk.options.Default;
import org.apache.beam.sdk.options.Description;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.options.Validation.Required;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.FlatMapElements;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.SimpleFunction;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TypeDescriptors;

/**
 * An example that counts words in Shakespeare and includes Beam best practices.
 *
 * <p>This class, {@link WordCount}, is the second in a series of four successively more detailed
 * 'word count' examples. You may first want to take a look at {@link MinimalWordCount}. After
 * you've looked at this example, then see the {@link DebuggingWordCount} pipeline, for introduction
 * of additional concepts.
 *
 * <p>For a detailed walkthrough of this example, see <a
 * href="https://beam.apache.org/get-started/wordcount-example/">
 * https://beam.apache.org/get-started/wordcount-example/ </a>
 *
 * <p>Basic concepts, also in the MinimalWordCount example: Reading text files; counting a
 * PCollection; writing to text files
 *
 * <p>New Concepts:
 *
 * <pre>
 *   1. Executing a Pipeline both locally and using the selected runner
 *   2. Using ParDo with static DoFns defined out-of-line
 *   3. Building a composite transform
 *   4. Defining your own pipeline options
 * </pre>
 *
 * <p>Concept #1: you can execute this pipeline either locally or by selecting another runner.
 * These are now command-line options and not hard-coded as they were in the MinimalWordCount
 * example.
 *
 * <p>To change the runner, specify:
 *
 * <pre>{@code
 * --runner=YOUR_SELECTED_RUNNER
 * }</pre>
 *
 * <p>To execute this pipeline, specify a local output file (if using the {@code DirectRunner}) or
 * output prefix on a supported distributed file system.
 *
 * <pre>{@code
 * --output=[YOUR_LOCAL_FILE | YOUR_OUTPUT_PREFIX]
 * }</pre>
 *
 * <p>The input file defaults to a public data set containing the text of King Lear, by William
 * Shakespeare. You can override it and choose your own input with {@code --inputFile}.
 */
public class BeamPipeline {


    public static void main(String[] args) {
        PipelineOptions options = PipelineOptionsFactory.create();
        Pipeline p = Pipeline.create(options);
        PCollection<String> csvRows = p.apply("Read from CSV",
                TextIO.Read.from("./reviews.csv"));

        // Step 2 - Extract ratings and count them.
        PCollection<KV<String, Long>> ratingsCounts = csvRows
                .apply("Extract Ratings",
                        FlatMapElements.into(TypeDescriptors.strings())
                                .via(csvRow -> Arrays.asList(csvRow.split(",")[1])))
                .apply("Count Ratings", Count.<String>perElement());

        // Step 3 - Write results to CSV
        ratingsCounts
                .apply("FormatResults", MapElements.into(TypeDescriptors.strings())
                        .via((KV<String, Long> ratingsCount) -> ratingsCount.getKey() + " " + ratingsCount.getValue()))
                .apply(TextIO.Write.to("./ratings_results").withSuffix(".csv"));

        // Run the pipeline and wait till it finishes before exiting
        p.run().waitUntilFinish();

    }
}

Answer 1

Update your Step 2 as follows, declaring the lambda parameter type explicitly as (String csvRow):

        PCollection<KV<String, Long>> ratingsCounts = csvRows
                .apply("Extract Ratings",
                        FlatMapElements.into(TypeDescriptors.strings())
                                .via((String csvRow) -> Arrays.asList(csvRow.split(",")[1])))
                .apply("Count Ratings", Count.<String>perElement());
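
To make it easier to see what Step 2 computes, here is a minimal plain-Java sketch of the same extract-and-count logic without any Beam dependency. It is only an illustration: the class name RatingCountSketch, its two helper methods, and the sample rows are hypothetical, and the layout of reviews.csv (rating in the second column) is assumed from the question's code. Unlike the lambda above, the sketch returns an empty list for rows with fewer than two fields rather than throwing an ArrayIndexOutOfBoundsException.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Plain-Java sketch of Step 2. All names here are hypothetical, not part of Beam. */
public class RatingCountSketch {

    // Plays the role of the lambda passed to FlatMapElements.via(...):
    // returns zero or one output element per input row.
    static List<String> extractRating(String csvRow) {
        String[] fields = csvRow.split(",");
        if (fields.length < 2) {
            // Assumption: rows without a second column are skipped.
            return Collections.emptyList();
        }
        return Arrays.asList(fields[1]);
    }

    // Plays the role of Count.perElement(): counts occurrences of each rating.
    static Map<String, Long> countRatings(List<String> csvRows) {
        return csvRows.stream()
                .flatMap(row -> extractRating(row).stream())
                .collect(Collectors.groupingBy(r -> r, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Made-up sample rows; the real file contents are not shown in the question.
        List<String> rows = Arrays.asList("r1,5,great", "r2,3,ok", "r3,5,nice", "malformed");
        System.out.println(countRatings(rows)); // prints {3=1, 5=2}
    }
}
```

Returning a list from the extraction function is what makes FlatMapElements a natural fit here: a row can contribute no element at all, which MapElements cannot express.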

Solution 1
0 2020-07-16 03:05:40
