
Apache Beam Coding with Java

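I get an error when using the "into" method with FlatMapElements and MapElements: "The method into(TypeDescriptor) is undefined for the type FlatMapElements". How can I fix this?

The TypeDescriptor mentioned above is not defined, but I can't find anything to replace it with. I'm new to Apache Beam.

Please help!

Below is the source code: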

import java.util.Arrays;

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.FlatMapElements;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TypeDescriptors;

/**
 * An example that counts words in Shakespeare and includes Beam best practices.
 *
 * <p>This class, {@link WordCount}, is the second in a series of four successively more detailed
 * 'word count' examples. You may first want to take a look at {@link MinimalWordCount}. After
 * you've looked at this example, then see the {@link DebuggingWordCount} pipeline, for introduction
 * of additional concepts.
 *
 * <p>For a detailed walkthrough of this example, see <a
 * href="https://beam.apache.org/get-started/wordcount-example/">
 * https://beam.apache.org/get-started/wordcount-example/ </a>
 *
 * <p>Basic concepts, also in the MinimalWordCount example: Reading text files; counting a
 * PCollection; writing to text files
 *
 * <p>New Concepts:
 *
 * <pre>
 *   1. Executing a Pipeline both locally and using the selected runner
 *   2. Using ParDo with static DoFns defined out-of-line
 *   3. Building a composite transform
 *   4. Defining your own pipeline options
 * </pre>
 *
 * <p>Concept #1: you can execute this pipeline either locally or by selecting another runner.
 * These are now command-line options and not hard-coded as they were in the MinimalWordCount
 * example.
 *
 * <p>To change the runner, specify:
 *
 * <pre>{@code
 * --runner=YOUR_SELECTED_RUNNER
 * }</pre>
 *
 * <p>To execute this pipeline, specify a local output file (if using the {@code DirectRunner}) or
 * output prefix on a supported distributed file system.
 *
 * <pre>{@code
 * --output=[YOUR_LOCAL_FILE | YOUR_OUTPUT_PREFIX]
 * }</pre>
 *
 * <p>The input file defaults to a public data set containing the text of King Lear, by William
 * Shakespeare. You can override it and choose your own input with {@code --inputFile}.
 */
public class BeamPipeline {

    public static void main(String[] args) {
        PipelineOptions options = PipelineOptionsFactory.create();
        Pipeline p = Pipeline.create(options);
        // Step 1 - Read the CSV rows
        PCollection<String> csvRows = p.apply("Read from CSV",
                TextIO.Read.from("./reviews.csv"));

        // Step 2 - Extract ratings and count them.
        PCollection<KV<String, Long>> ratingsCounts = csvRows
                .apply("Extract Ratings",
                        FlatMapElements.into(TypeDescriptors.strings())
                                .via(csvRow -> Arrays.asList(csvRow.split(",")[1])))
                .apply("Count Ratings", Count.<String>perElement());

        // Step 3 - Write results to CSV
        ratingsCounts
                .apply("FormatResults", MapElements.into(TypeDescriptors.strings())
                        .via((KV<String, Long> ratingsCount) -> ratingsCount.getKey() + " " + ratingsCount.getValue()))
                .apply(TextIO.Write.to("./ratings_results").withSuffix(".csv"));

        // Run the pipeline and wait till it finishes before exiting
        p.run().waitUntilFinish();

    }
  }

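Update your Step 2 as follows, declaring the lambda parameter's type explicitly as (String csvRow) so that the compiler can infer the input type for via: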

PCollection<KV<String, Long>> ratingsCounts = csvRows
        .apply("Extract Ratings",
                FlatMapElements.into(TypeDescriptors.strings())
                        .via((String csvRow) -> Arrays.asList(csvRow.split(",")[1])))
        .apply("Count Ratings", Count.<String>perElement());
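Before running the pipeline, the per-row logic used in Steps 2 and 3 can be sanity-checked in plain Java. The sketch below is not Beam code: it uses java.util.stream as a stand-in for FlatMapElements and Count.perElement(), and the class name RatingLogicCheck, the helpers EXTRACT_RATING, countRatings and format, and the sample rows are all hypothetical, since the layout of reviews.csv is not shown in the question. It keeps the explicitly typed lambda parameter from the fix above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

public class RatingLogicCheck {

    // Same expression as the FlatMapElements lambda in Step 2: emit the
    // second comma-separated field of each row. The parameter type is
    // written out, mirroring the (String csvRow) fix.
    static final Function<String, List<String>> EXTRACT_RATING =
            (String csvRow) -> Arrays.asList(csvRow.split(",")[1]);

    // Plain-Java stand-in for Count.perElement(): maps each rating to the
    // number of rows that contain it (TreeMap keeps the keys sorted).
    static Map<String, Long> countRatings(List<String> csvRows) {
        return csvRows.stream()
                .flatMap(row -> EXTRACT_RATING.apply(row).stream())
                .collect(Collectors.groupingBy(rating -> rating, TreeMap::new, Collectors.counting()));
    }

    // Same formatting as the MapElements lambda in Step 3: "<rating> <count>".
    static String format(Map.Entry<String, Long> ratingsCount) {
        return ratingsCount.getKey() + " " + ratingsCount.getValue();
    }

    public static void main(String[] args) {
        // Hypothetical sample rows; the real reviews.csv layout is not shown.
        List<String> rows = Arrays.asList("r1,5", "r2,3", "r3,5");
        for (Map.Entry<String, Long> entry : countRatings(rows).entrySet()) {
            System.out.println(format(entry));
        }
    }
}
```

With the sample rows above, main prints "3 1" and then "5 2", the same "<rating> <count>" shape that Step 3 writes. Two things the check makes easy to see: split(",")[1] throws ArrayIndexOutOfBoundsException for any row without a comma (an empty trailing line, for example), and if reviews.csv has a header row, its second column is counted as a rating too. Separately, and only as a guess not confirmed in this thread: if the compiler still reports into as undefined after the change, the Beam SDK on the classpath may predate 2.0. The question's code mixes into(...), which belongs to the Beam 2.x API, with TextIO.Read.from(...) and TextIO.Write.to(...), the older style that 2.x replaced with TextIO.read().from(...) and TextIO.write().to(...).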

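Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish this post, please credit this site's URL or the original post's address. For any questions, contact yoyou2525@163.com.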
