[英]Global combine not producing output Apache Beam
I am trying to write an unbounded ping pipeline that takes output from a ping command and parses it to determine some statistics about the RTT (avg/min/max) and for now, just print the results.我正在尝试编写一个无界 ping 管道,该管道从 ping 命令中获取 output 并对其进行解析以确定有关 RTT(平均/最小/最大)的一些统计信息,现在,只需打印结果。
I have already written an unbounded ping source that outputs each line as it comes in. The results are windowed every second for every 5 seconds of pings.我已经编写了一个无界 ping 源,它会在每行输入时输出它。每 5 秒 ping 一次,结果每秒窗口化一次。 The windowed data is fed to a
Combine.globally
call to statefully process the string outputs.窗口化的数据被馈送到一个
Combine.globally
调用以有状态地处理字符串输出。 The problem is that the accumulators are never merged and the output is never extracted.问题是累加器永远不会合并,output 永远不会被提取。 This means that the pipeline never continues past this point.
这意味着管道永远不会超过这一点。 What am I doing wrong here?
我在这里做错了什么?
public class TestPingIPs {
public static void main(String[] args)
{
PipelineOptions options = PipelineOptionsFactory.create();
Pipeline pipeline = Pipeline.create(options);
String destination = "8.8.8.8";
PCollection<PingResult> res =
/*
Run the unbounded ping command. Only the lines where the result of the ping command are returned.
No statistics or first startup lines are returned here.
*/
pipeline.apply("Ping command",
PingCmd.read()
.withPingArguments(PingCmd.PingArguments.create(destination, -1)))
/*
Window the ping command strings into 5 second sliding windows produced every 1 second
*/
.apply("Window strings",
Window.into(SlidingWindows.of(Duration.standardSeconds(5))
.every(Duration.standardSeconds(1))))
/*
Parse and aggregate the strings into a PingResult object using stateful processing.
*/
.apply("Combine the pings",
Combine.globally(new ProcessPings()).withoutDefaults())
/*
Test our output to see what we get here
*/
.apply("Test output",
ParDo.of(new DoFn<PingResult, PingResult>() {
@ProcessElement
public void processElement(ProcessContext c)
{
System.out.println(c.element().getAvgRTT());
System.out.println(c.element().getPacketLoss());
c.output(c.element());
}
}));
pipeline.run().waitUntilFinish();
}
static class ProcessPings extends Combine.CombineFn<String, RttStats, PingResult> {
private long getRTTFromLine(String line){
long rtt = Long.parseLong(line.split("time=")[1].split("ms")[0]);
return rtt;
}
@Override
public RttStats createAccumulator()
{
return new RttStats();
}
@Override
public RttStats addInput(RttStats mutableAccumulator, String input)
{
mutableAccumulator.incTotal();
if (input.contains("unreachable")) {
_unreachableCount.inc();
mutableAccumulator.incPacketLoss();
}
else if (input.contains("General failure")) {
_transmitFailureCount.inc();
mutableAccumulator.incPacketLoss();
}
else if (input.contains("timed out")) {
_timeoutCount.inc();
mutableAccumulator.incPacketLoss();
}
else if (input.contains("could not find")) {
_unknownHostCount.inc();
mutableAccumulator.incPacketLoss();
}
else {
_successfulCount.inc();
mutableAccumulator.add(getRTTFromLine(input));
}
return mutableAccumulator;
}
@Override
public RttStats mergeAccumulators(Iterable<RttStats> accumulators)
{
Iterator<RttStats> iter = accumulators.iterator();
if (!iter.hasNext()){
return createAccumulator();
}
RttStats running = iter.next();
while (iter.hasNext()){
RttStats next = iter.next();
running.addAll(next.getVals());
running.addLostPackets(next.getLostPackets());
}
return running;
}
@Override
public PingResult extractOutput(RttStats stats)
{
stats.calculate();
boolean connected = stats.getPacketLoss() != 1;
return new PingResult(connected, stats.getAvg(), stats.getMin(), stats.getMax(), stats.getPacketLoss());
}
private final Counter _successfulCount = Metrics.counter(ProcessPings.class, "Successful pings");
private final Counter _unknownHostCount = Metrics.counter(ProcessPings.class, "Unknown hosts");
private final Counter _transmitFailureCount = Metrics.counter(ProcessPings.class, "Transmit failures");
private final Counter _timeoutCount = Metrics.counter(ProcessPings.class, "Timeouts");
private final Counter _unreachableCount = Metrics.counter(ProcessPings.class, "Unreachable host");
}
I would guess that there are some issues with the CombineFn
that I wrote, but I can't seem to figure out what's going wrong here!我猜我写的
CombineFn
存在一些问题,但我似乎无法弄清楚这里出了什么问题! I tried following the example here , but there's still something I must be missing.我尝试按照此处的示例进行操作,但是我仍然必须缺少一些东西。
EDIT: I added the ping command implementation below.编辑:我在下面添加了 ping 命令实现。 This is running on a Direct Runner while I test.
这是在我测试时在 Direct Runner 上运行的。
PingCmd.java: PingCmd.java:
public class PingCmd {
public static Read read(){
if (System.getProperty("os.name").startsWith("Windows")) {
return WindowsPingCmd.read();
}
else{
return null;
}
}
WindowsPingCmd.java: WindowsPingCmd.java:
public class WindowsPingCmd extends PingCmd {
private WindowsPingCmd()
{
}
public static PingCmd.Read read()
{
return new WindowsRead.Builder().build();
}
static class PingCheckpointMark implements UnboundedSource.CheckpointMark, Serializable {
@VisibleForTesting
Instant oldestMessageTimestamp = Instant.now();
@VisibleForTesting
transient List<String> outputs = new ArrayList<>();
public PingCheckpointMark()
{
}
public void add(String message, Instant timestamp)
{
if (timestamp.isBefore(oldestMessageTimestamp)) {
oldestMessageTimestamp = timestamp;
}
outputs.add(message);
}
@Override
public void finalizeCheckpoint()
{
oldestMessageTimestamp = Instant.now();
outputs.clear();
}
// set an empty list to messages when deserialize
private void readObject(java.io.ObjectInputStream stream)
throws IOException, ClassNotFoundException
{
stream.defaultReadObject();
outputs = new ArrayList<>();
}
@Override
public boolean equals(@Nullable Object other)
{
if (other instanceof PingCheckpointMark) {
PingCheckpointMark that = (PingCheckpointMark) other;
return Objects.equals(this.oldestMessageTimestamp, that.oldestMessageTimestamp)
&& Objects.deepEquals(this.outputs, that.outputs);
}
else {
return false;
}
}
}
@VisibleForTesting
static class UnboundedPingSource extends UnboundedSource<String, PingCheckpointMark> {
private final WindowsRead spec;
public UnboundedPingSource(WindowsRead spec)
{
this.spec = spec;
}
@Override
public UnboundedReader<String> createReader(
PipelineOptions options, PingCheckpointMark checkpointMark)
{
return new UnboundedPingReader(this, checkpointMark);
}
@Override
public List<UnboundedPingSource> split(int desiredNumSplits, PipelineOptions options)
{
// Don't really need to ever split the ping source, so we should just have one per destination
return Collections.singletonList(new UnboundedPingSource(spec));
}
@Override
public void populateDisplayData(DisplayData.Builder builder)
{
spec.populateDisplayData(builder);
}
@Override
public Coder<PingCheckpointMark> getCheckpointMarkCoder()
{
return SerializableCoder.of(PingCheckpointMark.class);
}
@Override
public Coder<String> getOutputCoder()
{
return StringUtf8Coder.of();
}
}
@VisibleForTesting
static class UnboundedPingReader extends UnboundedSource.UnboundedReader<String> {
private final UnboundedPingSource source;
private String current;
private Instant currentTimestamp;
private final PingCheckpointMark checkpointMark;
private BufferedReader processOutput;
private Process process;
private boolean finishedPings;
private int maxCount = 5;
private static AtomicInteger currCount = new AtomicInteger(0);
public UnboundedPingReader(UnboundedPingSource source, PingCheckpointMark checkpointMark)
{
this.finishedPings = false;
this.source = source;
this.current = null;
if (checkpointMark != null) {
this.checkpointMark = checkpointMark;
}
else {
this.checkpointMark = new PingCheckpointMark();
}
}
@Override
public boolean start() throws IOException
{
WindowsRead spec = source.spec;
String cmd = createCommand(spec.pingConfiguration().getPingCount(), spec.pingConfiguration().getDestination());
try {
ProcessBuilder builder = new ProcessBuilder(cmd.split(" "));
builder.redirectErrorStream(true);
process = builder.start();
processOutput = new BufferedReader(new InputStreamReader(process.getInputStream()));
return advance();
} catch (Exception e) {
throw new IOException(e);
}
}
private String createCommand(int count, String dest){
StringBuilder builder = new StringBuilder("ping");
String countParam = "";
if (count <= 0){
countParam = "-t";
}
else{
countParam += "-n " + count;
}
return builder.append(" ").append(countParam).append(" ").append(dest).toString();
}
@Override
public boolean advance() throws IOException
{
String line = processOutput.readLine();
// Ignore empty/null lines
if (line == null || line.isEmpty()) {
line = processOutput.readLine();
}
// Ignore the 'Pinging <dest> with 32 bytes of data' line
if (line.contains("Pinging " + source.spec.pingConfiguration().getDestination())) {
line = processOutput.readLine();
}
// If the pings have finished, ignore
if (finishedPings) {
return false;
}
// If this is the start of the statistics, the pings are done and we can just exit
if (line.contains("statistics")) {
finishedPings = true;
}
current = line;
currentTimestamp = Instant.now();
checkpointMark.add(current, currentTimestamp);
if (currCount.incrementAndGet() == maxCount){
currCount.set(0);
return false;
}
return true;
}
@Override
public void close() throws IOException
{
if (process != null) {
process.destroy();
if (process.isAlive()) {
process.destroyForcibly();
}
}
}
@Override
public Instant getWatermark()
{
return checkpointMark.oldestMessageTimestamp;
}
@Override
public UnboundedSource.CheckpointMark getCheckpointMark()
{
return checkpointMark;
}
@Override
public String getCurrent()
{
if (current == null) {
throw new NoSuchElementException();
}
return current;
}
@Override
public Instant getCurrentTimestamp()
{
if (current == null) {
throw new NoSuchElementException();
}
return currentTimestamp;
}
@Override
public UnboundedPingSource getCurrentSource()
{
return source;
}
}
public static class WindowsRead extends PingCmd.Read {
private final PingArguments pingConfig;
private WindowsRead(PingArguments pingConfig)
{
this.pingConfig = pingConfig;
}
public Builder builder()
{
return new WindowsRead.Builder(this);
}
PingArguments pingConfiguration()
{
return pingConfig;
}
public WindowsRead withPingArguments(PingArguments configuration)
{
checkArgument(configuration != null, "configuration can not be null");
return builder().setPingArguments(configuration).build();
}
@Override
public PCollection<String> expand(PBegin input)
{
org.apache.beam.sdk.io.Read.Unbounded<String> unbounded =
org.apache.beam.sdk.io.Read.from(new UnboundedPingSource(this));
return input.getPipeline().apply(unbounded);
}
@Override
public void populateDisplayData(DisplayData.Builder builder)
{
super.populateDisplayData(builder);
pingConfiguration().populateDisplayData(builder);
}
static class Builder {
private PingArguments config;
Builder()
{
}
private Builder(WindowsRead source)
{
this.config = source.pingConfiguration();
}
WindowsRead.Builder setPingArguments(PingArguments config)
{
this.config = config;
return this;
}
WindowsRead build()
{
return new WindowsRead(this.config);
}
}
@Override
public int hashCode()
{
return Objects.hash(pingConfig);
}
}
One thing I notice in your code is that advance()
always returns True
.我在您的代码中注意到的一件事是
advance()
总是返回True
。 The watermark only advances on bundle completion, and I think it's runner-dependent whether a runner will ever complete a bundle if advance
ever never returns False
.水印仅在捆绑完成时前进,我认为如果
advance
永远不会返回False
,跑步者是否会完成捆绑取决于跑步者。 You could try returning False
after a bounded amount of time/number of pings.您可以尝试在有限的时间/ping 次数后返回
False
。
You could also consider re-writing this as an SDF .您也可以考虑将其重写为SDF 。
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.