Global combine not producing output in Apache Beam
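I'm trying to write an unbounded ping pipeline that takes the output of the ping command and parses it to compute some RTT statistics (average/min/max). For now, it just prints the results.
I've written an unbounded ping source that emits each line of output as it arrives. The results are windowed into 5-second sliding windows, with a new window starting every second. The windowed data is fed into a `Combine.globally` call that processes the string output statefully. The problem is that the accumulators are never merged and the output is never extracted, so the pipeline never gets past this point. What am I doing wrong here?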
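For reference, `SlidingWindows.of(5s).every(1s)` places every ping line into five overlapping windows, and with the default trigger a window's combined result is only emitted once the watermark passes that window's end. The stdlib-only sketch below reproduces the window-assignment arithmetic for millisecond timestamps, assuming a zero window offset; `SlidingWindowSketch` and `assignWindows` are hypothetical names, and this is an illustration rather than Beam's own implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper mirroring sliding-window assignment with a zero offset. */
final class SlidingWindowSketch {
    private SlidingWindowSketch() {}

    /** Returns the [start, end) windows, in millis, that contain timestampMillis. */
    static List<long[]> assignWindows(long timestampMillis, long sizeMillis, long periodMillis) {
        List<long[]> windows = new ArrayList<>();
        // Latest window start that is at or before the timestamp.
        long lastStart = timestampMillis - Math.floorMod(timestampMillis, periodMillis);
        // Step back one period at a time while the window still covers the timestamp.
        for (long start = lastStart; start > timestampMillis - sizeMillis; start -= periodMillis) {
            windows.add(new long[] {start, start + sizeMillis});
        }
        return windows;
    }
}
```

For a line stamped at 7.5 s, `assignWindows(7_500, 5_000, 1_000)` yields the five windows starting at 7000, 6000, 5000, 4000 and 3000 ms. The earliest of them, [3000, 8000), cannot produce a `PingResult` until the source reports a watermark of at least 8000 ms.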
```java
import java.util.Iterator;

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.metrics.Counter;
import org.apache.beam.sdk.metrics.Metrics;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Combine;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.windowing.SlidingWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

public class TestPingIPs {
    public static void main(String[] args)
    {
        PipelineOptions options = PipelineOptionsFactory.create();
        Pipeline pipeline = Pipeline.create(options);
        String destination = "8.8.8.8";
        PCollection<PingResult> res =
            /*
             Run the unbounded ping command. Only the lines containing the result of each ping are returned.
             No statistics or startup lines are returned here.
             */
            pipeline.apply("Ping command",
                    PingCmd.read()
                        .withPingArguments(PingCmd.PingArguments.create(destination, -1)))
                /*
                 Window the ping command strings into 5 second sliding windows produced every 1 second
                 */
                .apply("Window strings",
                    Window.into(SlidingWindows.of(Duration.standardSeconds(5))
                        .every(Duration.standardSeconds(1))))
                /*
                 Parse and aggregate the strings into a PingResult object using stateful processing.
                 */
                .apply("Combine the pings",
                    Combine.globally(new ProcessPings()).withoutDefaults())
                /*
                 Test our output to see what we get here
                 */
                .apply("Test output",
                    ParDo.of(new DoFn<PingResult, PingResult>() {
                        @ProcessElement
                        public void processElement(ProcessContext c)
                        {
                            System.out.println(c.element().getAvgRTT());
                            System.out.println(c.element().getPacketLoss());
                            c.output(c.element());
                        }
                    }));
        pipeline.run().waitUntilFinish();
    }

    static class ProcessPings extends Combine.CombineFn<String, RttStats, PingResult> {
        private long getRTTFromLine(String line){
            long rtt = Long.parseLong(line.split("time=")[1].split("ms")[0]);
            return rtt;
        }

        @Override
        public RttStats createAccumulator()
        {
            return new RttStats();
        }

        @Override
        public RttStats addInput(RttStats mutableAccumulator, String input)
        {
            mutableAccumulator.incTotal();
            if (input.contains("unreachable")) {
                _unreachableCount.inc();
                mutableAccumulator.incPacketLoss();
            }
            else if (input.contains("General failure")) {
                _transmitFailureCount.inc();
                mutableAccumulator.incPacketLoss();
            }
            else if (input.contains("timed out")) {
                _timeoutCount.inc();
                mutableAccumulator.incPacketLoss();
            }
            else if (input.contains("could not find")) {
                _unknownHostCount.inc();
                mutableAccumulator.incPacketLoss();
            }
            else {
                _successfulCount.inc();
                mutableAccumulator.add(getRTTFromLine(input));
            }
            return mutableAccumulator;
        }

        @Override
        public RttStats mergeAccumulators(Iterable<RttStats> accumulators)
        {
            Iterator<RttStats> iter = accumulators.iterator();
            if (!iter.hasNext()){
                return createAccumulator();
            }
            RttStats running = iter.next();
            while (iter.hasNext()){
                RttStats next = iter.next();
                running.addAll(next.getVals());
                running.addLostPackets(next.getLostPackets());
            }
            return running;
        }

        @Override
        public PingResult extractOutput(RttStats stats)
        {
            stats.calculate();
            boolean connected = stats.getPacketLoss() != 1;
            return new PingResult(connected, stats.getAvg(), stats.getMin(), stats.getMax(), stats.getPacketLoss());
        }

        private final Counter _successfulCount = Metrics.counter(ProcessPings.class, "Successful pings");
        private final Counter _unknownHostCount = Metrics.counter(ProcessPings.class, "Unknown hosts");
        private final Counter _transmitFailureCount = Metrics.counter(ProcessPings.class, "Transmit failures");
        private final Counter _timeoutCount = Metrics.counter(ProcessPings.class, "Timeouts");
        private final Counter _unreachableCount = Metrics.counter(ProcessPings.class, "Unreachable host");
    }
}
```
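I'm guessing there's some problem with the `CombineFn` I wrote, but I can't figure out what's going wrong here! I tried to follow the example here, but I must still be missing something.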
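Since `RttStats` is not shown, it is hard to tell whether `mergeAccumulators` is complete: it copies the values and the lost-packet count, but not whatever `incTotal()` counts. Separately, Windows prints sub-millisecond replies as `time<1ms`, which the posted `getRTTFromLine` (splitting only on `time=`) would fail to parse. As a point of comparison, here is a stdlib-only sketch of the add/merge/extract contract using a hypothetical `RttAccumulator` standing in for `RttStats`. It merges every field and accepts both `time=` and `time<`; as a simplification, it counts every line without an RTT as lost, which is not exactly how the posted `addInput` classifies lines.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical accumulator standing in for the question's RttStats. */
final class RttAccumulator {
    // Matches "time=14ms" and the sub-millisecond form "time<1ms".
    private static final Pattern RTT = Pattern.compile("time[=<](\\d+)ms");

    long total;                  // every line seen
    long lost;                   // lines without an RTT (timeouts, unreachable, ...)
    long sum;                    // sum of RTTs in ms
    long min = Long.MAX_VALUE;
    long max = Long.MIN_VALUE;

    /** Like addInput: classify one ping output line. "time<1ms" is recorded as 1 ms, an upper bound. */
    RttAccumulator add(String line) {
        total++;
        Matcher m = RTT.matcher(line);
        if (m.find()) {
            long rtt = Long.parseLong(m.group(1));
            sum += rtt;
            min = Math.min(min, rtt);
            max = Math.max(max, rtt);
        } else {
            lost++;
        }
        return this;
    }

    /** Like mergeAccumulators: fold every field, including the total count. */
    static RttAccumulator merge(List<RttAccumulator> parts) {
        RttAccumulator out = new RttAccumulator();
        for (RttAccumulator p : parts) {
            out.total += p.total;
            out.lost += p.lost;
            out.sum += p.sum;
            out.min = Math.min(out.min, p.min);
            out.max = Math.max(out.max, p.max);
        }
        return out;
    }

    /** Like extractOutput: average RTT of successful pings, or NaN if there were none. */
    double avg() {
        long ok = total - lost;
        return ok == 0 ? Double.NaN : (double) sum / ok;
    }

    /** Fraction of lines that did not produce an RTT. */
    double packetLoss() {
        return total == 0 ? 0.0 : (double) lost / total;
    }
}
```

The key property a `CombineFn` needs is that merging partial accumulators gives the same result as adding all inputs to a single accumulator, because the runner is free to split the input however it likes.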
Edit: I've added the ping command implementation below. I'm running this on the Direct Runner while testing.
PingCmd.java:
```java
public class PingCmd {
    public static Read read(){
        if (System.getProperty("os.name").startsWith("Windows")) {
            return WindowsPingCmd.read();
        }
        else{
            return null;
        }
    }

    // ... (the rest of PingCmd, including Read and PingArguments, is not shown)
}
```
WindowsPingCmd.java:
```java
public class WindowsPingCmd extends PingCmd {
    private WindowsPingCmd()
    {
    }

    public static PingCmd.Read read()
    {
        return new WindowsRead.Builder().build();
    }

    static class PingCheckpointMark implements UnboundedSource.CheckpointMark, Serializable {
        @VisibleForTesting
        Instant oldestMessageTimestamp = Instant.now();
        @VisibleForTesting
        transient List<String> outputs = new ArrayList<>();

        public PingCheckpointMark()
        {
        }

        public void add(String message, Instant timestamp)
        {
            if (timestamp.isBefore(oldestMessageTimestamp)) {
                oldestMessageTimestamp = timestamp;
            }
            outputs.add(message);
        }

        @Override
        public void finalizeCheckpoint()
        {
            oldestMessageTimestamp = Instant.now();
            outputs.clear();
        }

        // Reset the (transient) output list to an empty list on deserialization
        private void readObject(java.io.ObjectInputStream stream)
            throws IOException, ClassNotFoundException
        {
            stream.defaultReadObject();
            outputs = new ArrayList<>();
        }

        @Override
        public boolean equals(@Nullable Object other)
        {
            if (other instanceof PingCheckpointMark) {
                PingCheckpointMark that = (PingCheckpointMark) other;
                return Objects.equals(this.oldestMessageTimestamp, that.oldestMessageTimestamp)
                    && Objects.deepEquals(this.outputs, that.outputs);
            }
            else {
                return false;
            }
        }
    }

    @VisibleForTesting
    static class UnboundedPingSource extends UnboundedSource<String, PingCheckpointMark> {
        private final WindowsRead spec;

        public UnboundedPingSource(WindowsRead spec)
        {
            this.spec = spec;
        }

        @Override
        public UnboundedReader<String> createReader(
            PipelineOptions options, PingCheckpointMark checkpointMark)
        {
            return new UnboundedPingReader(this, checkpointMark);
        }

        @Override
        public List<UnboundedPingSource> split(int desiredNumSplits, PipelineOptions options)
        {
            // Don't really need to ever split the ping source, so we should just have one per destination
            return Collections.singletonList(new UnboundedPingSource(spec));
        }

        @Override
        public void populateDisplayData(DisplayData.Builder builder)
        {
            spec.populateDisplayData(builder);
        }

        @Override
        public Coder<PingCheckpointMark> getCheckpointMarkCoder()
        {
            return SerializableCoder.of(PingCheckpointMark.class);
        }

        @Override
        public Coder<String> getOutputCoder()
        {
            return StringUtf8Coder.of();
        }
    }

    @VisibleForTesting
    static class UnboundedPingReader extends UnboundedSource.UnboundedReader<String> {
        private final UnboundedPingSource source;
        private String current;
        private Instant currentTimestamp;
        private final PingCheckpointMark checkpointMark;
        private BufferedReader processOutput;
        private Process process;
        private boolean finishedPings;
        private int maxCount = 5;
        private static AtomicInteger currCount = new AtomicInteger(0);

        public UnboundedPingReader(UnboundedPingSource source, PingCheckpointMark checkpointMark)
        {
            this.finishedPings = false;
            this.source = source;
            this.current = null;
            if (checkpointMark != null) {
                this.checkpointMark = checkpointMark;
            }
            else {
                this.checkpointMark = new PingCheckpointMark();
            }
        }

        @Override
        public boolean start() throws IOException
        {
            WindowsRead spec = source.spec;
            String cmd = createCommand(spec.pingConfiguration().getPingCount(), spec.pingConfiguration().getDestination());
            try {
                ProcessBuilder builder = new ProcessBuilder(cmd.split(" "));
                builder.redirectErrorStream(true);
                process = builder.start();
                processOutput = new BufferedReader(new InputStreamReader(process.getInputStream()));
                return advance();
            } catch (Exception e) {
                throw new IOException(e);
            }
        }

        private String createCommand(int count, String dest){
            StringBuilder builder = new StringBuilder("ping");
            String countParam = "";
            if (count <= 0){
                countParam = "-t";
            }
            else{
                countParam += "-n " + count;
            }
            return builder.append(" ").append(countParam).append(" ").append(dest).toString();
        }

        @Override
        public boolean advance() throws IOException
        {
            String line = processOutput.readLine();
            // Ignore empty/null lines
            if (line == null || line.isEmpty()) {
                line = processOutput.readLine();
            }
            // Ignore the 'Pinging <dest> with 32 bytes of data' line
            if (line.contains("Pinging " + source.spec.pingConfiguration().getDestination())) {
                line = processOutput.readLine();
            }
            // If the pings have finished, ignore
            if (finishedPings) {
                return false;
            }
            // If this is the start of the statistics, the pings are done and we can just exit
            if (line.contains("statistics")) {
                finishedPings = true;
            }
            current = line;
            currentTimestamp = Instant.now();
            checkpointMark.add(current, currentTimestamp);
            if (currCount.incrementAndGet() == maxCount){
                currCount.set(0);
                return false;
            }
            return true;
        }

        @Override
        public void close() throws IOException
        {
            if (process != null) {
                process.destroy();
                if (process.isAlive()) {
                    process.destroyForcibly();
                }
            }
        }

        @Override
        public Instant getWatermark()
        {
            return checkpointMark.oldestMessageTimestamp;
        }

        @Override
        public UnboundedSource.CheckpointMark getCheckpointMark()
        {
            return checkpointMark;
        }

        @Override
        public String getCurrent()
        {
            if (current == null) {
                throw new NoSuchElementException();
            }
            return current;
        }

        @Override
        public Instant getCurrentTimestamp()
        {
            if (current == null) {
                throw new NoSuchElementException();
            }
            return currentTimestamp;
        }

        @Override
        public UnboundedPingSource getCurrentSource()
        {
            return source;
        }
    }

    public static class WindowsRead extends PingCmd.Read {
        private final PingArguments pingConfig;

        private WindowsRead(PingArguments pingConfig)
        {
            this.pingConfig = pingConfig;
        }

        public Builder builder()
        {
            return new WindowsRead.Builder(this);
        }

        PingArguments pingConfiguration()
        {
            return pingConfig;
        }

        public WindowsRead withPingArguments(PingArguments configuration)
        {
            checkArgument(configuration != null, "configuration can not be null");
            return builder().setPingArguments(configuration).build();
        }

        @Override
        public PCollection<String> expand(PBegin input)
        {
            org.apache.beam.sdk.io.Read.Unbounded<String> unbounded =
                org.apache.beam.sdk.io.Read.from(new UnboundedPingSource(this));
            return input.getPipeline().apply(unbounded);
        }

        @Override
        public void populateDisplayData(DisplayData.Builder builder)
        {
            super.populateDisplayData(builder);
            pingConfiguration().populateDisplayData(builder);
        }

        static class Builder {
            private PingArguments config;

            Builder()
            {
            }

            private Builder(WindowsRead source)
            {
                this.config = source.pingConfiguration();
            }

            WindowsRead.Builder setPingArguments(PingArguments config)
            {
                this.config = config;
                return this;
            }

            WindowsRead build()
            {
                return new WindowsRead(this.config);
            }
        }

        @Override
        public int hashCode()
        {
            return Objects.hash(pingConfig);
        }
    }
}
```
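Separately from the `advance()` issue discussed below, `getWatermark()` may be worth a look. `PingCheckpointMark.add` only ever moves `oldestMessageTimestamp` backwards, and every element is stamped with `Instant.now()` after the mark was created, so the reported watermark stays at the mark's creation time until the runner calls `finalizeCheckpoint()`. Because a sliding window only fires once the watermark passes its end, no window can close between finalizations. The stdlib-only simulation below mirrors that logic with plain millisecond `long`s; `FrozenWatermarkSketch` and its methods are hypothetical, and `canFire` approximates the default trigger's rule.

```java
/** Hypothetical model of the posted PingCheckpointMark watermark logic (times in ms). */
final class FrozenWatermarkSketch {
    long oldestMessageTimestamp;

    FrozenWatermarkSketch(long createdAt) {
        oldestMessageTimestamp = createdAt; // like Instant.now() at construction
    }

    /** Mirrors PingCheckpointMark.add: only ever moves the timestamp backwards. */
    void add(long timestamp) {
        if (timestamp < oldestMessageTimestamp) {
            oldestMessageTimestamp = timestamp;
        }
    }

    /** Mirrors finalizeCheckpoint: the only place the timestamp moves forward. */
    void finalizeCheckpoint(long now) {
        oldestMessageTimestamp = now;
    }

    /** Mirrors UnboundedPingReader.getWatermark. */
    long watermark() {
        return oldestMessageTimestamp;
    }

    /** Approximate default-trigger rule: window [start, end) fires once the watermark reaches end. */
    static boolean canFire(long windowEnd, long watermark) {
        return watermark >= windowEnd;
    }
}
```

Since the timestamps here come from `Instant.now()` and never decrease, one option to consider is reporting the timestamp of the most recently emitted line as the watermark instead, so it advances with every element rather than only at checkpoint finalization.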
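One thing I noticed in your code is that `advance()` always returns `true`. The watermark only advances when a bundle completes, and I think whether a bundle ever completes when `advance()` never returns `false` is up to the runner. You could try returning `false` after a limited amount of time or a limited number of pings.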
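A minimal sketch of that suggestion, assuming a per-reader element budget (the hypothetical `maxPerCall`, analogous to the posted `maxCount`), is to check the budget *before* consuming a line. In the posted `advance()`, the line that reaches `maxCount` is stored in `current` and added to the checkpoint mark, but `false` is then returned; since a runner generally does not read the current element after `advance()` returns `false`, that line is likely dropped. The posted counter is also a `static AtomicInteger`, so it would be shared across reader instances. The sketch below also loops over blank and `Pinging ...` lines and treats `readLine()` returning `null` as "no element" instead of calling `contains` on it. It uses a plain `BufferedReader` so it runs without Beam; `BudgetedLineReader` is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;

/** Hypothetical stand-in for UnboundedPingReader.advance() with a per-call budget. */
final class BudgetedLineReader {
    private final BufferedReader lines;
    private final int maxPerCall;   // like maxCount, but per reader instead of static
    private int emitted = 0;
    private String current;

    BudgetedLineReader(BufferedReader lines, int maxPerCall) {
        this.lines = lines;
        this.maxPerCall = maxPerCall;
    }

    /** Returns true if a new line is available via current(), false otherwise. */
    boolean advance() throws IOException {
        if (emitted >= maxPerCall) {
            emitted = 0;        // reset the budget; no line is consumed on this call
            current = null;
            return false;
        }
        String line;
        do {
            line = lines.readLine();
            if (line == null) { // end of the process output
                current = null;
                return false;
            }
            // Skip blank lines and the "Pinging <dest> with 32 bytes of data:" header.
        } while (line.isEmpty() || line.startsWith("Pinging "));
        current = line;
        emitted++;
        return true;
    }

    String current() {
        return current;
    }
}
```

Note that `readLine()` on a live process blocks until the next line arrives, so a count budget only limits how many lines are emitted between `false` results, not how long a single call can wait; a time-based budget would compare `System.nanoTime()` against a deadline at the same point.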
You could also consider rewriting this as a Splittable DoFn (SDF).