
How can I order elements in a window in Python Apache Beam?

I noticed that Java Apache Beam has a class groupby.sortbytimestamp. Does Python have that feature implemented yet? If not, what would be the way to sort elements in a window? I figure I could sort the entire window in a DoFn, but I would like to know if there is a better way.

There is currently no built-in value sorting in Beam (in either Python or Java). Right now, the best option is to sort the values yourself in a DoFn, as you mentioned.
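As a rough illustration of sorting the values yourself, here is a minimal Python sketch. It assumes the elements have already been windowed and grouped, so each input is a `(key, values)` pair, and that every value is a `(timestamp, payload)` tuple. The function name `sort_by_timestamp` and the tuple layout are assumptions made for this example, not part of Beam.

```python
from typing import Any, Iterable, List, Tuple

Timestamped = Tuple[float, Any]  # assumed element shape: (timestamp, payload)


def sort_by_timestamp(
    keyed_values: Tuple[Any, Iterable[Timestamped]]
) -> Tuple[Any, List[Timestamped]]:
    """Sort the grouped values of one key (and one window) by timestamp.

    The whole group is materialized in memory, so this only works when a
    single key's values for a window fit on one worker.
    """
    key, values = keyed_values
    # sorted() is stable, so values with equal timestamps keep their order.
    return key, sorted(values, key=lambda tv: tv[0])
```

In a pipeline, a function like this could be applied with `beam.Map(sort_by_timestamp)` after `beam.WindowInto(...)` and `beam.GroupByKey()`; `beam.Map` wraps the function in a DoFn for you.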

Here's a solution in Java using a CombineFn. It has the added bonus of deduplicating data using the TreeSet. You should also make sure the data for a window is small enough to fit in memory on a single worker.

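import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;
import org.apache.beam.sdk.transforms.Combine;

// Declare this as a nested class of your pipeline class.
// MarketData is the element type of the PCollection being combined.
public static class DedupAndSortByTime
        extends Combine.CombineFn<MarketData, TreeSet<MarketData>, List<MarketData>> {

    // The TreeSet keeps elements ordered by event time, then order book type.
    // Elements that compare as equal under this ordering are deduplicated.
    @Override
    public TreeSet<MarketData> createAccumulator() {
        return new TreeSet<>(Comparator
                .comparingLong(MarketData::getEventTime)
                .thenComparing(MarketData::getOrderbookType));
    }

    @Override
    public TreeSet<MarketData> addInput(TreeSet<MarketData> accum, MarketData input) {
        accum.add(input);
        return accum;
    }

    // Merge partial accumulators produced on different workers.
    @Override
    public TreeSet<MarketData> mergeAccumulators(Iterable<TreeSet<MarketData>> accums) {
        TreeSet<MarketData> merged = createAccumulator();
        for (TreeSet<MarketData> accum : accums) {
            merged.addAll(accum);
        }
        return merged;
    }

    // Copy the set into a list, preserving the sorted iteration order.
    @Override
    public List<MarketData> extractOutput(TreeSet<MarketData> accum) {
        return new ArrayList<>(accum);
    }
}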
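Since the question is about Python, here is a hedged sketch of the same idea in Python. The class below defines the four methods that `apache_beam.CombineFn` expects (`create_accumulator`, `add_input`, `merge_accumulators`, `extract_output`), but it is written as a plain class so it runs without Beam installed; in a real pipeline you would derive it from `apache_beam.CombineFn`. The `MarketData` tuple and its field names are hypothetical stand-ins for the type used in the Java code, and a dict keyed by `(event_time, orderbook_type)` takes the place of the TreeSet.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class MarketData(NamedTuple):
    # Hypothetical stand-in for the MarketData type in the Java answer.
    event_time: int
    orderbook_type: str
    payload: str


Key = Tuple[int, str]
Accumulator = Dict[Key, MarketData]


def _sort_key(item: MarketData) -> Key:
    # Same ordering as the Java comparator: event time, then order book type.
    return (item.event_time, item.orderbook_type)


class DedupAndSortByTime:
    """Plain-Python sketch; subclass apache_beam.CombineFn in a pipeline."""

    def create_accumulator(self) -> Accumulator:
        return {}

    def add_input(self, accumulator: Accumulator, element: MarketData) -> Accumulator:
        # Keep the first element seen for each key, dropping later duplicates.
        accumulator.setdefault(_sort_key(element), element)
        return accumulator

    def merge_accumulators(self, accumulators: Iterable[Accumulator]) -> Accumulator:
        merged = self.create_accumulator()
        for accumulator in accumulators:
            for key, element in accumulator.items():
                merged.setdefault(key, element)
        return merged

    def extract_output(self, accumulator: Accumulator) -> List[MarketData]:
        # Sorting the keys yields the elements in (event_time, type) order.
        return [accumulator[key] for key in sorted(accumulator)]
```

Once derived from `apache_beam.CombineFn`, a class like this could be applied to a windowed PCollection with `beam.CombineGlobally(DedupAndSortByTime()).without_defaults()`, or per key with `beam.CombinePerKey(DedupAndSortByTime())`. The same memory caveat applies: each window's data must fit on a single worker.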
