In the Apache Beam Python SDK, it is possible to group by a field and aggregate over a computed expression, like this:
input
| GroupBy(account=lambda s: s["account"])
.aggregate_field(lambda x: x["wordsAddup"] - x["wordsSubtract"], sum, 'wordsRead')
How do we perform a similar action in the Java SDK? Strangely, the programming guide has only examples in Python for this transform.
Here is my attempt at producing the equivalent in Java:
input.apply(
Group.byFieldNames("account")
.aggregateField(<INSERT EQUIVALENT HERE>, Sum.ofIntegers(), "wordsRead"));
There are some Java examples at https://beam.apache.org/documentation/programming-guide/#using-schemas. (Note that you may have to select the Java tab on the selector that offers both Java and Python to see them.)
In Java, I don't think the first argument of aggregateField can take an arbitrary expression; it must be a field name. You can precede the grouping operation with a projection that adds a new field for the desired expression. For example:
input
    .apply(SqlTransform.query(
        "SELECT *, wordsAddup - wordsSubtract AS wordsDiff FROM PCOLLECTION"))
    .apply(Group.byFieldNames("account")
        .aggregateField("wordsDiff", Sum.ofIntegers(), "wordsRead"));
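To make the intended result concrete, here is a minimal sketch of the same two steps (project a derived field, then group and sum) in plain Java streams, with no Beam dependency. The `Reading` class, the `wordsReadByAccount` helper, and the sample values are hypothetical stand-ins for the schema'd input; this only illustrates the semantics and is not how Beam executes the pipeline.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class WordsReadSketch {
    // Hypothetical stand-in for one input row with the three fields used above.
    static final class Reading {
        final String account;
        final int wordsAddup;
        final int wordsSubtract;

        Reading(String account, int wordsAddup, int wordsSubtract) {
            this.account = account;
            this.wordsAddup = wordsAddup;
            this.wordsSubtract = wordsSubtract;
        }
    }

    static Map<String, Integer> wordsReadByAccount(List<Reading> input) {
        return input.stream()
            // Step 1 (the projection): derive wordsDiff for each row.
            .map(r -> Map.entry(r.account, r.wordsAddup - r.wordsSubtract))
            // Step 2 (the grouping): sum wordsDiff per account.
            .collect(Collectors.groupingBy(
                Map.Entry::getKey,
                TreeMap::new,
                Collectors.summingInt(Map.Entry::getValue)));
    }

    public static void main(String[] args) {
        List<Reading> input = List.of(
            new Reading("a", 10, 3),
            new Reading("a", 5, 1),
            new Reading("b", 7, 2));
        System.out.println(wordsReadByAccount(input)); // prints {a=11, b=5}
    }
}
```

The Beam version should produce one output row per account with the same sums in the wordsRead field, assuming wordsAddup and wordsSubtract are integer fields in the input schema.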