Spark: Applying a map function on Dataset<T> in Java
I have the code below, which works fine with the foreach function on a Dataset. finalJoined is a DataFrame.
KieServices ks = KieServices.Factory.get();
KieContainer kContainer = ks.getKieClasspathContainer();
ClassTag<KieBase> classTagTest = scala.reflect.ClassTag$.MODULE$.apply(KieBase.class);
Broadcast<KieBase> broadcastRules = context.broadcast(kContainer.getKieBase("rules"), classTagTest);
Encoder<RuleParams> encoder = Encoders.bean(RuleParams.class);
Dataset<RuleParams> ds = new Dataset<RuleParams>(sparkSession, finalJoined.logicalPlan(), encoder);
System.out.println("Printing ruleParams DS");
ds.show();
ds.foreach(ruleParam -> droolprocess(broadcastRules.value(), ruleParam));
Here the foreach method returns void. I need a Dataset<RuleParams> as the return value. Below is my droolprocess method, which calls the rule engine and updates the RuleParams objects.
public static void droolprocess(KieBase base, RuleParams ruleParams) {
StatelessKieSession session = base.newStatelessKieSession();
session.execute(CommandFactory.newInsert(ruleParams));
System.out.println("After firing rules");
System.out.println(ruleParams.getPriceItemParam1());
System.out.println(ruleParams.getCisDivision());
}
I have seen some questions on Stack Overflow and elsewhere, but I am not sure how to write a map function instead of foreach so that it returns a Dataset<RuleParams>. Can anyone help here?
You can use map as shown below:
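Dataset<RuleParams> ds = new Dataset<RuleParams>(sparkSession, finalJoined.logicalPlan(), encoder);
// Cast the lambda to org.apache.spark.api.java.function.MapFunction so that the
// call is not ambiguous with the Scala Function1 overload of map.
ds = ds.map((MapFunction<RuleParams, RuleParams>) ruleParams -> {
RuleParams theRuleParams = ruleParams;
...//your processing
return theRuleParams;
}, encoder); // the function returns RuleParams, so pass the bean encoder rather than a Row encoder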
Inside the map function, return an object for every input row; if you add, remove, or modify fields, build the returned object accordingly. The encoder passed as the second argument tells Spark the schema of the Dataset that map returns, so it must match the type the function returns: a bean encoder such as Encoders.bean(RuleParams.class) for RuleParams objects, or a Row encoder only when the function returns Row.
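The difference between foreach and map can be tried out without a Spark cluster. The following is a minimal plain-Java sketch that uses java.util.stream in place of Spark, so it is an analogy rather than Spark code. The RuleParams class here is a trimmed-down hypothetical stand-in for the bean in the question, fakeRuleEngine stands in for droolprocess, and the names processAndReturn and applyToAll are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class ForeachVsMapSketch {

    /** Hypothetical, trimmed-down stand-in for the RuleParams bean in the question. */
    public static class RuleParams {
        private String priceItemParam1;

        public RuleParams(String priceItemParam1) {
            this.priceItemParam1 = priceItemParam1;
        }

        public String getPriceItemParam1() { return priceItemParam1; }
        public void setPriceItemParam1(String value) { this.priceItemParam1 = value; }
    }

    /** Stand-in for droolprocess: updates the object in place and returns nothing. */
    public static void fakeRuleEngine(RuleParams params) {
        params.setPriceItemParam1(params.getPriceItemParam1() + "-processed");
    }

    /** What a map function needs: run the void step, then return the updated object. */
    public static RuleParams processAndReturn(RuleParams params) {
        fakeRuleEngine(params);
        return params;
    }

    /** map keeps the returned objects, unlike forEach, which discards them. */
    public static List<RuleParams> applyToAll(List<RuleParams> input) {
        return input.stream()
                .map(ForeachVsMapSketch::processAndReturn)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<RuleParams> input = new ArrayList<>();
        input.add(new RuleParams("A"));
        input.add(new RuleParams("B"));

        // Prints "A-processed" and then "B-processed".
        for (RuleParams p : applyToAll(input)) {
            System.out.println(p.getPriceItemParam1());
        }
    }
}
```

In Spark, the body of processAndReturn corresponds to what the function passed to ds.map would do for each RuleParams, with the bean encoder for RuleParams supplied as the second argument.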