
Spark: applying a map function on Dataset<T> in Java

I have the code below, which works fine with the foreach function on a Dataset. finalJoined is a DataFrame.

    KieServices ks = KieServices.Factory.get();
    KieContainer kContainer = ks.getKieClasspathContainer();
    ClassTag<KieBase> classTagTest = scala.reflect.ClassTag$.MODULE$.apply(KieBase.class);
    Broadcast<KieBase> broadcastRules = context.broadcast(kContainer.getKieBase("rules"), classTagTest);

    Encoder<RuleParams> encoder = Encoders.bean(RuleParams.class);
    Dataset<RuleParams> ds = new Dataset<RuleParams>(sparkSession, finalJoined.logicalPlan(), encoder);
    System.out.println("Printing ruleParams DS");
    ds.show();
    ds.foreach(ruleParam -> droolprocess(broadcastRules.value(), ruleParam));

Here the foreach method returns void.

I need a Dataset<RuleParams> as the return value. Below is my droolprocess method, which calls the rule engine and updates the RuleParams objects.

    public static void droolprocess(KieBase base, RuleParams ruleParams) {
        StatelessKieSession session = base.newStatelessKieSession();
        session.execute(CommandFactory.newInsert(ruleParams));
        System.out.println("After firing rules");
        System.out.println(ruleParams.getPriceItemParam1());
        System.out.println(ruleParams.getCisDivision());
    }

I have seen some questions on Stack Overflow and elsewhere, but I am not sure how to write a map function instead of foreach so that it returns a Dataset<RuleParams>.

Can anyone help here?

You can use map like this:

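    Dataset<RuleParams> ds = new Dataset<RuleParams>(sparkSession, finalJoined.logicalPlan(), encoder);
    // The cast to org.apache.spark.api.java.function.MapFunction tells Java which overload of map to use.
    Dataset<RuleParams> result = ds.map((MapFunction<RuleParams, RuleParams>) ruleParams -> {
        RuleParams theRuleParams = ruleParams;
        ... // your processing
        return theRuleParams;
    }, encoder);

The second argument to map is the encoder for the type the function returns. Because the function here returns RuleParams, pass the bean encoder (Encoders.bean(RuleParams.class)) and the result is a Dataset<RuleParams>. If you instead add, delete or modify columns and build a new Row for each record, pass RowEncoder.apply(schema) with the schema of the rows you return (StructType schema = ds.schema() when the columns are unchanged); the result is then a Dataset<Row>. Either way, the encoder is how the dataset knows the schema it will have after the map operation.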
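To make the foreach-versus-map difference concrete, here is a minimal plain-Java sketch of the idea. It is an analogy only and makes several assumptions: java.util.stream stands in for Spark's Dataset, RuleParams is a cut-down hypothetical bean with a single field, and the upper-casing in droolprocess is a made-up placeholder for the rule engine call. The one point it shows is that the per-record function has to return the record it updated, so that map has something to collect.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Plain-Java analogy only: java.util.stream stands in for Spark's Dataset here.
class MapSketch {

    // Hypothetical, cut-down stand-in for the RuleParams bean from the question.
    static class RuleParams {
        private String cisDivision;

        RuleParams(String cisDivision) { this.cisDivision = cisDivision; }

        String getCisDivision() { return cisDivision; }

        void setCisDivision(String cisDivision) { this.cisDivision = cisDivision; }
    }

    // Same shape as droolprocess, except that it returns the record it updated.
    // The upper-casing is a made-up placeholder for the rule engine call.
    static RuleParams droolprocess(RuleParams ruleParams) {
        ruleParams.setCisDivision(ruleParams.getCisDivision().toUpperCase(Locale.ROOT));
        return ruleParams;
    }

    // map collects the returned records; forEach would discard them.
    static List<RuleParams> applyRules(List<RuleParams> input) {
        return input.stream()
                .map(MapSketch::droolprocess)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<RuleParams> out = applyRules(
                Arrays.asList(new RuleParams("north"), new RuleParams("south")));
        out.forEach(p -> System.out.println(p.getCisDivision()));
    }
}
```

In the Spark version the same change applies: have droolprocess return its RuleParams argument after the session has executed, and call it from inside map rather than foreach.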


 