
Convert a DataFrame (Dataset<Row>) to a JSON String for particular columns, and convert the JSON String back to a DataFrame

I have a dataframe. I need to call a REST API for each record.

Let's say the dataframe looks like this:

|----|-------------|-----|---------|
|UUID|PID          |DEVID|FIRSTNAME|
|----|-------------|-----|---------|
|1111|1234567891011|ABC11|JOHN     |
|2222|9876543256827|ABC22|HARRY    |
|----|-------------|-----|---------|

The JSON request string for the first row should look like this (note: the JSON is built from 2 columns, not all of them), because the REST API to be called requires its input in this format:

{"applicationInfo": {"appId": "ec78fef4-92b9-3b1b-a68d-c45376b6977a"}, "requestData": [{"secureData": "JOHN", "secureDataType": "FIRSTNAME", "index": 1 }, {"secureData": "1234567891011", "secureDataType": "PID", "index": 2 } ] }

The value of the index key has to be generated on the fly, using an incremental counter for each row.

Then I need to call the REST API, sending the above JSON as a string parameter.
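A minimal sketch of the REST call itself, assuming Java 11+ `java.net.http.HttpClient`. The endpoint URL is hypothetical (the real one is not given), and it is also an assumption that the "string param" is sent as a JSON request body via POST; if the API expects a query or form parameter instead, the request builder has to change accordingly.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class EncryptCall {

    // Hypothetical endpoint: the real URL of the encryption API is not given in the question.
    static final URI ENDPOINT = URI.create("https://example.com/api/encrypt");

    // Builds a POST request whose body is the request JSON.
    // Assumption: the API accepts the JSON as an application/json body.
    static HttpRequest buildRequest(String requestJson) {
        return HttpRequest.newBuilder(ENDPOINT)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(requestJson))
                .build();
    }

    // Sends the request and returns the raw response body, i.e. the JSON to deserialize later.
    static String call(HttpClient client, String requestJson)
            throws IOException, InterruptedException {
        HttpResponse<String> response =
                client.send(buildRequest(requestJson), HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) {
        // Only builds the request; no network call is made here.
        HttpRequest request = buildRequest("{\"requestData\":[]}");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

With a real endpoint, the call would be something like `String responseJson = EncryptCall.call(HttpClient.newHttpClient(), requestJson);`, with one `HttpClient` reused for all records.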

The response from the API after encryption will look like:

{"responseData":[{"resultCode":"00","secureData":"63ygdydshbhgvdyw3et7edgu","secureDataType":"FIRSTNAME","index":1},{"resultCode":"00","secureData":"HKJJBJHVHG66456456FXXFFCGF","secureDataType":"PID","index":2}],"responseCode":"00","responseMessage":"SUCCESS","resultCounts":{"totalCount":2,"successCount":2,"failedCount":0}}

Then I need to read the above response and create a dataframe which should look like:

|----|--------------------------|-----|------------------------|
|UUID|PID                       |DEVID|FIRSTNAME               |
|----|--------------------------|-----|------------------------|
|1111|HKJJBJHVHG66456456FXXFFCGF|ABC11|63ygdydshbhgvdyw3et7edgu|
|----|--------------------------|-----|------------------------|

If I convert the initial input dataframe with toJSON().collectAsList(), it looks like this:

[{"UUID":"1111","PID":"1234567891011","DEVID":"ABC11","FIRSTNAME":"JOHN"}, {"UUID":"2222","PID":"9876543256827","DEVID":"ABC22","FIRSTNAME":"HARRY"}]

But this doesn't work, because the REST API requires its input in the specific format mentioned above. Please help.

Assuming the data set has been partitioned across the Spark workers and is a generic data set of Row (a data frame), the following mechanism can be employed:

  1. Define a class with the required attributes as a data container.
  2. Take the data set content as a List (the takeAsList method of Dataset).
  3. Create and populate the objects of your data container, and store them in a way that lets you identify them later, because you will have to repopulate them with the encrypted data returned by the API.
  4. Serialize the list into a JSON array with Jackson. Steps 4 and 5 can be combined with a Jackson custom serializer.
  5. Make the REST call and repopulate the data container objects after deserializing the response with Jackson.
  6. Create a data frame.
  7. Process the data frame (dataset of rows).
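For step 3, one way to keep track of the objects is to give every value a running index across the whole batch and remember which row (by UUID) and column it came from, so each encrypted value in the response can be put back in the right place. This is only an illustration: the `Entry`/`Target` classes and the use of plain maps in place of Spark `Row`s are assumptions made to keep it self-contained.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class IndexBookkeeping {

    // Where an encrypted value must be written back: row (by UUID) and column name.
    static final class Target {
        final String uuid;
        final String column;
        Target(String uuid, String column) { this.uuid = uuid; this.column = column; }
    }

    // One requestData entry: the value, its type (the column name) and its index.
    static final class Entry {
        final String secureData;
        final String secureDataType;
        final int index;
        Entry(String secureData, String secureDataType, int index) {
            this.secureData = secureData;
            this.secureDataType = secureDataType;
            this.index = index;
        }
    }

    final List<Entry> entries = new ArrayList<>();
    final Map<Integer, Target> targets = new LinkedHashMap<>();
    private int nextIndex = 1; // running counter, starts at 1 like the sample request

    // Adds the columns to encrypt for one row; a Map stands in for a Spark Row here.
    void addRow(Map<String, String> row, List<String> columns) {
        for (String column : columns) {
            int index = nextIndex++;
            entries.add(new Entry(row.get(column), column, index));
            targets.put(index, new Target(row.get("UUID"), column));
        }
    }

    public static void main(String[] args) {
        IndexBookkeeping book = new IndexBookkeeping();
        List<String> columns = List.of("FIRSTNAME", "PID");
        book.addRow(Map.of("UUID", "1111", "PID", "1234567891011", "DEVID", "ABC11", "FIRSTNAME", "JOHN"), columns);
        book.addRow(Map.of("UUID", "2222", "PID", "9876543256827", "DEVID", "ABC22", "FIRSTNAME", "HARRY"), columns);
        for (Entry e : book.entries) {
            Target t = book.targets.get(e.index);
            System.out.println(e.index + " " + e.secureDataType + " " + e.secureData + " -> row " + t.uuid);
        }
    }
}
```

When the response arrives, `targets.get(responseData.index)` tells you which row and column each `secureData` value belongs to.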

NOTE: The JSON structure you have provided does not seem to be correct; a JSON array looks like [{},{},{}].


In your case, given the format of the request JSON, converting the rows directly will not work. As mentioned in step 1, create a set of model classes; you could consider the following ones.

package org.test.json;

import java.util.List;

public class RequestModel {

    protected ApplicationInfo applicationInfo;
    protected List<RequestData> requestData;

    public ApplicationInfo getApplicationInfo() {return applicationInfo;}
    public void setApplicationInfo(ApplicationInfo applicationInfo) {this.applicationInfo = applicationInfo;}

    public List<RequestData> getRequestData() {return requestData;}
    public void setRequestData(List<RequestData> requestData) {this.requestData = requestData;}

}//class closing




package org.test.json;

public class ApplicationInfo {

    protected String appId;

    public String getAppId() {return appId;}
    public void setAppId(String appId) {this.appId = appId;}

}//class closing




package org.test.json;

public class RequestData {

    protected String secureData;
    protected String secureDataType;
    protected int index;

    public String getSecureData() {return secureData;}
    public void setSecureData(String secureData) {this.secureData = secureData;}

    public String getSecureDataType() {return secureDataType;}
    public void setSecureDataType(String secureDataType) {this.secureDataType = secureDataType;}

    public int getIndex() {return index;}
    public void setIndex(int index) {this.index = index;}

}//class closing

Process the list obtained from the data frame, populate the model classes, and then serialize them with Jackson to get the request JSON.
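A minimal sketch of that serialization step for one row, assuming Jackson 2.x (`jackson-databind`) is on the classpath. The nested classes are trimmed-down copies of the model classes above; the `RequestData` constructor and the `@JsonPropertyOrder` annotations are additions, the latter only so the output field order matches the sample request.

```java
import com.fasterxml.jackson.annotation.JsonPropertyOrder;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class RequestJsonDemo {

    public static class ApplicationInfo {
        private String appId;
        public String getAppId() { return appId; }
        public void setAppId(String appId) { this.appId = appId; }
    }

    // Property order pinned so the JSON matches the sample request field by field.
    @JsonPropertyOrder({"secureData", "secureDataType", "index"})
    public static class RequestData {
        private String secureData;
        private String secureDataType;
        private int index;

        public RequestData() {}
        public RequestData(String secureData, String secureDataType, int index) {
            this.secureData = secureData;
            this.secureDataType = secureDataType;
            this.index = index;
        }
        public String getSecureData() { return secureData; }
        public String getSecureDataType() { return secureDataType; }
        public int getIndex() { return index; }
    }

    @JsonPropertyOrder({"applicationInfo", "requestData"})
    public static class RequestModel {
        private ApplicationInfo applicationInfo;
        private final List<RequestData> requestData = new ArrayList<>();
        public ApplicationInfo getApplicationInfo() { return applicationInfo; }
        public void setApplicationInfo(ApplicationInfo applicationInfo) { this.applicationInfo = applicationInfo; }
        public List<RequestData> getRequestData() { return requestData; }
    }

    private static final ObjectMapper MAPPER = new ObjectMapper();

    // Builds the request for one row: FIRSTNAME gets index 1, PID gets index 2.
    static String toRequestJson(String appId, String firstName, String pid) {
        ApplicationInfo info = new ApplicationInfo();
        info.setAppId(appId);

        RequestModel model = new RequestModel();
        model.setApplicationInfo(info);

        int index = 1; // incremental counter
        model.getRequestData().add(new RequestData(firstName, "FIRSTNAME", index++));
        model.getRequestData().add(new RequestData(pid, "PID", index++));

        try {
            return MAPPER.writeValueAsString(model);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toRequestJson("ec78fef4-92b9-3b1b-a68d-c45376b6977a", "JOHN", "1234567891011"));
    }
}
```

For the first row this prints the request from the question in compact form: `{"applicationInfo":{"appId":"ec78fef4-92b9-3b1b-a68d-c45376b6977a"},"requestData":[{"secureData":"JOHN","secureDataType":"FIRSTNAME","index":1},{"secureData":"1234567891011","secureDataType":"PID","index":2}]}`.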


The code below should do what you are looking for. Don't run it directly: the data set is null.

    //Do not run this, it will throw a NullPointerException; for example only
    //Needs: java.util.ArrayList, java.util.List, org.apache.spark.sql.Dataset, org.apache.spark.sql.Row
    Dataset<Row> ds=null;
    List<Row> rows=ds.collectAsList();

    RequestModel request=new RequestModel();

    //Set application id
    ApplicationInfo appInfo=new ApplicationInfo();
    appInfo.setAppId("some id");
    request.setApplicationInfo(appInfo);

    List<RequestData> reqData=new ArrayList<>();

    //Incremental counter, starting at 1 as in the sample request JSON
    int index=1;

    for(Row r : rows) {

        //Only FIRSTNAME and PID go into the request, in that order
        RequestData firstName=new RequestData();
        firstName.setSecureData(r.getAs("FIRSTNAME"));
        firstName.setSecureDataType("FIRSTNAME");
        firstName.setIndex(index++);
        reqData.add(firstName);

        RequestData pid=new RequestData();
        pid.setSecureData(r.getAs("PID"));
        pid.setSecureDataType("PID");
        pid.setIndex(index++);
        reqData.add(pid);

    }//for closing

    //Attach the collected entries to the request
    request.setRequestData(reqData);

I updated my code to correct the for loop. Now it's giving the correct result.

But how do I flatten the response string and extract the PID and FIRSTNAME values from the ResponseModel object?

        List<Row> list = df.collectAsList();
        List<Row> responseList = new ArrayList<>();

        //One mapper is enough for the whole loop
        ObjectMapper objectMapper = new ObjectMapper();
        objectMapper.enable(SerializationFeature.INDENT_OUTPUT);

        for(Row r: list) {
            String responseStr = "{\"responseData\":[{\"resultCode\":\"00\",\"secureData\":\"63ygdydshbhgvdyw3et7edgu\",\"secureDataType\":\"FIRSTNAME\",\"index\":1},{\"resultCode\":\"00\",\"secureData\":\"HKJJBJHVHG66456456FXXFFCGF\",\"secureDataType\":\"PID\",\"index\":2}],\"responseCode\":\"00\",\"responseMessage\":\"SUCCESS\",\"resultCounts\":{\"totalCount\":2,\"successCount\":2,\"failedCount\":0}}";
            ResponseModel responseModel = objectMapper.readValue(responseStr, ResponseModel.class);
            responseList.add(RowFactory.create((String) r.getAs("UUID"),(String) r.getAs("DEVID")));
        }

        //Build the data frame once, after all rows have been collected
        Dataset<Row> test= spark.createDataFrame(responseList,schema);

For testing purposes, I have hardcoded the response string inside the loop.
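The `schema` passed to `createDataFrame` is not shown. A possible definition for the target columns (UUID, PID, DEVID, FIRSTNAME) is sketched below with Spark's `DataTypes` factory methods, assuming spark-sql is on the classpath and that all four columns are nullable strings.

```java
import java.util.Arrays;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class OutputSchema {

    // Target columns in output order; all assumed to be nullable strings.
    static StructType outputSchema() {
        return DataTypes.createStructType(Arrays.asList(
                DataTypes.createStructField("UUID", DataTypes.StringType, true),
                DataTypes.createStructField("PID", DataTypes.StringType, true),
                DataTypes.createStructField("DEVID", DataTypes.StringType, true),
                DataTypes.createStructField("FIRSTNAME", DataTypes.StringType, true)));
    }

    public static void main(String[] args) {
        // Prints the column names: UUID,PID,DEVID,FIRSTNAME
        System.out.println(String.join(",", outputSchema().fieldNames()));
    }
}
```

Each `Row` passed to `spark.createDataFrame(responseList, outputSchema())` must then carry four values in that order; the loop above currently creates rows with only UUID and DEVID.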

How do I extract the PID and FIRSTNAME values and add them to the above responseList, so that I can create a dataframe with the columns (UUID, PID, DEVID, FIRSTNAME)? Here:

ResponseModel class has: ResultCounts, List<ResponseData>, String responseCode, String responseMessage
ResponseData class has: String resultCode, String secureData, String secureDataType, int index

Please help
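One way to do it, sketched under assumptions: deserialize the response with Jackson, flatten `responseData` into a map keyed by `secureDataType`, and return the four values in output column order. The `ResponseModel`/`ResponseData` classes here are minimal stand-ins for the ones described above (`resultCounts` is not mapped and is skipped via `@JsonIgnoreProperties`), and `toRowValues` is a hypothetical helper name.

```java
import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ResponseToRow {

    // Minimal stand-in for ResponseModel; resultCounts is not mapped, so unknown fields are ignored.
    @JsonIgnoreProperties(ignoreUnknown = true)
    public static class ResponseModel {
        private List<ResponseData> responseData;
        private String responseCode;
        private String responseMessage;
        public List<ResponseData> getResponseData() { return responseData; }
        public void setResponseData(List<ResponseData> responseData) { this.responseData = responseData; }
        public String getResponseCode() { return responseCode; }
        public void setResponseCode(String responseCode) { this.responseCode = responseCode; }
        public String getResponseMessage() { return responseMessage; }
        public void setResponseMessage(String responseMessage) { this.responseMessage = responseMessage; }
    }

    public static class ResponseData {
        private String resultCode;
        private String secureData;
        private String secureDataType;
        private int index;
        public String getResultCode() { return resultCode; }
        public void setResultCode(String resultCode) { this.resultCode = resultCode; }
        public String getSecureData() { return secureData; }
        public void setSecureData(String secureData) { this.secureData = secureData; }
        public String getSecureDataType() { return secureDataType; }
        public void setSecureDataType(String secureDataType) { this.secureDataType = secureDataType; }
        public int getIndex() { return index; }
        public void setIndex(int index) { this.index = index; }
    }

    private static final ObjectMapper MAPPER = new ObjectMapper();

    // Returns {UUID, PID, DEVID, FIRSTNAME} in the output column order.
    static Object[] toRowValues(String uuid, String devId, String responseJson) {
        ResponseModel model;
        try {
            model = MAPPER.readValue(responseJson, ResponseModel.class);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Flatten responseData: secureDataType (the column name) -> encrypted value
        Map<String, String> byType = new HashMap<>();
        for (ResponseData d : model.getResponseData()) {
            byType.put(d.getSecureDataType(), d.getSecureData());
        }
        return new Object[] {uuid, byType.get("PID"), devId, byType.get("FIRSTNAME")};
    }

    public static void main(String[] args) {
        String responseStr = "{\"responseData\":[{\"resultCode\":\"00\",\"secureData\":\"63ygdydshbhgvdyw3et7edgu\",\"secureDataType\":\"FIRSTNAME\",\"index\":1},{\"resultCode\":\"00\",\"secureData\":\"HKJJBJHVHG66456456FXXFFCGF\",\"secureDataType\":\"PID\",\"index\":2}],\"responseCode\":\"00\",\"responseMessage\":\"SUCCESS\",\"resultCounts\":{\"totalCount\":2,\"successCount\":2,\"failedCount\":0}}";
        System.out.println(Arrays.toString(toRowValues("1111", "ABC11", responseStr)));
    }
}
```

Inside the loop this could be used as `responseList.add(RowFactory.create(ResponseToRow.toRowValues(r.getAs("UUID"), r.getAs("DEVID"), responseStr)));`, which yields rows matching a (UUID, PID, DEVID, FIRSTNAME) schema. Keying by `secureDataType` only works while each response holds the values of a single row; if several rows are batched into one request, match the entries on `index` instead. Entries whose `resultCode` signals a failure are not filtered out here.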

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address. Any question please contact:yoyou2525@163.com.

 
© 2020-2024 STACKOOM.COM