Convert Dataframe Dataset<Row> to JSON Format of String Data Type for particular Columns and convert the JSON String back to Dataframe
I have a dataframe, and I need to call a REST API for each record.
Suppose the dataframe looks like this:
|----|-------------|-----|---------|
|UUID|PID |DEVID|FIRSTNAME|
|----|-------------|-----|---------|
|1111|1234567891011|ABC11|JOHN |
|2222|9876543256827|ABC22|HARRY |
|----|-------------|-----|---------|
The JSON request string for the first row should look like this (note: the JSON is built from two columns, not all of them), because the REST API to be called requires its input in this format:
{"applicationInfo": {"appId": "ec78fef4-92b9-3b1b-a68d-c45376b6977a"}, "requestData": [{"secureData": "JOHN", "secureDataType": "FIRSTNAME", "index": 1 }, {"secureData": "1234567891011", "secureDataType": "PID", "index": 2 } ] }
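The value of the index key must be generated dynamically, using an incremental counter for each row.
I then need to call the REST API, sending the JSON above as a string parameter.
After encryption, the API response looks like this: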
{"responseData":[{"resultCode":"00","secureData":"63ygdydshbhgvdyw3et7edgu","secureDataType":"FIRSTNAME","index":1},{"resultCode":"00","secureData":"HKJJBJHVHG66456456FXXFFCGF","secureDataType":"PID","index":2}],"responseCode":"00","responseMessage":"SUCCESS","resultCounts":{"totalCount":2,"successCount":2,"failedCount":0}}
I then need to read the response above and create a dataframe that looks like this:
|----|--------------------------|-----|------------------------|
|UUID|PID |DEVID|FIRSTNAME |
|----|--------------------------|-----|------------------------|
|1111|HKJJBJHVHG66456456FXXFFCGF|ABC11|63ygdydshbhgvdyw3et7edgu|
|----|--------------------------|-----|------------------------|
If I convert the initial input dataframe with toJSON().collectAsList(), the result looks like:
[{"UUID":"1111","PID":"1234567891011","DEVID":"ABC11","FIRSTNAME":"JOHN"}, {"UUID":"2222","PID":"9876543256827","DEVID":"ABC22","FIRSTNAME":"HARRY"}]
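But this does not work, because the REST API requires its input in the specific format shown above. Please help.
For the case above, I assume the dataset has already been partitioned according to the number of Spark workers and that it is a generic Dataset of Row (a dataframe); the following mechanism can then be used.
Note: the JSON structure you provided does not look right; a JSON array is [{},{},{}].
In your case, given the format of the request JSON, a direct conversion of the rows will not work. As mentioned in point 1, create a set of model classes; you may consider the following model classes.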
package org.test.json;
import java.util.List;
public class RequestModel {
protected ApplicationInfo applicationInfo;
protected List<RequestData> requestData;
public ApplicationInfo getApplicationInfo() {return applicationInfo;}
public void setApplicationInfo(ApplicationInfo applicationInfo) {this.applicationInfo = applicationInfo;}
public List<RequestData> getRequestData() {return requestData;}
public void setRequestData(List<RequestData> requestData) {this.requestData = requestData;}
}//class closing
package org.test.json;
public class ApplicationInfo {
protected String appId;
public String getAppId() {return appId;}
public void setAppId(String appId) {this.appId = appId;}
}//class closing
package org.test.json;
public class RequestData {
protected String secureData;
protected String secureDataType;
protected int index;
public String getSecureData() {return secureData;}
public void setSecureData(String secureData) {this.secureData = secureData;}
public String getSecureDataType() {return secureDataType;}
public void setSecureDataType(String secureDataType) {this.secureDataType = secureDataType;}
public int getIndex() {return index;}
public void setIndex(int index) {this.index = index;}
}//class closing
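Process the list obtained from the dataframe, populate the model classes, and then use Jackson to convert them into the request JSON.
The code below should do what you are looking for. Do not run it directly; the dataset is null.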
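//Do not run this, will generate NullPointer, for example only
//Imports go at the top of the file; the statements below belong inside a method
import java.util.ArrayList;
import java.util.List;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import com.fasterxml.jackson.databind.ObjectMapper;
Dataset<Row> ds=null;
List<Row> rows=ds.collectAsList();
RequestModel request=new RequestModel();
//Set application id
ApplicationInfo appInfo=new ApplicationInfo();
appInfo.setAppId("some id");
request.setApplicationInfo(appInfo);
List<RequestData> reqData=new ArrayList<>();
//Running counter for "index"; starting at 1, as in the sample request, is an assumption
int index=1;
for(int i=0;i<rows.size();i++) {
Row r=rows.get(i);
int rowLength=r.size();
for(int j=0;j<rowLength;j++) {
//Only PID (column 1) and FIRSTNAME (column 3) go into the request
String type;
switch(j) {
case 1:{type="PID";break;}
case 3:{type="FIRSTNAME";break;}
default:{type=null;break;}
}//switch closing
if(type==null) {continue;}
RequestData dataElement=new RequestData();
dataElement.setIndex(index++);
dataElement.setSecureData(r.getString(j));
dataElement.setSecureDataType(type);
reqData.add(dataElement);
}//for closing
}//for closing
request.setRequestData(reqData);
//Serialize with Jackson; writeValueAsString throws JsonProcessingException
String requestJson=new ObjectMapper().writeValueAsString(request);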
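The loop above collects every row into a single request, while the question asks for one request per row. A minimal per-row sketch follows, using only the JDK so it can be tried without Spark or Jackson. The class name RowRequestSketch, the nested Item class, the Map representation of a row, and the app id "app-1" are assumptions for illustration; the hand-rolled JSON also assumes the values contain no characters that need escaping, which Jackson would handle in real code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RowRequestSketch {

    /** One entry of "requestData"; a stand-in for the RequestData model class. */
    static class Item {
        final String secureData;
        final String secureDataType;
        final int index;

        Item(String secureData, String secureDataType, int index) {
            this.secureData = secureData;
            this.secureDataType = secureDataType;
            this.index = index;
        }
    }

    /** Picks the given columns from one row and numbers them 1, 2, ... in order. */
    static List<Item> toItems(Map<String, String> row, List<String> secureColumns) {
        List<Item> items = new ArrayList<>();
        int index = 1;
        for (String column : secureColumns) {
            items.add(new Item(row.get(column), column, index++));
        }
        return items;
    }

    /** Builds the request JSON by hand; no escaping is performed. */
    static String toJson(String appId, List<Item> items) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"applicationInfo\":{\"appId\":\"").append(appId)
          .append("\"},\"requestData\":[");
        for (int i = 0; i < items.size(); i++) {
            Item it = items.get(i);
            if (i > 0) {
                sb.append(',');
            }
            sb.append("{\"secureData\":\"").append(it.secureData)
              .append("\",\"secureDataType\":\"").append(it.secureDataType)
              .append("\",\"index\":").append(it.index).append('}');
        }
        return sb.append("]}").toString();
    }

    public static void main(String[] args) {
        // First row of the sample dataframe, keyed by column name
        Map<String, String> row = new LinkedHashMap<>();
        row.put("UUID", "1111");
        row.put("PID", "1234567891011");
        row.put("DEVID", "ABC11");
        row.put("FIRSTNAME", "JOHN");
        System.out.println(toJson("app-1", toItems(row, List.of("FIRSTNAME", "PID"))));
    }
}
```

Restarting the index at 1 for every row matches the sample request, where FIRSTNAME has index 1 and PID has index 2; if the API expects a counter that runs across rows, the counter would have to live outside toItems.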
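I updated my code to correct the for loop. It now gives the correct result.
But how do I flatten the response string and extract the PID and FIRSTNAME values from the ResponseModel object?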
List<Row> list = df.collectAsList();
List<Row> responseList = new ArrayList<>();
for(Row r: list) {
ObjectMapper objectMapper = new ObjectMapper();
objectMapper.enable(SerializationFeature.INDENT_OUTPUT);
String responseStr = "{\"responseData\":[{\"resultCode\":\"00\",\"secureData\":\"63ygdydshbhgvdyw3et7edgu\",\"secureDataType\":\"FIRSTNAME\",\"index\":1},{\"resultCode\":\"00\",\"secureData\":\"HKJJBJHVHG66456456FXXFFCGF\",\"secureDataType\":\"PID\",\"index\":2}],\"responseCode\":\"00\",\"responseMessage\":\"SUCCESS\",\"resultCounts\":{\"totalCount\":2,\"successCount\":2,\"failedCount\":0}}";
ResponseModel responseModel = objectMapper.readValue(responseStr, ResponseModel.class);
responseList.add(RowFactory.create((String) r.getAs("UUID"),(String) r.getAs("DEVID")));
Dataset<Row> test= spark.createDataFrame(responseList,schema);
}
For testing purposes, I hard-coded the response string inside the loop.
How can I extract the PID and FIRSTNAME values and add them to the responseList above, so that I can create the dataframe (UUID, PID, DEVID, FIRSTNAME)? Here,
the ResponseModel class has: ResultCounts, List<ResponseData>, String responseCode, String responseMessage
the ResponseData class has: String resultCode, String secureData, String secureDataType, int index
Please help.
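One possible way to do the extraction, sketched with plain JDK types so it runs without Spark or Jackson: walk the List<ResponseData> and overwrite the column named by secureDataType with the returned secureData. The class name ResponseMergeSketch, the Map representation of a row, and the treatment of resultCode "00" as success (inferred from the sample response) are assumptions; the nested ResponseData here only mirrors the fields listed above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ResponseMergeSketch {

    /** Stand-in for the ResponseData class described in the question. */
    static class ResponseData {
        final String resultCode;
        final String secureData;
        final String secureDataType;
        final int index;

        ResponseData(String resultCode, String secureData, String secureDataType, int index) {
            this.resultCode = resultCode;
            this.secureData = secureData;
            this.secureDataType = secureDataType;
            this.index = index;
        }
    }

    /**
     * Returns a copy of the row in which every column named by a successful
     * response entry is replaced with the encrypted value. Other columns and
     * the column order are left unchanged.
     */
    static Map<String, String> merge(Map<String, String> row, List<ResponseData> responseData) {
        Map<String, String> out = new LinkedHashMap<>(row);
        for (ResponseData d : responseData) {
            // "00" as the success code is an assumption based on the sample response
            if ("00".equals(d.resultCode) && out.containsKey(d.secureDataType)) {
                out.put(d.secureDataType, d.secureData);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("UUID", "1111");
        row.put("PID", "1234567891011");
        row.put("DEVID", "ABC11");
        row.put("FIRSTNAME", "JOHN");

        // Values taken from the sample response in the question
        List<ResponseData> response = List.of(
                new ResponseData("00", "63ygdydshbhgvdyw3et7edgu", "FIRSTNAME", 1),
                new ResponseData("00", "HKJJBJHVHG66456456FXXFFCGF", "PID", 2));

        System.out.println(merge(row, response));
    }
}
```

In the Spark loop, the merged values could then be passed to RowFactory.create in schema order (UUID, PID, DEVID, FIRSTNAME), with spark.createDataFrame(responseList, schema) called once after the loop instead of on every iteration.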