Convert DataFrame with url in string format to JSON properly
Convert Dataframe Dataset<Row> to JSON Format of String Data Type for particular Columns and convert the JSON String back to Dataframe
I have a DataFrame, and I need to call a REST API for each record.
Suppose the DataFrame looks like this:
|----|-------------|-----|---------|
|UUID|PID |DEVID|FIRSTNAME|
|----|-------------|-----|---------|
|1111|1234567891011|ABC11|JOHN |
|2222|9876543256827|ABC22|HARRY |
|----|-------------|-----|---------|
The JSON request string for the first row should look like this (note: the JSON is built from only 2 columns, not all of them), because the REST API being called requires its input in this format:
{"applicationInfo": {"appId": "ec78fef4-92b9-3b1b-a68d-c45376b6977a"}, "requestData": [{"secureData": "JOHN", "secureDataType": "FIRSTNAME", "index": 1 }, {"secureData": "1234567891011", "secureDataType": "PID", "index": 2 } ] }
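The value of the index key must be generated dynamically, using an incrementing counter for each row.
Then I need to call the REST API, sending the above JSON as a string parameter.
After encryption, the API's response looks like this: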
{"responseData":[{"resultCode":"00","secureData":"63ygdydshbhgvdyw3et7edgu","secureDataType":"FIRSTNAME","index":1},{"resultCode":"00","secureData":"HKJJBJHVHG66456456FXXFFCGF","secureDataType":"PID","index":2}],"responseCode":"00","responseMessage":"SUCCESS","resultCounts":{"totalCount":2,"successCount":2,"failedCount":0}}
Then I need to read the above response and create a DataFrame that should look like this:
|----|--------------------------|-----|------------------------|
|UUID|PID |DEVID|FIRSTNAME |
|----|--------------------------|-----|------------------------|
|1111|HKJJBJHVHG66456456FXXFFCGF|ABC11|63ygdydshbhgvdyw3et7edgu|
|----|--------------------------|-----|------------------------|
If I convert the initial input DataFrame with toJSON().collectAsList(), it looks like this:
[{"UUID":"1111","PID":"1234567891011","DEVID":"ABC11","FIRSTNAME":"JOHN"}, {"UUID":"2222","PID":"9876543256827","DEVID":"ABC22","FIRSTNAME":"HARRY"}]
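But this does not work, because the REST API requires its input in the specific format shown above. Please help.
For the above scenario, I am assuming that the dataset is already partitioned according to the number of Spark workers and that it is a generic Dataset of Row (a DataFrame); in that case the following mechanism can be used.
Note: the JSON structure you provided does not seem correct; a JSON array has the form [{},{},{}].
In your case, given the format of the request JSON, a direct conversion of the rows will not work, as noted above. Create a set of model classes; you could consider the following model classes.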
package org.test.json;
import java.util.List;
public class RequestModel {
protected ApplicationInfo applicationInfo;
protected List<RequestData> requestData;
public ApplicationInfo getApplicationInfo() {return applicationInfo;}
public void setApplicationInfo(ApplicationInfo applicationInfo) {this.applicationInfo = applicationInfo;}
public List<RequestData> getRequestData() {return requestData;}
public void setRequestData(List<RequestData> requestData) {this.requestData = requestData;}
}//class closing
package org.test.json;
public class ApplicationInfo {
protected String appId;
public String getAppId() {return appId;}
public void setAppId(String appId) {this.appId = appId;}
}//class closing
package org.test.json;
public class RequestData {
protected String secureData;
protected String secureDataType;
protected int index;
public String getSecureData() {return secureData;}
public void setSecureData(String secureData) {this.secureData = secureData;}
public String getSecureDataType() {return secureDataType;}
public void setSecureDataType(String secureDataType) {this.secureDataType = secureDataType;}
public int getIndex() {return index;}
public void setIndex(int index) {this.index = index;}
}//class closing
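Process the list obtained from the DataFrame and populate the model classes, then use Jackson to convert them into the request JSON.
The following should do what you are looking for. Do not run it directly, because the dataset is null.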
// Illustration only: ds is null here, so running this as-is throws a NullPointerException.
import java.util.ArrayList;
import java.util.List;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

Dataset<Row> ds = null;
List<Row> rows = ds.collectAsList();
RequestModel request = new RequestModel();
// Set application id
ApplicationInfo appInfo = new ApplicationInfo();
appInfo.setAppId("some id");
request.setApplicationInfo(appInfo);
List<RequestData> reqData = new ArrayList<>();
for (int i = 0; i < rows.size(); i++) {
    // Incrementally generated for each row
    int index = i;
    Row r = rows.get(i);
    int rowLength = r.size();
    for (int j = 0; j < rowLength; j++) {
        String type;
        switch (j) {
            case 1: type = "PID"; break;        // column 1 holds PID
            case 3: type = "FIRSTNAME"; break;  // column 3 holds FIRSTNAME
            default: continue;                  // UUID and DEVID are not sent
        }//switch closing
        // Only the selected columns reach this point, so no empty elements are added
        RequestData dataElement = new RequestData();
        dataElement.setIndex(index);
        dataElement.setSecureData(r.getString(j));
        dataElement.setSecureDataType(type);
        reqData.add(dataElement);
    }//for closing
}//for closing
// Attach the collected entries to the request before serializing it
request.setRequestData(reqData);
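The loop above collects every row into a single request and reuses the row number as the index, whereas the sample request in the question numbers its entries 1 and 2 within one row. A per-row variant might look like the sketch below. It is dependency-free so that it can be run on its own; `RequestJsonSketch`, `quote`, and `buildRequest` are hypothetical names introduced for illustration, and the hand-written escaping covers only backslashes and double quotes.

```java
import java.util.Map;

// Hypothetical helper, not part of the original answer: builds the request
// JSON for ONE row from the columns that must be encrypted.
final class RequestJsonSketch {

    // Escapes only backslash and double quote; a real implementation
    // should leave escaping to a JSON library such as Jackson.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // secureColumns maps secureDataType (the column name) to the cell value,
    // iterated in the order the index counter should follow.
    static String buildRequest(String appId, Map<String, String> secureColumns) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"applicationInfo\":{\"appId\":").append(quote(appId)).append("},");
        sb.append("\"requestData\":[");
        int index = 1; // restarts at 1 for every row, as in the sample request
        for (Map.Entry<String, String> e : secureColumns.entrySet()) {
            if (index > 1) {
                sb.append(',');
            }
            sb.append("{\"secureData\":").append(quote(e.getValue()))
              .append(",\"secureDataType\":").append(quote(e.getKey()))
              .append(",\"index\":").append(index).append('}');
            index++;
        }
        return sb.append("]}").toString();
    }
}
```

Passing a LinkedHashMap with FIRSTNAME and PID (in that order) for row 1111 yields a request with indexes 1 and 2. With the model classes from this answer, the same string can presumably be obtained by filling one RequestModel per row and calling Jackson's `new ObjectMapper().writeValueAsString(request)`, which also takes care of escaping.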
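I updated my code to correct the for loop. It now gives the correct result.
But how do I flatten the response string and extract the PID and FIRSTNAME values from the ResponseModel object?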
List<Row> list = df.collectAsList();
List<Row> responseList = new ArrayList<>();
// One ObjectMapper can be reused for every row
ObjectMapper objectMapper = new ObjectMapper();
objectMapper.enable(SerializationFeature.INDENT_OUTPUT);
for (Row r : list) {
    String responseStr = "{\"responseData\":[{\"resultCode\":\"00\",\"secureData\":\"63ygdydshbhgvdyw3et7edgu\",\"secureDataType\":\"FIRSTNAME\",\"index\":1},{\"resultCode\":\"00\",\"secureData\":\"HKJJBJHVHG66456456FXXFFCGF\",\"secureDataType\":\"PID\",\"index\":2}],\"responseCode\":\"00\",\"responseMessage\":\"SUCCESS\",\"resultCounts\":{\"totalCount\":2,\"successCount\":2,\"failedCount\":0}}";
    ResponseModel responseModel = objectMapper.readValue(responseStr, ResponseModel.class);
    responseList.add(RowFactory.create((String) r.getAs("UUID"), (String) r.getAs("DEVID")));
}
// Build the DataFrame once, after all rows have been collected
Dataset<Row> test = spark.createDataFrame(responseList, schema);
For testing purposes, I have hard-coded the response string inside the loop.
How can I extract the PID and FIRSTNAME values and add them to the responseList above to create the DataFrame (UUID, PID, DEVID, FIRSTNAME)? Here:
The ResponseModel class has: ResultCounts, List<ResponseData>, String responseCode, String responseMessage
The ResponseData class has: String resultCode, String secureData, String secureDataType, int index
Please help.
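One possible way to approach this is to index the response entries by their secureDataType and then assemble the cell values in the target column order. The sketch below uses a minimal stand-in class so that it runs without Spark or Jackson; `ResponseDataStub`, `ResponseFlattenSketch`, `bySecureDataType`, and `toRowValues` are hypothetical names, and it assumes each secureDataType appears at most once per response.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal stand-in for the ResponseData class described in the question;
// only the two fields needed here are modelled.
final class ResponseDataStub {
    final String secureDataType;
    final String secureData;

    ResponseDataStub(String secureDataType, String secureData) {
        this.secureDataType = secureDataType;
        this.secureData = secureData;
    }
}

final class ResponseFlattenSketch {

    // Indexes the response entries by type, e.g. "PID" -> encrypted PID.
    static Map<String, String> bySecureDataType(List<ResponseDataStub> responseData) {
        Map<String, String> encrypted = new HashMap<>();
        for (ResponseDataStub d : responseData) {
            encrypted.put(d.secureDataType, d.secureData);
        }
        return encrypted;
    }

    // Returns the cell values in the column order UUID, PID, DEVID, FIRSTNAME.
    // A type missing from the response produces a null cell.
    static Object[] toRowValues(String uuid, String devId, List<ResponseDataStub> responseData) {
        Map<String, String> encrypted = bySecureDataType(responseData);
        return new Object[] { uuid, encrypted.get("PID"), devId, encrypted.get("FIRSTNAME") };
    }
}
```

Inside the loop, the resulting array could be passed to `RowFactory.create(values)` and added to responseList, with UUID and DEVID taken from the input row via `r.getAs(...)`. With the real ResponseModel, the list would come from its List<ResponseData> field; the getter names depend on how that class is written. The schema passed to `spark.createDataFrame` would then need four string columns in the same order.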