GCP Dataflow: process element does not go to the next ParDo function
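I am populating the map with good data (no null values), but execution never continues to the next ParDo function. I tried debugging, but I cannot work out why this happens. If anyone knows what I am doing wrong, please let me know. I am setting up three ParDo functions. Thanks.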
.apply("Parse XML CarrierManifest", ParDo.of(new DoFn<String, Manifest>() {
    @ProcessElement
    public void processElement(ProcessContext c) {
        try {
            System.out.println(c.element());
            // Unmarshal the XML payload into a Manifest object.
            JAXBContext jaxbContext = JAXBContext.newInstance(Manifest.class);
            Unmarshaller unmarshaller = jaxbContext.createUnmarshaller();
            StringReader reader = new StringReader(c.element());
            Manifest manifest = (Manifest) unmarshaller.unmarshal(reader);
            if (manifest == null) throw new RuntimeException("Invalid data");
            c.output(manifest);
        } catch (Exception e) {
            // A failed element is only logged here; nothing is emitted downstream for it.
            LOG.error("Unexpected error while parsing input. File was <[ " + c.element() + " ]>", e);
        }
    }
}))
//---------------------------------------------------------------------------------------------------------------
.apply("preparing data", ParDo.of(new DoFn<Manifest, Map<String, List<TableRow>>>() {
    @ProcessElement
    public void processElement(ProcessContext c) {
        // Collect one TableRow per linked shipment under the "Manifest" key.
        Map<String, List<TableRow>> rowsTable = new ArrayMap<>();
        rowsTable.put("Manifest", new ArrayList<>());
        Manifest manifest = c.element();
        Links linkss = manifest.linkes;
        System.out.println(linkss.ShipmentsList.linakageShipment.linkageesList.size());
        for (int i = 0; i < linkss.ShipmentsList.linakageShipment.linkageesList.size(); i++) {
            rowsTable.get("Manifest")
                .add(new TableRow()
                    .set("GROUP_ID", manifest.GroupidValue)
                    .set("STATUS", manifest.StatusValue)
                    .set("GROUP_TYPE", manifest.typeValue)
                    .set("CREATED_AT", manifest.created_atValue)
                    .set("READY_AT", manifest.ready_atValue)
                    .set("MANIFEST_NUMBER", manifest.manifest_numberValue)
                    .set("LINKS_SELF", linkss.SelfLink)
                    .set("SHIPMENT_ID", linkss.ShipmentsList.linakageShipment.linkageesList.get(i).ID)
                    .set("SHIPMENT_TYPE", linkss.ShipmentsList.linakageShipment.linkageesList.get(i).Type));
        }
        c.output(rowsTable);
    }
}))
//---------------------------------------------------------------------------------------------------------------
.apply("change rows list to one row", ParDo.of(new DoFn<Map<String, List<TableRow>>, TableRow>() {
    @ProcessElement
    public void processElement(ProcessContext c) {
        System.out.println("id: " + c.element());
        // Emit each row of the "Manifest" list as its own element (should only have 1).
        for (TableRow r : c.element().get("Manifest")) {
            c.output(r);
        }
    }
}))
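From one of your comments on the question, I understand that your pipeline only works when it runs on Dataflow itself (using the Dataflow Runner), and that it does not work locally when you use the Direct Runner.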
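As stated in the Apache Beam documentation for the Direct Runner, local execution is limited by the memory available on your machine, so it is recommended to debug with small datasets that your local machine can handle. In any case, according to the same comment, your pipeline works well when executed on Dataflow, so there is nothing wrong with the pipeline itself.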
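Based on the description you provided, the issue is most likely related to the limitations of the Direct Runner. If you are hitting a more specific error in your local or remote environment, the question and your replies to the comments need to be more specific and include more information about what you are seeing, so that we can help you.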
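One way to gather that more specific information is to count the elements that a stage drops. In the question's first ParDo, any exception is caught and logged, so a failing element produces no output and the later ParDo functions never see it. The following is a minimal plain-Java sketch of that behavior, not Beam code: `DroppedElementDemo`, `StageResult`, and `runStage` are hypothetical names, and `Integer::parseInt` is only a stand-in for the JAXB unmarshalling step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

class DroppedElementDemo {

    /** Result of one stage: what was emitted downstream and how many inputs were dropped. */
    static final class StageResult<T> {
        final List<T> outputs = new ArrayList<>();
        int dropped = 0;
    }

    /**
     * Applies the parser to each input the way the first ParDo in the question does:
     * a failure is caught and the element is skipped, so nothing is emitted for it.
     * Unlike the question's code, each failure is also counted.
     */
    static <I, O> StageResult<O> runStage(List<I> inputs, Function<I, O> parser) {
        StageResult<O> result = new StageResult<>();
        for (I input : inputs) {
            try {
                result.outputs.add(parser.apply(input));
            } catch (RuntimeException e) {
                result.dropped++;
                System.err.println("Dropped element <[ " + input + " ]>: " + e);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the XML parsing step: parse strings as integers.
        Function<String, Integer> parser = Integer::parseInt;
        List<String> inputs = Arrays.asList("1", "not-a-number", "3");

        StageResult<Integer> parsed = runStage(inputs, parser);

        // Prints: emitted=[1, 3] dropped=1
        System.out.println("emitted=" + parsed.outputs + " dropped=" + parsed.dropped);
    }
}
```

If the dropped count equals the input count, the downstream stages receive nothing at all, which would look exactly like "the pipeline does not go to the next ParDo". In a real pipeline the same idea would need a Beam-side mechanism for counting or collecting failed elements; checking the error log of the first step is the quickest equivalent.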