How to persist and reset state in Structured Streaming?
I have a requirement to show the aggregated count for a specific category from the start of the current day up to the current time.
I am using Structured Streaming to do the grouping. Since a window does not persist the state of a DataFrame, I am not sure how to implement logic that persists the state and increments the counter on top of the previous state.
Also, how would I reset the state at the start of a new day?
Input Record:
{"Floor_Id" : "Shop Floor 1",
"HaltRecord" : {
"HaltReason" : "Danahydraulic Error",
"Severity" : "Low",
"FaultErrorCategory" : "Docked",
"NonFaultErrorCategory" : null
},
"Description" : "Forklift",
"Category" : {
"Type" : "Halt",
"End_time" : NumberLong(2018-02-13T12:00:01),
"Start_time" : NumberLong(2018-02-13T12:00:00)
},
"Asset_Id" : 123,
"isError" : "y",
"Timestamp": 2018-02-13T12:00:01}
Output Response:
{
"Floor_Id": "Shop Floor 1",
"Error_Category": [
{
"Category": "Operator Error",
"DataPoints":
{
"NumberOfErrors": 20,
"Date": 2018-02-13
}
},
{
"Category": "Danahydraulic Error",
"DataPoints": {
"NumberOfErrors": 15,
"Date": 2018-02-13
}
}
]
}
Any help is much appreciated.
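I have not used the state functions in Structured Streaming myself, but I know there is a mapGroupsWithState function that can provide persistent state and the counting logic.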
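To make the idea concrete, here is a minimal sketch of the per-key update logic such a state function would need. It is plain Python with no Spark dependency, so it only illustrates the "increment on the previous state, reset on a new day" rule. The names DailyCount, update_daily_count and run are hypothetical, and the sketch assumes events for a key arrive in timestamp order. In Spark this logic would sit inside the function passed to mapGroupsWithState; that wiring is not shown.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class DailyCount:
    """Hypothetical per-key state: the day being counted and the running total."""
    day: date
    errors: int


def update_daily_count(state: Optional[DailyCount], event_time: datetime) -> DailyCount:
    """Return the new state after one event; start over when the day changes."""
    event_day = event_time.date()
    if state is None or state.day != event_day:
        # First event for this key, or first event of a new day: reset to 1.
        return DailyCount(event_day, 1)
    # Same day as the stored state: increment the previous count.
    return DailyCount(state.day, state.errors + 1)


def run(events: Iterable[Tuple[str, str, datetime]]) -> Dict[Tuple[str, str], DailyCount]:
    """Fold (floor_id, halt_reason, timestamp) events into one state per key."""
    states: Dict[Tuple[str, str], DailyCount] = {}
    for floor_id, halt_reason, ts in events:
        key = (floor_id, halt_reason)
        states[key] = update_daily_count(states.get(key), ts)
    return states


# Sample events shaped after the input record in the question (values are illustrative).
events = [
    ("Shop Floor 1", "Danahydraulic Error", datetime(2018, 2, 13, 12, 0, 1)),
    ("Shop Floor 1", "Danahydraulic Error", datetime(2018, 2, 13, 15, 30, 0)),
    ("Shop Floor 1", "Operator Error", datetime(2018, 2, 13, 16, 0, 0)),
    ("Shop Floor 1", "Danahydraulic Error", datetime(2018, 2, 14, 0, 0, 5)),
]
result = run(events)
```

Keeping the day inside the state means no separate midnight timer is needed: the first event of a new day replaces the old counter. If events can arrive late or out of order, this simple rule would reset the count incorrectly and would need extra handling.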