
How to persist and reset state in Structured Streaming?

I need to show the aggregated count for each category, from the start of the current day up to the current time.

I am using Structured Streaming to do the grouping. Because a window does not persist the state of a DataFrame, I am not sure how to persist that state and increment the counter on top of the previous state. How would I also reset the state at the start of a new day?

Input record:

{"Floor_Id" : "Shop Floor 1",
"HaltRecord" : {
    "HaltReason" : "Danahydraulic Error",
    "Severity" : "Low",
    "FaultErrorCategory" : "Docked",
    "NonFaultErrorCategory" : null
},
"Description" : "Forklift",
"Category" : {
    "Type" : "Halt",
    "End_time" : NumberLong(2018-02-13T12:00:01),
    "Start_time" : NumberLong(2018-02-13T12:00:00)
},
"Asset_Id" : 123,
"isError" : "y",
"Timestamp": 2018-02-13T12:00:01}

Output response:

{
    "Floor_Id": "Shop Floor 1",
    "Error_Category": [
        {
            "Category": "Operator Error",
            "DataPoints": 
                {
                    "NumberOfErrors": 20,
                    "Date": 2018-02-13
                }
        },
        {
            "Category": "Danahydraulic Error",
            "DataPoints": {
                    "NumberOfErrors": 15,
                    "Date": 2018-02-13
                }
        }
    ]
}

Any help is much appreciated.

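I have not used the state functions of Structured Streaming myself, but as far as I know the mapGroupsWithState function can provide the persistent state and the counting logic you are after.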
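To make the idea concrete, here is a minimal sketch in plain Python of the per-key update logic that such a state function would carry. It does not call Spark at all: the `store` dictionary only stands in for Spark's managed state, and `DayCount`, `update_count` and `process_batch` are hypothetical names chosen for illustration. It assumes the key is (Floor_Id, HaltReason) and that the record's Timestamp decides which day an event belongs to.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Dict, Iterable, List, Optional, Tuple

Key = Tuple[str, str]  # (Floor_Id, HaltReason); an assumed choice of grouping key


@dataclass
class DayCount:
    """State kept per key: the day being counted and the running count."""
    day: date
    count: int


def update_count(state: Optional[DayCount],
                 timestamps: Iterable[datetime]) -> Optional[DayCount]:
    """Fold one batch of event timestamps for a single key into its state."""
    day, count = (state.day, state.count) if state else (None, 0)
    for ts in sorted(timestamps):
        event_day = ts.date()
        if day is None or event_day > day:
            # First event ever, or first event of a new day: reset the counter.
            day, count = event_day, 0
        if event_day == day:
            count += 1
        # Events older than the current day are ignored in this sketch.
    return DayCount(day, count) if day is not None else None


def process_batch(store: Dict[Key, DayCount],
                  batch: Iterable[Tuple[str, str, datetime]]) -> Dict[Key, DayCount]:
    """Group a micro-batch by key and update each key's state in the store."""
    grouped: Dict[Key, List[datetime]] = {}
    for floor_id, halt_reason, ts in batch:
        grouped.setdefault((floor_id, halt_reason), []).append(ts)
    for key, stamps in grouped.items():
        store[key] = update_count(store.get(key), stamps)
    return store


store: Dict[Key, DayCount] = {}
process_batch(store, [
    ("Shop Floor 1", "Danahydraulic Error", datetime(2018, 2, 13, 12, 0, 1)),
    ("Shop Floor 1", "Danahydraulic Error", datetime(2018, 2, 13, 13, 0, 0)),
    ("Shop Floor 1", "Operator Error", datetime(2018, 2, 13, 9, 0, 0)),
])
# An event on the next day resets the counter for that key only.
process_batch(store, [
    ("Shop Floor 1", "Danahydraulic Error", datetime(2018, 2, 14, 0, 0, 5)),
])
print(store)
```

In Spark itself, the same logic would live in the function passed to mapGroupsWithState in the Scala/Java API, reading and writing the per-key value through the GroupState handle (state.exists, state.get, state.update) instead of a dictionary. Note that in this sketch a key's count resets only when an event for the new day arrives for that key. If the count has to drop to zero at midnight even without new events, a timeout on the state would be one way to approach it.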

