
Apache Beam: multiple outputs from a single dictionary

I wrote this function to create 5 dictionaries from a single dictionary, and I pass it to Map in Apache Beam to produce another PCollection.

Input: col1, Col2, Col3, Market_0_30, DealerMake_0_30, Market_31_60, DealerMake_31_60, Market_61_90, DealerMake_61_90, Market_91_120, DealerMake_91_120, Market_121, DealerMake_121

Output:

Row 1: col1, Col2, Col3, Market, DealerMake, Age: 0_30

Row 2: col1, Col2, Col3, Market, DealerMake, Age: 31_60

Row 3: col1, Col2, Col3, Market, DealerMake, Age: 61_90
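As a concrete illustration of the reshaping described above, here is a small sketch of one input element and the rows it should turn into. The values are made up for illustration; only the column names come from the description, and the `Age` key is assumed to hold the bucket suffix.

```python
# Hypothetical values; only the column names come from the description above.
wide_row = {
    "col1": "a", "Col2": "b", "Col3": "c",
    "Market_0_30": 10, "DealerMake_0_30": 1,
    "Market_31_60": 20, "DealerMake_31_60": 2,
    "Market_61_90": 30, "DealerMake_61_90": 3,
    "Market_91_120": 40, "DealerMake_91_120": 4,
    "Market_121": 50, "DealerMake_121": 5,
}

# Desired result: one row per age bucket, five rows in total.
expected_rows = [
    {"col1": "a", "Col2": "b", "Col3": "c", "Market": 10, "DealerMake": 1, "Age": "0_30"},
    {"col1": "a", "Col2": "b", "Col3": "c", "Market": 20, "DealerMake": 2, "Age": "31_60"},
    {"col1": "a", "Col2": "b", "Col3": "c", "Market": 30, "DealerMake": 3, "Age": "61_90"},
    {"col1": "a", "Col2": "b", "Col3": "c", "Market": 40, "DealerMake": 4, "Age": "91_120"},
    {"col1": "a", "Col2": "b", "Col3": "c", "Market": 50, "DealerMake": 5, "Age": "121"},
]
```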

    def _expand(element: Dict) -> List:
        common_columns = {}
        for key in element.keys():
            if key not in markets and key not in dealermakers:
                common_columns[key] = element[key]

        lines = {}
        for i, (market, dealermaker) in enumerate(zip(markets, dealermakers)):
            line = {}
            line = common_columns.copy()
            line[market] = element[market]
            line[dealermaker] = element[dealermaker]
        return lines

    output = sources_data["group_stocks_view"] | "EXPAND" >> beam.Map(_expand) | "PRINT" >> beam.Map(print)

But I always end up with an empty PCollection.

Any help, please?

Regards,

    from typing import Any, Dict, List

    import apache_beam as beam


    class ExpandStocks(beam.DoFn):
        def process(self, element, *args, **kwargs) -> List[Dict[Any, Any]]:
            """Convert one wide element into one element per age period.

            Args:
                element: input dict with Market_<period> and DealerMake_<period> keys
            Returns:
                a list of dicts, one per period; ParDo emits each as its own element
            """
            periods = ["0_30", "31_60", "61_90", "91_120", "121"]
            # Columns shared by every output row (everything except the per-period ones).
            clean_dict = {
                k: v
                for (k, v) in element.items()
                if not (k.startswith("Market") or k.startswith("DealerMake"))
            }
            dicts_to_ret = []
            for period in periods:
                new_dict = {
                    "Market": element[f"Market_{period}"],
                    "DealerMake": element[f"DealerMake_{period}"],
                    "Age": period,
                }
                dicts_to_ret.append({**clean_dict, **new_dict})
            return dicts_to_ret

    output = sources_data["group_stocks"] | "EXPAND" >> beam.ParDo(ExpandStocks())
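For context on why the original attempt produced nothing useful: `_expand` never adds `line` to `lines`, so it always returns an empty container, and `beam.Map` emits exactly one output per input rather than flattening the returned iterable. A possible alternative to the `DoFn` is a plain generator function passed to `beam.FlatMap`. The following is a minimal sketch under those assumptions; `expand_row` and `PERIODS` are hypothetical names, and the function is kept free of Beam imports so it can be tried on its own.

```python
from typing import Any, Dict, Iterator

# Age buckets taken from the column suffixes in the question.
PERIODS = ["0_30", "31_60", "61_90", "91_120", "121"]


def expand_row(element: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield one long-format row per age bucket (hypothetical helper)."""
    # Columns that are not bucket-specific are copied into every output row.
    common = {
        k: v
        for k, v in element.items()
        if not k.startswith(("Market_", "DealerMake_"))
    }
    for period in PERIODS:
        yield {
            **common,
            "Market": element[f"Market_{period}"],
            "DealerMake": element[f"DealerMake_{period}"],
            "Age": period,
        }


# In a pipeline this would be applied with FlatMap, not Map, so that each
# yielded dict becomes its own element:
#     output = sources_data["group_stocks"] | "EXPAND" >> beam.FlatMap(expand_row)
```

Because the function is an ordinary generator, it can be checked outside a pipeline with `list(expand_row(some_dict))` before wiring it into Beam.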
