
Apache Beam: multiple outputs from a single dictionary

I wrote this function to create 5 dictionaries from a single dictionary, and I pass it to Map in Apache Beam to produce another PCollection.

Input: col1, Col2, Col3, Market_0_30, DealerMake_0_30, Market_31_60, DealerMake_31_60, Market_61_90, DealerMake_61_90, Market_91_120, DealerMake_91_120, Market_121, DealerMake_121

Output:

Row 1: col1, col2, Col3, Market, DealerMake, Age: 0_30

Row 2: col1, col2, Col3, Market, DealerMake, Age: 31_60

Row 3: col1, col2, Col3, Market, DealerMake, Age: 61_90

    def _expand(element: Dict) -> List:
        common_columns = {}
        for key in element.keys():
            if key not in markets and key not in dealermakers:
                common_columns[key] = element[key]

        lines = {}
        for i, (market, dealermaker) in enumerate(zip(markets, dealermakers)):
            line = {}
            line = common_columns.copy()
            line[market] = element[market]
            line[dealermaker] = element[dealermaker]
        return lines

    output = sources_data["group_stocks_view"] | "EXPAND" >> beam.Map(_expand) | "PRINT" >> beam.Map(print)

But I always end up with an empty PCollection.

Any help, please?

Regards,
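One likely cause is visible in the function itself: each `line` is built inside the loop but never stored, so the function returns the empty `lines` it started with. The sketch below keeps the question's structure and collects every row in a list. The `PERIODS`, `markets` and `dealermakers` definitions are assumptions, since the question uses those names without showing them, and renaming the keys to `Market`, `DealerMake` and `Age` follows the desired output listed above.

```python
from typing import Any, Dict, List

# Assumed definitions: the question references `markets` and `dealermakers`
# but does not show how they are built.
PERIODS = ["0_30", "31_60", "61_90", "91_120", "121"]
markets = [f"Market_{p}" for p in PERIODS]
dealermakers = [f"DealerMake_{p}" for p in PERIODS]


def expand(element: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return one output dict per age period."""
    # Columns that are not period-specific are copied into every output row.
    common_columns = {
        key: value
        for key, value in element.items()
        if key not in markets and key not in dealermakers
    }

    lines = []  # a list, so that each row can be collected
    for period, market, dealermaker in zip(PERIODS, markets, dealermakers):
        line = common_columns.copy()
        line["Market"] = element[market]
        line["DealerMake"] = element[dealermaker]
        line["Age"] = period
        lines.append(line)  # the step missing from the original loop
    return lines
```

In a pipeline, a function that returns a list like this would be applied with `beam.FlatMap(expand)` rather than `beam.Map(expand)`: `Map` emits the returned list as one element, while `FlatMap` emits each item of the list as its own element.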

    from typing import Any, Dict, List

    import apache_beam as beam


    class ExpandStocks(beam.DoFn):
        def process(self, element, *args, **kwargs) -> List[Dict[Any, Any]]:
            """Convert one element into multiple elements, one per age period.

            Args:
                element: input dict holding the Market_* and DealerMake_* columns

            Returns:
                a list of dicts; ParDo emits each item as a separate element
            """
            periods = ["0_30", "31_60", "61_90", "91_120", "121"]
            # Columns shared by every output row: everything except the per-period ones.
            clean_dict = {
                k: v
                for (k, v) in element.items()
                if not (k.startswith("Market") or k.startswith("DealerMake"))
            }
            dicts_to_ret = []
            for period in periods:
                new_dict = {
                    "Market": element[f"Market_{period}"],
                    "DealerMake": element[f"DealerMake_{period}"],
                    "Age": period,
                }
                dicts_to_ret.append({**clean_dict, **new_dict})
            return dicts_to_ret

    output = sources_data["group_stocks"] | "EXPAND" >> beam.ParDo(ExpandStocks())
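The same expansion can also be written as a plain generator function, which avoids the DoFn class and can be checked locally without a pipeline. This is a minimal sketch rather than part of the original answer; the names `expand_stocks`, `PERIODS` and `PREFIXES` are hypothetical, and it assumes all keys of the input dict are strings.

```python
from typing import Any, Dict, Iterator

PERIODS = ("0_30", "31_60", "61_90", "91_120", "121")
PREFIXES = ("Market", "DealerMake")  # str.startswith accepts a tuple of prefixes


def expand_stocks(element: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield one dict per age period from a single wide input dict."""
    # Shared columns are computed once, outside the loop.
    common = {k: v for k, v in element.items() if not k.startswith(PREFIXES)}
    for period in PERIODS:
        yield {
            **common,
            "Market": element[f"Market_{period}"],
            "DealerMake": element[f"DealerMake_{period}"],
            "Age": period,
        }
```

Outside Beam, `list(expand_stocks(row))` returns the five expanded rows. Inside a pipeline it could be used as `sources_data["group_stocks"] | "EXPAND" >> beam.FlatMap(expand_stocks)`, since `FlatMap` emits every item the function yields.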


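Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please cite this site's URL or the original source address. For any questions, contact: yoyou2525@163.com.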

 