
Convert array to string in pyspark

This is my actual code, and it works fine:

df_train_taxrate = (
  df_train.groupby(
    'Company_code_BUKRS',
    'Vendor_Customer_Code_WT_ACCO',
    'Expense_GL_HKONT',
    'PAN_J_1IPANNO',
    'HSN_SAC_HSN_SAC'
  ).agg(
    f.collect_set('Section_WT_QSCOD').alias('Unique_Sectio_Code'),
    f.collect_set('WHT_rate_QSATZ').alias('Unique_Wtax_rate')
  )
)

The problem is that the results collected from 'Section_WT_QSCOD' and 'WHT_rate_QSATZ' are arrays, and I get an error when I try to convert those arrays to strings.

My code:

df_train_taxrate = df_train.groupby(
    'Company_code_BUKRS',
    'Vendor_Customer_Code_WT_ACCO',
    'Expense_GL_HKONT',
    'PAN_J_1IPANNO',
    'HSN_SAC_HSN_SAC'
  ).agg(
    f.collect_set('Section_WT_QSCOD').withColumn(
      'Section_WT_QSCOD',                                           
      concat_ws(',', 'Unique_Sectio_Code')
    ),
    f.collect_set('WHT_rate_QSATZ').withColumn(
      'WHT_rate_QSATZ', 
      concat_ws(',', 'Unique_W_tax_rate')
    )
  )

Error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'Column' object is not callable

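The error occurs because withColumn is a DataFrame method, not a Column method, so it cannot be chained onto f.collect_set(...) inside agg(). Aggregate first, then convert each array to a string on the resulting DataFrame. You need to use array_join for that.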

Sample data

import pyspark.sql.functions as F
data = [
    ('a', 'x1'),
    ('a', 'x2'),
    ('a', 'x3'),
    ('b', 'y1'),
    ('b', 'y2')
]
df = spark.createDataFrame(data, ['id', 'val'])

Solution

result = (
    df.
        groupby('id').
        agg(
            F.collect_set(F.col('val')).alias('arr_of_vals')
        ).
        withColumn(
            'arr_to_string',
            F.array_join(
                F.col('arr_of_vals'),
                ','
            )
        )
)
result
DataFrame[id: string, arr_of_vals: array<string>, arr_to_string: string]
result.show(truncate=False)
+---+------------+-------------+                                                
|id |arr_of_vals |arr_to_string|
+---+------------+-------------+
|b  |[y2, y1]    |y2,y1        |
|a  |[x1, x3, x2]|x1,x3,x2     |
+---+------------+-------------+
