Equal of .apply from Pandas to PySpark

I have the following DataFrame in PySpark:

+--------------------+-----+
|            activity| diff|
+--------------------+-----+
|      Ajustar nómina|33339|
|Generar archivo p...| 1383|
|Generar archivo p...|  269|
|Contabilizar Nomi...|  561|

and the following function I have written:

def to_date(seconds=0):
    '''Convert a duration in seconds to a human-readable string.

    :param seconds: duration in seconds
    :return: formatted string, e.g. '9.3 hr' or '23 min'
    '''
    if seconds == 0:
        return '0 s'
    if (seconds / 2678400) >= 1:  # 2678400 s = 31 days
        month = round(seconds / 2678400, 1)
        if month > 1:
            return f'{month} months'
        else:
            return f'{month} month'
    if (seconds / 86400) >= 1:
        day = round(seconds / 86400, 1)
        if day > 1:
            return f'{day} days'
        else:
            return f'{day} day'
    if (seconds / 3600) >= 1:
        hour = round(seconds / 3600, 1)
        return f'{hour} hr'
    if (seconds / 60) >= 1:
        minutes = int(seconds / 60)
        return f'{minutes} min'
    seconds = int(seconds)
    return f'{seconds} s'
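For reference, these are the strings the function produces for the sample diff values (the function is restated in condensed form here so the snippet runs on its own):

```python
def to_date(seconds=0):
    '''Format a duration in seconds as a human-readable string.'''
    if seconds == 0:
        return '0 s'
    if seconds >= 2678400:  # 31 days
        month = round(seconds / 2678400, 1)
        return f'{month} months' if month > 1 else f'{month} month'
    if seconds >= 86400:
        day = round(seconds / 86400, 1)
        return f'{day} days' if day > 1 else f'{day} day'
    if seconds >= 3600:
        return f'{round(seconds / 3600, 1)} hr'
    if seconds >= 60:
        return f'{int(seconds / 60)} min'
    return f'{int(seconds)} s'

# Sample diff values from the DataFrame above
print(to_date(33339))  # 9.3 hr
print(to_date(1383))   # 23 min
print(to_date(269))    # 4 min
print(to_date(561))    # 9 min
```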

I would like to know if there is an equivalent of df.apply(to_date) in PySpark, i.e. how to apply the function to_date to each row of the PySpark DataFrame df.

Thank you!

Answering my own question, I figured out how to do it:

from pyspark.sql import functions as F
from pyspark.sql.types import StringType

# Wrap the plain Python function in a UDF and apply it column-wise
udf_to_date = F.udf(to_date, StringType())
df = df.withColumn("mean_2", udf_to_date("diff"))

If anyone knows a better solution I'm open to receiving it, thank you!
