Equal of .apply from Pandas to PySpark

I have the following DataFrame in PySpark:

+--------------------+-----+
|            activity| diff|
+--------------------+-----+
|      Ajustar nómina|33339|
|Generar archivo p...| 1383|
|Generar archivo p...|  269|
|Contabilizar Nomi...|  561|

and the following function I have written:

def to_date(seconds=0):
    '''Convert a duration in seconds to a human-readable string.

    :param seconds: duration in seconds
    :return: formatted string, e.g. '9.3 hr' or '23 min'
    '''
    if seconds == 0:
        return '0 s'
    if (seconds / 2678400) >= 1:  # 2678400 s = 31 days
        month = round(seconds / 2678400, 1)
        if month > 1:
            return f'{month} months'
        else:
            return f'{month} month'
    if (seconds / 86400) >= 1:
        day = round(seconds / 86400, 1)
        if day > 1:
            return f'{day} days'
        else:
            return f'{day} day'
    if (seconds / 3600) >= 1:
        hour = round(seconds / 3600, 1)
        return f'{hour} hr'
    if (seconds / 60) >= 1:
        minutes = int(seconds / 60)
        return f'{minutes} min'
    seconds = int(seconds)
    return f'{seconds} s'
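For reference, these are the strings the function produces for the sample diff values (the function is restated in condensed form here so the snippet runs on its own):

```python
def to_date(seconds=0):
    '''Format a duration in seconds as a human-readable string.'''
    if seconds == 0:
        return '0 s'
    if seconds >= 2678400:  # 31 days
        month = round(seconds / 2678400, 1)
        return f'{month} months' if month > 1 else f'{month} month'
    if seconds >= 86400:
        day = round(seconds / 86400, 1)
        return f'{day} days' if day > 1 else f'{day} day'
    if seconds >= 3600:
        return f'{round(seconds / 3600, 1)} hr'
    if seconds >= 60:
        return f'{int(seconds / 60)} min'
    return f'{int(seconds)} s'

# Sample diff values from the DataFrame above
print(to_date(33339))  # 9.3 hr
print(to_date(1383))   # 23 min
print(to_date(269))    # 4 min
print(to_date(561))    # 9 min
```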

I would like to know if there is an equivalent of df.apply(to_date) in PySpark, i.e. how to apply the function to_date to each row of the PySpark DataFrame df.

Thank you!

Answering my own question, I figured out how to do it:

from pyspark.sql import functions as F
from pyspark.sql.types import StringType

# Wrap the plain Python function in a UDF and apply it column-wise
udf_to_date = F.udf(to_date, StringType())
df = df.withColumn("mean_2", udf_to_date("diff"))

If anyone knows a better solution I'm open to receiving it, thank you!
