PySpark UDF for multiple columns

I have a DataFrame:

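import pandas as pd

ndf = pd.DataFrame({'a': [False, False, True, True, False], 'b': [False, False, False, False, True]})

# sqlContext is the Spark 1.x SQLContext; with a SparkSession `spark`, use spark.createDataFrame(ndf)
ndf_s = sqlContext.createDataFrame(ndf)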

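I want to add a new column named "action". If ndf['a'] is True, "action" should be "I am a"; if ndf['b'] is True, it should be "I am b"; if both columns are True, it should be "I am a and b"; otherwise it should be None. In other words, I want to get this DataFrame: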

ndf_result = sqlContext.createDataFrame(pd.DataFrame({'a':[False, False,True,True,False], 'b':[False, False,False,False, True], 'action':[None, None, 'I am a', 'I am a', 'I am b']}))
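Because the rules overlap (a row with both flags set also satisfies "a is True"), the combined case has to be checked before the single-column cases. A minimal stdlib sketch contrasting the two orderings; both function names are hypothetical and exist only for this comparison:

```python
def action_both_first(a, b):
    """Correct ordering: the combined case is tested first."""
    if a and b:
        return "I am a and b"
    if a:
        return "I am a"
    if b:
        return "I am b"
    return None


def action_a_first(a, b):
    """Wrong ordering (for comparison): the 'a' check swallows the combined case."""
    if a:
        return "I am a"
    if a and b:  # never reached when a is True
        return "I am a and b"
    if b:
        return "I am b"
    return None


# Print every combination of the two flags with both results side by side
for a in (False, True):
    for b in (False, True):
        print(a, b, action_both_first(a, b), action_a_first(a, b))
```

The same ordering concern applies to the chained when clauses and to the UDF bodies in the answers.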

You can use when/otherwise. (The output below was produced from data whose first row has both a and b set to True.)

import pyspark.sql.functions as F

# when() clauses are checked in order, so the "both True" case must come first.
# Rows that match none of the conditions get null.
ndf_s.withColumn(
    "action",
    F.when(ndf_s["a"] & ndf_s["b"], "I am a and b")
     .when(ndf_s["a"], "I am a")
     .when(ndf_s["b"], "I am b")
).show()
+-----+-----+------------+
|    a|    b|      action|
+-----+-----+------------+
| true| true|I am a and b|
|false|false|        null| 
| true|false|      I am a|
| true|false|      I am a|
|false| true|      I am b|
+-----+-----+------------+

Another option is a UDF:

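import pyspark.sql.functions as F

# Used without arguments, @F.udf gives the UDF a StringType return type
@F.udf
def action(col_a, col_b):
    if col_a and col_b:
        return "I am a and b"
    elif col_a:
        return "I am a"
    elif col_b:
        return "I am b"
    # falling through returns None, which Spark stores as null

ndf_s.withColumn("action", action(ndf_s["a"], ndf_s["b"])).show()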
+-----+-----+------------+  
|    a|    b|      action|
+-----+-----+------------+
| true| true|I am a and b|
|false|false|        null|
| true|false|      I am a|
| true|false|      I am a|
|false| true|      I am b|
+-----+-----+------------+
Or, passing the return type explicitly:

from pyspark.sql.functions import udf
from pyspark.sql.types import StringType
import pandas as pd

ndf = pd.DataFrame({'a': [False, False, True, True, False], 'b': [False, False, False, False, True]})

ndf_s = sqlContext.createDataFrame(ndf)


def get_expected_string(a, b):
    if a and b:
        return "I am a and b"
    elif a:
        return "I am a"
    elif b:
        return "I am b"
    else:
        return None

# wrap get_expected_string as a UDF that returns a string column
get_expected_string_udf = udf(get_expected_string, StringType())

# UDFs accept column names as strings as well as Column objects
ndf_s = ndf_s.withColumn("action", get_expected_string_udf("a", "b"))

ndf_s.show()
+-----+-----+------------+  
|    a|    b|      action|
+-----+-----+------------+
| true| true|I am a and b|
|false|false|        null|
| true|false|      I am a|
| true|false|      I am a|
|false| true|      I am b|
+-----+-----+------------+
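Since a Python UDF just wraps an ordinary function, its branching can be checked without starting Spark. A minimal sketch reusing the function body from the answer above; the row list mirrors the question's ndf, and the None cases assume (as Spark does) that a SQL NULL reaches a Python UDF as None:

```python
def get_expected_string(a, b):
    # same branching as the UDF body in the answer
    if a and b:
        return "I am a and b"
    elif a:
        return "I am a"
    elif b:
        return "I am b"
    return None


# (a, b) pairs taken from the question's ndf, with the expected "action" column
rows = [(False, False), (False, False), (True, False), (True, False), (False, True)]
expected = [None, None, "I am a", "I am a", "I am b"]

results = [get_expected_string(a, b) for a, b in rows]
print(results == expected)

# A NULL flag arrives as None, which is falsy, so it behaves like False here
print(get_expected_string(None, True))
print(get_expected_string(None, None))
```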

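Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.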

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM