Does a user-defined function (UDF) for PySpark need a unit test?

I'm new to PySpark. I have a function, I've written a unit test for it, and I have defined a UDF for PySpark using this function, something like:

from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

udf_my_function = udf(lambda s: my_function(s), StringType())

My question is: if I already have a unit test for my_function(), do I need a unit test for udf_my_function? If so, how can I write it? Any relevant articles or links would also be appreciated. Many thanks.

In my personal opinion, it's not strictly necessary. But sometimes it's still desirable to have the test as part of the testing suite that performs the data transformations. Usually it will have the form:

sourceDf = ...  # read data from somewhere, or define it in the test
resultDf = sourceDf.withColumn("result", udf_my_function(col("some_column")))
assertEqual(resultDf, expectedDf)  # pseudocode: compare with a DataFrame-equality helper
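
For a concrete illustration, here is a minimal, self-contained pytest-style version of such a test. The body of my_function below is hypothetical (the question doesn't show it), and the sample rows and expected values are made up for the sketch:

import pytest
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf, col
from pyspark.sql.types import StringType

def my_function(s):
    # hypothetical implementation, stands in for the real function under test
    return s.upper() if s is not None else None

udf_my_function = udf(lambda s: my_function(s), StringType())

@pytest.fixture(scope="session")
def spark():
    # a small local SparkSession is enough for unit tests
    return SparkSession.builder.master("local[1]").appName("udf-test").getOrCreate()

def test_udf_my_function(spark):
    source_df = spark.createDataFrame([("hello",), (None,)], ["some_column"])
    result_df = source_df.withColumn("result", udf_my_function(col("some_column")))
    assert [row["result"] for row in result_df.collect()] == ["HELLO", None]

Note what this adds over the plain unit test of my_function: it exercises the UDF through a real DataFrame, checking the wiring (return type, null handling) rather than the logic itself.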

There are several libraries available for writing unit tests for PySpark. You can also use pytest-spark to simplify the maintenance of the Spark parameters, to include third-party packages, and so on.
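
As a sketch of what that looks like: pytest-spark provides a spark_session fixture that tests can request directly, so no SparkSession boilerplate is needed. Here udf_my_function is assumed to be importable from the code under test, with the same hypothetical upper-casing behavior as above:

from pyspark.sql.functions import col
# udf_my_function is assumed importable from the module under test

def test_udf_my_function(spark_session):
    df = spark_session.createDataFrame([("hello",)], ["some_column"])
    result = df.withColumn("result", udf_my_function(col("some_column")))
    assert result.collect()[0]["result"] == "HELLO"  # assumes my_function upper-cases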
