
How to check in Python if cell value of pyspark dataframe column in UDF function is none or NaN for implementing forward fill?

I am basically trying to do a forward fill imputation. Below is the code for that.

from pyspark.sql import functions as F
from pyspark.sql.types import IntegerType

df = spark.createDataFrame([(1, 1, None), (1, 2, 5), (1, 3, None), (1, 4, None), (1, 5, 10), (1, 6, None)], ("session", "timestamp", "id"))

PRV_RANK = 0.0
def fun(rank):
    ######## How to check if None or NaN? ########
    if rank is None or rank is NaN:
        return PRV_RANK
    else:
        PRV_RANK = rank
        return rank        

fuN = F.udf(fun, IntegerType())

df.withColumn("ffill_new", fuN(df["id"])).show()

I am getting a weird error in the log.

Edit: The question is about how to identify null and NaN values in a Spark DataFrame using Python.
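
For context, null (None) and NaN are distinct values in Spark SQL and can be told apart at the DataFrame level with the built-in column functions. A minimal sketch using the df defined above (isnan only applies to floating-point values, so a numeric column is implicitly cast):

checked = df.select(
    "id",
    F.col("id").isNull().alias("is_null"),  # True for SQL null / Python None
    F.isnan("id").alias("is_nan")           # True only for a floating-point NaN
)
checked.show()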

Edit: I am assuming the line of code in the UDF that checks for NaN and null is causing the issue, so I have given the title accordingly.

Traceback (most recent call last):
  File "", line 1, in
    df_na.withColumn("ffill_new", forwardFill(df_na["id"])).show()
  File "C:\Spark\python\pyspark\sql\dataframe.py", line 318, in show
    print(self._jdf.showString(n, 20))
  File "C:\Spark\python\lib\py4j-0.10.4-src.zip\py4j\java_gateway.py", line 1133, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "C:\Spark\python\pyspark\sql\utils.py", line 63, in deco
    return f(*a, **kw)
  File "C:\Spark\python\lib\py4j-0.10.4-src.zip\py4j\protocol.py", line 319, in get_return_value
    format(target_id, ".", name), value)

Py4JJavaError: An error occurred while calling o806.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 47.0 failed 1 times, most recent failure: Lost task 0.0 in stage 47.0 (TID 83, localhost, executor driver): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "C:\Spark\python\lib\pyspark.zip\pyspark\worker.py", line 174, in main
  File "C:\Spark\python\lib\pyspark.zip\pyspark\worker.py", line 169, in process
  File "C:\Spark\python\lib\pyspark.zip\pyspark\worker.py", line 106, in
  File "C:\Spark\python\lib\pyspark.zip\pyspark\worker.py", line 92, in
  File "C:\Spark\python\lib\pyspark.zip\pyspark\worker.py", line 70, in
  File "", line 5, in forwardfil
UnboundLocalError: local variable 'PRV_RANK' referenced before assignment

at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:193) at org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:234) at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:152) at org.apache.spark.sql.execution.python.BatchEvalPythonExec$$anonfun$doExecute$1.apply(BatchEvalPythonExec.scala:144) at org.apache.spark.sql.execution.python.BatchEvalPythonExec$$anonfun$doExecute$1.apply(BatchEvalPythonExec.scala:87) at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:797) at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:797) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) at org.apache.spark.scheduler.Task.run(Task.scala:99) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:748)

Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1435) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1423) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1422) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1422) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802) at scala.Option.foreach(Option.scala:257) at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:802) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1650) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1605) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1594) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:628) at org.apache.spark.SparkContext.runJob(SparkContext.scala:1925) at org.apache.spark.SparkContext.runJob(SparkContext.scala:1938) at org.apache.spark.SparkContext.runJob(SparkContext.scala:1951) at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:333) at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38) at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2386) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57) at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2788) at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2385) at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2392) at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2128) at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2127) at org.apache.spark.sql.Dataset.withTypedCallback(Dataset.scala:2818) at org.apache.spark.sql.Dataset.head(Dataset.scala:2127) at org.apache.spark.sql.Dataset.take(Dataset.scala:2342) at org.apache.spark.sql.Dataset.showString(Dataset.scala:248) at sun.reflect.GeneratedMethodAccessor35.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) at py4j.Gateway.invoke(Gateway.java:280) at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) at py4j.commands.CallCommand.execute(CallCommand.java:79) at py4j.GatewayConnection.run(GatewayConnection.java:214) at java.lang.Thread.run(Thread.java:748)

Caused by: org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "C:\Spark\python\lib\pyspark.zip\pyspark\worker.py", line 174, in main
  File "C:\Spark\python\lib\pyspark.zip\pyspark\worker.py", line 169, in process
  File "C:\Spark\python\lib\pyspark.zip\pyspark\worker.py", line 106, in
  File "C:\Spark\python\lib\pyspark.zip\pyspark\worker.py", line 92, in
  File "C:\Spark\python\lib\pyspark.zip\pyspark\worker.py", line 70, in
  File "", line 5, in forwardfil
UnboundLocalError: local variable 'PRV_RANK' referenced before assignment

at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:193) at org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:234) at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:152) at org.apache.spark.sql.execution.python.BatchEvalPythonExec$$anonfun$doExecute$1.apply(BatchEvalPythonExec.scala:144) at org.apache.spark.sql.execution.python.BatchEvalPythonExec$$anonfun$doExecute$1.apply(BatchEvalPythonExec.scala:87) at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:797) at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:797) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) at org.apache.spark.scheduler.Task.run(Task.scala:99) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ... 1 more
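
The UnboundLocalError at the bottom of the log is a plain Python issue rather than a Spark one: because the function assigns to PRV_RANK, Python treats PRV_RANK as a local variable, so the read before the assignment fails. Below is a sketch of the same UDF with that patched and with an explicit NaN test (math.isnan only accepts floats, hence the isinstance guard); even then, the carried value depends on how Spark orders and partitions the rows, so this is not a dependable forward fill.

import math
from pyspark.sql import functions as F
from pyspark.sql.types import IntegerType

PRV_RANK = 0

def fun(rank):
    global PRV_RANK  # without this, the assignment below makes PRV_RANK a local name
    # None covers a SQL null; math.isnan covers a floating-point NaN
    if rank is None or (isinstance(rank, float) and math.isnan(rank)):
        return PRV_RANK
    PRV_RANK = rank
    return rank

fuN = F.udf(fun, IntegerType())
df.withColumn("ffill_new", fuN(df["id"])).show()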

df.withColumn("ffill_new", f.UserDefinedFunction(lambda x: x or 0, IntegerType())(df["id"])).show()
