
Is there any similar way to replicate the “qcut” function of pandas in PySpark?

Original post 2020-05-19 08:15:27 7 1 pyspark/ apache-spark-sql/ statistics/ data-science/ kolmogorov-smirnov

I want to run the KS test in PySpark on the predicted probabilities and the true labels. Similar work has been done in pandas at this link: https://www.listendata.com/2019/07/KS-Statistics-Python.html
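For reference, the quantity in question can be sketched in plain Python. This assumes the common decile-table formulation of the KS statistic (rows ranked by predicted probability, split into equal-frequency buckets, then the largest gap between the cumulative event share and the cumulative non-event share). The function name `ks_by_quantile_buckets` and the toy data are hypothetical, and the code is not taken from the linked article.

```python
def ks_by_quantile_buckets(scores, labels, q=10):
    """Decile-table KS: largest gap between cumulative event and non-event shares.

    Assumes labels are 0/1 and that both classes are present.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")

    # Rank by score, highest first, so the first bucket holds the top scores.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n = len(ranked)
    total_events = sum(label for _, label in ranked)
    total_non_events = n - total_events
    if total_events == 0 or total_non_events == 0:
        raise ValueError("both classes must be present")

    cum_events = 0
    cum_non_events = 0
    ks = 0.0
    for b in range(q):
        # Rank-based equal-frequency slices; tied scores may be split across
        # buckets, which differs from how pandas.qcut handles ties.
        start = b * n // q
        end = (b + 1) * n // q
        for _, label in ranked[start:end]:
            cum_events += label
            cum_non_events += 1 - label
        gap = abs(cum_events / total_events - cum_non_events / total_non_events)
        ks = max(ks, gap)
    return ks


# Hypothetical toy data: predicted probabilities and true labels.
probs = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
truth = [1, 1, 1, 0, 1, 0, 0, 0]
ks_value = ks_by_quantile_buckets(probs, truth, q=4)
```

Any PySpark port has to reproduce the two steps shown here: the equal-frequency bucketing (the part `qcut` does in pandas) and the cumulative sums per bucket.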

1 Answer

No, there is no direct way. You have to apply window functions and the like. I have always converted to pandas when I needed this :-) Or, when I am working in a Databricks-type environment, I leverage Spark SQL. I have found both of these easier than the windowing methods.

Related questions

Replicate a function from pandas into pyspark

Assign a new column in pandas in a similar way as in pyspark

How to replicate the between_time function of Pandas in PySpark

Is there function in PySpark that is equivalent to Pandas aggregate function any()?

Is there a function in pyspark dataframe that is similar to pandas.io.json.json_normalize

Is there a way to group by lambda function in pyspark pandas

Passing two date in a Function using udf getting error of df.show() in pyspark (similar to apply function in pandas)

Pandas to pyspark cumprod function

How to Pivot multiple columns in pyspark similar to pandas

Correct Way to Specify User-Defined Function in PySpark Pandas UDF



