
How can I get the columns that meet a condition from a DataFrame in PySpark?

Original post 2017-05-23 13:57:00 6 1 python/ sql/ filter/ attributes/ pyspark

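I have a DataFrame with various columns (or attributes), and I want to get another DataFrame that contains only the columns with more than 6 distinct values.

How can I do this?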

1 Answer

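The following snippet does what you need. The sample dataset has three columns (col1, col2, col3). col3 has only one distinct value, 3, while col1 and col2 each have 6 distinct values, so the final DataFrame contains only col1 and col2. Note that the condition in the snippet is >= 6 (at least 6 distinct values); change it to > 6 if you strictly need more than 6.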

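>>> # `spark` is the SparkSession that the pyspark shell creates for you.
>>> df = spark.createDataFrame([(1,2,3),(10,20,3),(20,40,3),(40,50,3),(50,60,3),(60,70,3)],['col1','col2','col3'])
>>> # Keep the columns with at least 6 distinct values; count() avoids collecting the rows to the driver.
>>> columns = [column for column in df.columns if df.select(column).distinct().count() >= 6]
>>> df.select(columns).show()
+----+----+
|col1|col2|
+----+----+
|   1|   2|
|  10|  20|
|  20|  40|
|  40|  50|
|  50|  60|
|  60|  70|
+----+----+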
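
The list comprehension above runs one Spark job per column. As a possible alternative for wide DataFrames, here is a minimal sketch that asks for every column's distinct count in a single aggregation through `DataFrame.selectExpr`. The helper names `distinct_count_exprs` and `columns_with_min_distinct` and the `min_distinct` parameter are made up for illustration and are not part of the original answer. Two assumptions to keep in mind: column names must not contain backticks, and SQL `count(DISTINCT ...)` ignores NULLs, whereas `distinct().count()` counts NULL as a value, so the two approaches can differ by one on columns that contain NULLs.

```python
def distinct_count_exprs(columns):
    """Build one SQL aggregate per column, e.g. count(DISTINCT `col1`) AS `col1`."""
    # Backticks quote the column name so names with spaces or dots still parse.
    # Assumption: the column names themselves contain no backticks.
    return ["count(DISTINCT `{0}`) AS `{0}`".format(c) for c in columns]


def columns_with_min_distinct(df, min_distinct=6):
    """Return the columns of `df` with at least `min_distinct` distinct non-NULL values."""
    # A single global aggregation yields one row holding every column's distinct count.
    counts = df.selectExpr(*distinct_count_exprs(df.columns)).first().asDict()
    # Preserve the original column order when filtering.
    return [c for c in df.columns if counts[c] >= min_distinct]
```

With the sample data above, `df.select(columns_with_min_distinct(df)).show()` should again keep only col1 and col2. Pass `min_distinct=7` if you need strictly more than 6 distinct values.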

Related questions

How to filter all dataframe columns to a condition in Pyspark?

How to add or combine two columns into another one in a dataframe if they meet a condition

How to find the index and columns of data in pandas dataframe that meet the condition?

How can I compare columns from another dataframe in PySpark

How to drop certain rows from dataframe if they partially meet certain condition

How to get referenced columns of a PySpark DataFrame?

How to count values that meet a condition in a loop dataframe?

Transpose multiple columns in a Pyspark dataframe with a condition

Get sum of columns from a dataframe including map column - PySpark

How to get this kind of subset from a DataFrame in Pyspark?


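Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce this content, please credit this site's URL or the original source. For any questions, contact yoyou2525@163.com.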
