
How to add a new column to a PySpark DataFrame that counts the column values greater than 0?

I want to add a new column to a PySpark DataFrame that contains, for each row, the count of column values that are greater than 0.

Here is my demo DataFrame.

+-----------+----+----+----+----+----+----+
|customer_id|2010|2011|2012|2013|2014|2015|
+-----------+----+----+----+----+----+----+
|     1     |  0 |  4 |  0 | 32 |  0 | 87 |
|     2     |  5 |  5 | 56 | 23 |  0 | 09 |
|     3     |  6 |  6 | 87 |  0 | 45 | 23 |
|     4     |  7 |  0 | 12 | 89 | 78 | 0  |
|     6     |  0 |  0 |  0 | 23 | 45 | 64 |
+-----------+----+----+----+----+----+----+

The DataFrame above holds the number of visits by each customer per year. I want to count in how many years each customer visited. So I need a column visit_count that holds the number of year columns (2010, 2011, 2012, 2013, 2014, 2015) whose value is greater than 0.

+-----------+----+----+----+----+----+----+-----------+
|customer_id|2010|2011|2012|2013|2014|2015|visit_count|
+-----------+----+----+----+----+----+----+-----------+
|     1     |  0 |  4 |  0 | 32 |  0 | 87 |    3      |
|     2     |  5 |  5 | 56 | 23 |  0 | 09 |    5      |
|     3     |  6 |  6 | 87 |  0 | 45 | 23 |    5      |
|     4     |  7 |  0 | 12 | 89 | 78 | 0  |    4      |
|     6     |  0 |  0 |  0 | 23 | 45 | 64 |    3      |
+-----------+----+----+----+----+----+----+-----------+

How can I achieve this?

Try this:

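# Skip customer_id so only the year columns are counted; sum here is Python's built-in sum
df = df.withColumn('visit_count', sum((df[col] > 0).cast('integer') for col in df.columns if col != 'customer_id'))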


