How to filter pandas or pyspark dataframe values in columns?
I have a dataframe with these values:
+--------+-------+--------------+-----+
|tag_html|tag_css|tag_javascript|count|
+--------+-------+--------------+-----+
|     0.0|    0.0|           0.0| 8655|
|     1.0|    0.0|           0.0|  141|
|     0.0|    0.0|           1.0|  782|
|     1.0|    0.0|           1.0|  107|
|     0.0|    1.0|           0.0|   96|
|     0.0|    1.0|           1.0|   20|
|     1.0|    1.0|           1.0|   46|
|     1.0|    1.0|           0.0|  153|
+--------+-------+--------------+-----+
I want the rows where a 1 is not repeated across the other columns, i.e. rows where exactly one tag column equals 1, like this:
+--------+-------+--------------+-----+
|tag_html|tag_css|tag_javascript|count|
+--------+-------+--------------+-----+
|     1.0|    0.0|           0.0|  141|
|     0.0|    0.0|           1.0|  782|
|     0.0|    1.0|           0.0|   96|
+--------+-------+--------------+-----+
What I have done is this, using the where() function:
df['count'].where(((df['tag_html'] == 1) | (df['tag_css'] == 0) | (df['tag_javascript'] == 0)) &
                  ((df['tag_html'] == 0) | (df['tag_css'] == 1) | (df['tag_javascript'] == 0)) &
                  ((df['tag_html'] == 0) | (df['tag_css'] == 0) | (df['tag_javascript'] == 1)))
This is the result:
0    8655.0
1     141.0
2     782.0
3       NaN
4      96.0
5       NaN
6      46.0
7       NaN
Is there a better way to do this in pandas or pyspark?
By using mask and a Boolean index:
# mask 'count' where more than one of the three tag columns equals 1
df = df.assign(count=df['count'].mask(df.iloc[:, :3].eq(1).sum(1).gt(1)))
df
Out[513]:
   tag_html  tag_css  tag_javascript   count
0       0.0      0.0             0.0  8655.0
1       1.0      0.0             0.0   141.0
2       0.0      0.0             1.0   782.0
3       1.0      0.0             1.0     NaN
4       0.0      1.0             0.0    96.0
5       0.0      1.0             1.0     NaN
6       1.0      1.0             1.0     NaN
7       1.0      1.0             0.0     NaN
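Note that mask only replaces count with NaN; the wanted output above drops the unwanted rows entirely (including the all-zero row). The same condition works as a Boolean index, keeping only the rows where exactly one tag column equals 1:

df[df.iloc[:, :3].eq(1).sum(1).eq(1)]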
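Since the question also asks about pyspark, here is a minimal sketch of the same filter with the Spark DataFrame API, assuming df is a Spark DataFrame with these columns:

from pyspark.sql import functions as F

# count how many of the tag columns equal 1 in each row
tag_cols = ["tag_html", "tag_css", "tag_javascript"]
ones = sum(F.when(F.col(c) == 1, 1).otherwise(0) for c in tag_cols)

# keep only rows where exactly one tag column equals 1
df.filter(ones == 1).show()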