Remove linearly increasing “count” columns in pandas
I have a dataframe in which some columns represent counts for every timestep. I would like to drop these automatically, similar to the df.dropna() functionality, but with something like a df.dropcounts() (pandas has no such method).
Here is an example dataframe:
import pandas as pd
array = [[0.0, 1.6, 2.7, 12.0], [1.0, 3.5, 4.5, 13.0], [2.0, 6.5, 8.6, 14.0]]
df = pd.DataFrame(array)
df
0 1 2 3
0 0.0 1.6 2.7 12.0
1 1.0 3.5 4.5 13.0
2 2.0 6.5 8.6 14.0
I would like to drop the first and last columns (0 and 3), since each increases by exactly 1 per row.
I believe you need:
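val = 1  # step size that identifies a count column
# keep only columns where at least one row-to-row difference is not val
df = df.loc[:, df.diff().fillna(val).ne(val).any()]
print(df)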
1 2
0 1.6 2.7
1 3.5 4.5
2 6.5 8.6
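To get something close to the df.dropcounts() the question asks for, the one-liner above can be wrapped in a function and used with DataFrame.pipe. This is a minimal sketch: drop_counts and its step parameter are hypothetical names, not pandas API, and it assumes a count column grows by exactly step on every row.

```python
import pandas as pd


def drop_counts(df: pd.DataFrame, step: float = 1) -> pd.DataFrame:
    """Hypothetical helper (not part of pandas): drop columns whose
    value grows by exactly `step` from one row to the next."""
    # diff() leaves NaN in the first row; filling it with `step` means it
    # never decides the outcome. Keep columns with any other step size.
    keep = df.diff().fillna(step).ne(step).any()
    return df.loc[:, keep]


df = pd.DataFrame([[0.0, 1.6, 2.7, 12.0],
                   [1.0, 3.5, 4.5, 13.0],
                   [2.0, 6.5, 8.6, 14.0]])
result = df.pipe(drop_counts)  # reads like a method call in a chain
print(result)
```

Using pipe keeps the call chainable (e.g. df.dropna().pipe(drop_counts)) without monkey-patching a new method onto DataFrame.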
Explanation:
First, compute the difference between consecutive rows of the original df with DataFrame.diff:
print (df.diff())
0 1 2 3
0 NaN NaN NaN NaN
1 1.0 1.9 1.8 1.0
2 1.0 3.0 4.1 1.0
Replace the NaNs with val (diff leaves the first row as NaN because it has no previous row):
print (df.diff().fillna(val))
0 1 2 3
0 1.0 1.0 1.0 1.0
1 1.0 1.9 1.8 1.0
2 1.0 3.0 4.1 1.0
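Note that fillna(val) fills every NaN, not only the first row. If the data can contain missing values, a gap makes the surrounding differences NaN, and after fillna they look like count steps. A hedged alternative, assuming only the first row should be ignored, is to slice it off with iloc instead; the column names below are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"count": [0.0, 1.0, 2.0], "gappy": [0.0, np.nan, 5.0]})
val = 1

# fillna(val) also fills the NaNs caused by the gap, so "gappy" looks like a count
mask_fillna = df.diff().fillna(val).ne(val).any()

# Dropping only the first row keeps the gap's NaNs; NaN != 1 evaluates to True,
# so "gappy" is kept while "count" is still dropped
mask_skip_first = df.diff().iloc[1:].ne(val).any()

print(df.loc[:, mask_skip_first])
```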
Check which differences are not equal to val with DataFrame.ne:
print (df.diff().fillna(val).ne(val))
0 1 2 3
0 False False False False
1 False True True False
2 False True True False
Finally, check for at least one True per column with DataFrame.any. Columns that are False here (every difference equals val) are the count columns, and df.loc drops them:
print (df.diff().fillna(val).ne(val).any())
0 False
1 True
2 True
3 False
dtype: bool
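The approach above only catches columns that grow by exactly val. If a "linearly increasing" column may advance by some other constant step, a possible variant is to test whether all differences after the first row are equal and positive. This is a sketch under that assumption; is_linear_increasing is a hypothetical helper, and np.isclose is used so that floating-point noise in the differences does not break the check.

```python
import numpy as np
import pandas as pd


def is_linear_increasing(col: pd.Series, rtol: float = 1e-9) -> bool:
    """Hypothetical helper: True if `col` grows by the same positive step on every row."""
    steps = col.diff().iloc[1:]  # drop the leading NaN produced by diff()
    if steps.empty:
        return False  # a single row says nothing about the step
    first = steps.iloc[0]
    # any NaN step (missing data) makes np.isclose return False, so such columns are kept
    return bool(first > 0 and np.isclose(steps, first, rtol=rtol).all())


df = pd.DataFrame([[0.0, 1.6, 2.7, 12.0],
                   [1.0, 3.5, 4.5, 13.0],
                   [2.0, 6.5, 8.6, 14.0]])
keep = [c for c in df.columns if not is_linear_increasing(df[c])]
result = df[keep]
print(result)
```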
Alternatively, using DataFrame.all:
# keep columns whose differences are not all equal to 1
df.loc[:, ~df.diff().fillna(1).eq(1).all().values]
1 2
0 1.6 2.7
1 3.5 4.5
2 6.5 8.6
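Both answers build the same column mask: by De Morgan's law, "at least one difference is not equal to val" is the negation of "all differences equal val". A quick check on the example frame:

```python
import pandas as pd

df = pd.DataFrame([[0.0, 1.6, 2.7, 12.0],
                   [1.0, 3.5, 4.5, 13.0],
                   [2.0, 6.5, 8.6, 14.0]])
val = 1
filled = df.diff().fillna(val)

keep_any = filled.ne(val).any()    # first answer: any difference != val
keep_all = ~filled.eq(val).all()   # second answer: not (all differences == val)

print(keep_any.equals(keep_all))
```

The .values in the second answer is not required for this mask: a boolean Series aligned on the column labels works directly in df.loc.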