
Choosing non-repetitive values in dataframe columns

I have the following dataframe.

import pandas as pd
dates = pd.date_range('20130101', periods=10)
df = pd.DataFrame([1,1,1,-1,-1,-1,1,1,-1,1], index=dates, columns=list('A'))

Expected output from df:

df_out=pd.DataFrame([1,0,0,-1,0,0,1,0,-1,1], index=dates, columns=list('A'))

I want to keep the alternating +1 and -1 values and substitute zero wherever a value repeats the previous one.

df can be a large dataframe with 10 columns, and I want this conversion applied to every column. What is an efficient way to do this without a for loop? Please suggest a way forward. Thanks in advance.

Try using np.where():

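import numpy as np
# Keep a value only where it differs from the previous row; otherwise write 0
df.A = np.where(df.A.ne(df.A.shift()), df.A, 0)
print(df)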

            A
2013-01-01  1
2013-01-02  0
2013-01-03  0
2013-01-04 -1
2013-01-05  0
2013-01-06  0
2013-01-07  1
2013-01-08  0
2013-01-09 -1
2013-01-10  1
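
Since the question asks for the same conversion on every column, here is a minimal sketch of the shift-based comparison applied to a whole frame at once. Column B and its values are made up for illustration; np.where returns a plain array, so the result is wrapped back into a DataFrame.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=10)
# Hypothetical two-column frame; column B is invented for this example
df = pd.DataFrame(
    {
        "A": [1, 1, 1, -1, -1, -1, 1, 1, -1, 1],
        "B": [-1, -1, 1, 1, 1, -1, 1, 1, 1, -1],
    },
    index=dates,
)

# True where a value differs from the row above it.
# shift() leaves NaN in the first row, so the first row is always True.
changed = df.ne(df.shift())

# Keep changed values, write 0 elsewhere, and restore the index and columns
out = pd.DataFrame(np.where(changed, df, 0), index=df.index, columns=df.columns)
print(out)
```

This compares each column only with itself, so no explicit loop over columns is needed.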

IIUC, you could use Series.diff along with ne to check which first differences are not 0 (in other words, which values do not repeat the previous one), and then use DataFrame.where to replace the entries where that check is False with 0:

df.where(df.A.diff().ne(0), 0)

            A
2013-01-01  1
2013-01-02  0
2013-01-03  0
2013-01-04 -1
2013-01-05  0
2013-01-06  0
2013-01-07  1
2013-01-08  0
2013-01-09 -1
2013-01-10  1
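
The same idea should extend to all columns by calling diff on the whole frame instead of on df.A. The sketch below assumes a 10-column frame of random +1/-1 values; the column names and the seed are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
dates = pd.date_range("20130101", periods=10)
# Hypothetical 10-column frame of random +1/-1 values
df = pd.DataFrame(
    rng.choice([-1, 1], size=(10, 10)),
    index=dates,
    columns=list("ABCDEFGHIJ"),
)

# DataFrame.diff works column by column. The first row is NaN,
# and NaN != 0 evaluates to True, so the first row is always kept.
result = df.where(df.diff().ne(0), 0)
print(result)
```

Note that this relies on diff being 0 only for repeated values, which holds for numeric columns like the ones in the question.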

Try:

df['A'] = df['A'] * (df['A'].diff() != 0)

How this works:

diff() calculates the difference between successive values in your series. If the diff is 0, then we know there was a repetition.

Therefore we can do a != 0 check, which creates a boolean series that is True wherever there was no repetition and False wherever there was one. The first value has no predecessor, so its diff is NaN, and NaN != 0 is True, which means the first value is always kept.

A boolean series can be used as a series of zeroes and ones, so multiplying it against the original series zeroes out all the repetitions.
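
To make the steps above concrete, here is a small sketch that prints each intermediate result side by side, using the values from the question. The variable names step, mask and result are only for illustration.

```python
import pandas as pd

s = pd.Series([1, 1, 1, -1, -1, -1, 1, 1, -1, 1])

step = s.diff()    # difference from the previous value; NaN for the first row
mask = step != 0   # True where the value changed (NaN != 0 is also True)
result = s * mask  # True acts as 1 and False as 0 in the multiplication

# Show every stage in one table
print(pd.DataFrame({"value": s, "diff": step, "mask": mask, "result": result}))
```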

A third option, using a rolling window of two rows:

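import pandas as pd
import numpy as np

def check_dup(window):
    # With raw=True, window is a NumPy array: [previous value, current value]
    if window[0] == window[1]:
        return 0
    return window[1]

# Random column of +1/-1 values
df = pd.DataFrame(np.random.randint(0, 2, (10, 1)) * 2 - 1)

# The first row has no complete window, so its result is NaN
df.rolling(window=2).apply(check_dup, raw=True)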
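
One caveat with the rolling approach is that the first row comes back as NaN and the column becomes float. A possible way to handle that, sketched here on the question's data, is to fill the gap from the original frame and cast back to integers; the comparison against a diff-based result is only a sanity check.

```python
import pandas as pd

def check_dup(window):
    # window is [previous value, current value] as a NumPy array
    return 0 if window[0] == window[1] else window[1]

df = pd.DataFrame({"A": [1, 1, 1, -1, -1, -1, 1, 1, -1, 1]})

rolled = df.rolling(window=2).apply(check_dup, raw=True)
# Restore the first row from the original data, then cast back to integers
rolled = rolled.fillna(df).astype("int64")

# Sanity check against the vectorized diff-based version
vectorized = df.where(df.diff().ne(0), 0)
print(rolled["A"].tolist() == vectorized["A"].tolist())
```

Because apply calls a Python function once per window, this is likely to be slower than the vectorized versions on large frames.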
