Selecting non-repeated values in a dataframe column

Question

I have the following dataframe.

import pandas as pd
dates = pd.date_range('20130101', periods=10)
df = pd.DataFrame([1,1,1,-1,-1,-1,1,1,-1,1], index=dates, columns=list('A'))

Expected output from df:

df_out=pd.DataFrame([1,0,0,-1,0,0,1,0,-1,1], index=dates, columns=list('A'))

I want to keep the alternating +1 and -1 values and substitute zero wherever a value repeats.

df can be a large dataframe with 10 columns, and I want this conversion applied to all of them. What is an efficient way to do this without a for loop? Please suggest a way forward. Thanks in advance.

Answer 1

Try using np.where():

import numpy as np

df.A = np.where(df.A.ne(df.A.shift()), df.A, 0)
print(df)

            A
2013-01-01  1
2013-01-02  0
2013-01-03  0
2013-01-04 -1
2013-01-05  0
2013-01-06  0
2013-01-07  1
2013-01-08  0
2013-01-09 -1
2013-01-10  1
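The question asks for the conversion on every column, while the line above only touches column A. A minimal sketch of the same shift-and-compare idea applied to the whole frame follows; the second column B and its values are made up for illustration, and DataFrame.where is swapped in for np.where so the index and column labels are kept.

```python
import pandas as pd

dates = pd.date_range('20130101', periods=5)
# Column B is a hypothetical second column, added only for illustration.
df = pd.DataFrame({'A': [1, 1, -1, -1, 1],
                   'B': [-1, 1, 1, 1, -1]}, index=dates)

# True where a value differs from the one directly above it in the same column.
# The first row is compared against NaN, so it always counts as changed.
changed = df.ne(df.shift())

# Keep the changed values and replace the repeats with 0.
result = df.where(changed, 0)
print(result)
```

Because shift and ne both operate column by column, no explicit loop over the columns is needed.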

Answer 2

IIUC, you could use Series.diff together with ne to check which first differences are not 0, in other words which values do not repeat the previous one, and then replace the entries where that check is False with 0 using DataFrame.where:

df.where(df.A.diff().ne(0), 0)

            A
2013-01-01  1
2013-01-02  0
2013-01-03  0
2013-01-04 -1
2013-01-05  0
2013-01-06  0
2013-01-07  1
2013-01-08  0
2013-01-09 -1
2013-01-10  1
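As a quick check, the sketch below rebuilds df and df_out from the question and compares the result against the expected output. It calls diff on the whole frame rather than on df.A, which is an assumption about how one might extend this answer to several columns.

```python
import pandas as pd

dates = pd.date_range('20130101', periods=10)
df = pd.DataFrame([1, 1, 1, -1, -1, -1, 1, 1, -1, 1], index=dates, columns=['A'])
df_out = pd.DataFrame([1, 0, 0, -1, 0, 0, 1, 0, -1, 1], index=dates, columns=['A'])

# DataFrame.diff works column by column. The first row of the diff is NaN,
# and NaN != 0 evaluates to True, so the first value is always kept.
result = df.where(df.diff().ne(0), 0)

# Element-wise comparison with the expected frame from the question.
matches = (result == df_out).all().all()
print(matches)
```

Note that DataFrame.where returns a new frame and leaves df unchanged, so the result has to be assigned if it should replace the original.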

Answer 3

Try:

df['A'] = df['A'] * (df['A'].diff() != 0)

How this works:

diff() calculates the difference between successive values in your series. If the diff is 0, we know there was a repetition.

Therefore we can do a != 0 check, which creates a boolean series that is True wherever there was no repetition and False wherever there was one.

A boolean series can be used as a series of zeroes and ones, so multiplying it against the original series zeroes out all the repetitions.
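The three steps can be followed on a short series. This is only an illustrative sketch; the sample values and the names step, mask and result are made up here.

```python
import pandas as pd

s = pd.Series([1, 1, -1, -1, 1])

step = s.diff()    # NaN, 0.0, -2.0, 0.0, 2.0
mask = step != 0   # True, False, True, False, True (NaN != 0 is True)
result = s * mask  # True acts as 1 and False as 0 in the product

print(result.tolist())
```

The first element survives because its diff is NaN, which compares as not equal to 0.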

Answer 4

Another option:

import pandas as pd
import numpy as np

def check_dup(data):
    # data is a NumPy array holding [previous value, current value]
    if data[0] == data[1]:
        return 0  # repeated value
    else:
        return data[1]  # value changed, keep it

df = pd.DataFrame(np.random.randint(0,2, (10,1))*2-1)

df.rolling(window=2).apply(check_dup, raw=True)
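With a window of 2, the first row has no complete window, so the rolling result starts with NaN and comes back as floats. A possible way to tidy that up, sketched here on the question's values instead of random data, is to fill the gap from the original frame and cast back to integers; the names rolled and result are illustrative.

```python
import pandas as pd

def check_dup(window):
    # window holds [previous value, current value] for a rolling window of size 2.
    return 0 if window[0] == window[1] else window[1]

df = pd.DataFrame({'A': [1, 1, 1, -1, -1, -1, 1, 1, -1, 1]})

rolled = df.rolling(window=2).apply(check_dup, raw=True)

# The first row is NaN because its window is incomplete; restore it from
# the original frame, then cast the float result back to integers.
result = rolled.fillna(df).astype(int)
print(result['A'].tolist())
```

Since rolling().apply calls the Python function once per window, it is likely to be slower on large frames than the vectorized diff or shift approaches in the other answers.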

4 answers

Answer 1: score 2, 2019-03-22 11:56:59
Answer 2: score 2, accepted, 2019-03-22 11:57:03
Answer 3: score 2, 2019-03-22 12:01:55
Answer 4: score 0, 2019-03-22 12:03:12
